[00:00.000 --> 00:10.000] The next session is a very important one, around streaming and Java.
[00:10.000 --> 00:15.000] Of course, streaming has been an increasingly important and popular topic for years.
[00:15.000 --> 00:17.000] And Thibautz is going to tell us more about it.
[00:17.000 --> 00:18.000] Yes.
[00:18.000 --> 00:19.000] Thank you.
[00:19.000 --> 00:20.000] You have to talk loudly.
[00:20.000 --> 00:21.000] Yes.
[00:21.000 --> 00:22.000] Yeah.
[00:22.000 --> 00:25.000] So, welcome everyone.
[00:25.000 --> 00:29.000] So, this session is mainly about real-time stream processing.
[00:29.000 --> 00:33.000] So, what I'm planning to do today, because it's Sunday and early morning,
[00:33.000 --> 00:35.000] is to make it as easy as possible.
[00:35.000 --> 00:38.000] And the fact is, I don't know your background,
[00:38.000 --> 00:42.000] so I'm not sure how much you know about real-time stream processing.
[00:42.000 --> 00:47.000] So, I will take it from scratch, basically, to get everyone up to speed.
[00:47.000 --> 00:54.000] And I will also demo how you can basically use real-time stream processing in your work as well.
[00:54.000 --> 00:58.000] So, before we start, does anyone recognize these guys on the screen here?
[00:59.000 --> 01:01.000] On the left.
[01:05.000 --> 01:06.000] The Beatles.
[01:06.000 --> 01:07.000] Yes, that's correct.
[01:07.000 --> 01:09.000] So, these are the Beatles.
[01:09.000 --> 01:12.000] On the right side is the Liverpool Football Club.
[01:12.000 --> 01:14.000] That's where I come from.
[01:14.000 --> 01:19.000] So, I wanted to highlight these two images here, because I wanted to say,
[01:19.000 --> 01:22.000] you know, real-time stream processing is not domain specific.
[01:22.000 --> 01:24.000] So, it could be anywhere.
[01:24.000 --> 01:28.000] So, it doesn't have to be, like, for example, in financial institutions,
[01:28.000 --> 01:30.000] or machine learning, or IT, or IoT.
[01:30.000 --> 01:34.000] It could be, for example, in sports, or music, or any domain, basically.
[01:34.000 --> 01:38.000] The fact is, you're using real-time stream processing every single day.
[01:38.000 --> 01:41.000] So, just to give you an idea of what real-time means,
[01:41.000 --> 01:44.000] and how you can actually approach it.
[01:44.000 --> 01:47.000] So, can anyone guess how long it takes for an eye to blink?
[01:48.000 --> 01:50.000] The guess is a second.
[01:50.000 --> 01:53.000] So, yeah, it's sub-second, roughly one-third of a second.
[01:53.000 --> 01:54.000] So, that's pretty fast.
[01:54.000 --> 01:57.000] So, the same thing applies if you want to clap your hands,
[01:57.000 --> 01:59.000] or if you want to take a photo as well.
[01:59.000 --> 02:01.000] So, we're not talking about minutes here.
[02:01.000 --> 02:05.000] We're not talking about days or weeks, which is what batch systems are all about.
[02:05.000 --> 02:08.000] We're talking about, like, sub-milliseconds:
[02:08.000 --> 02:10.000] how you can process it in real time.
[02:10.000 --> 02:12.000] So, as you can see, it's everywhere.
[02:12.000 --> 02:15.000] So, it's not domain specific.
[02:15.000 --> 02:19.000] And some of you who already work with real-time know that, basically,
[02:19.000 --> 02:22.000] you have some kind of events coming in at this moment,
[02:22.000 --> 02:25.000] and you try to make sense out of them.
[02:25.000 --> 02:28.000] So, looking at it from a user perspective,
[02:28.000 --> 02:32.000] what you want is to make sure that you have some kind of secret sauce,
[02:32.000 --> 02:35.000] or key element, when it comes to real-time stream processing.
[02:35.000 --> 02:37.000] So, I've seen it so many times:
[02:37.000 --> 02:40.000] people approach it from the wrong angle.
[02:40.000 --> 02:44.000] So, they try, basically, to read the data in real time,
[02:44.000 --> 02:48.000] and they try, basically, to derive some kind of meaning from this data.
[02:48.000 --> 02:50.000] So, I'll give you a demo today for logs,
[02:50.000 --> 02:53.000] so this should be easy to follow.
[02:53.000 --> 02:57.000] But the secret sauce here is to combine new data,
[02:57.000 --> 02:59.000] real-time data, with historical data.
[02:59.000 --> 03:03.000] So, what we mean by real-time data is data that is arriving this moment,
[03:03.000 --> 03:05.000] and you read it in this moment.
[03:05.000 --> 03:08.000] Obviously, you want to make sense out of it.
[03:08.000 --> 03:10.000] You want to understand what's going on here.
[03:10.000 --> 03:13.000] And the historical data is normal data
[03:13.000 --> 03:16.000] we know about, like, stored somewhere on a physical drive,
[03:16.000 --> 03:18.000] for example, or a database, whatever.
[03:18.000 --> 03:20.000] So, you want to make sure, basically,
[03:20.000 --> 03:23.000] to have these two types of data at the same speed.
[03:23.000 --> 03:25.000] So, now we're talking about two types of data,
[03:25.000 --> 03:28.000] but how much, you know, is too much, basically?
[03:28.000 --> 03:31.000] What size are we talking about here?
[03:31.000 --> 03:33.000] So, for some it might be, like, a few thousand;
[03:33.000 --> 03:36.000] for others it might be millions, or billions.
[03:36.000 --> 03:39.000] So, essentially, what we're trying to do here
[03:39.000 --> 03:42.000] is not taking, like, a small data set and trying to process it,
[03:42.000 --> 03:44.000] because that's, you know, easy to do.
[03:44.000 --> 03:46.000] But we're talking about, like, over a billion,
[03:46.000 --> 03:50.000] or even over ten billion, transactions per second.
[03:50.000 --> 03:54.000] So, the idea here is to take a huge amount of data
[03:54.000 --> 03:57.000] and, you know, try to find some kind of trends
[03:57.000 --> 03:59.000] and alerts from this data.
[03:59.000 --> 04:01.000] So, I'll give you an example here,
[04:01.000 --> 04:04.000] so you can start now to work out what's going on here.
[04:04.000 --> 04:07.000] So, imagine, basically, you write a Java program,
[04:07.000 --> 04:11.000] and you obviously have some kind of logging mechanism
[04:11.000 --> 04:12.000] in your application.
[04:12.000 --> 04:16.000] So, in order to understand what's going on with your log system,
[04:16.000 --> 04:20.000] essentially, what you want is to have some kind of platform
[04:20.000 --> 04:23.000] to allow you to actually, you know, analyze it.
[04:23.000 --> 04:25.000] But also, at the same time,
[04:25.000 --> 04:27.000] we're not talking about logs from yesterday.
[04:27.000 --> 04:29.000] So, it's not about events that happened yesterday:
[04:29.000 --> 04:32.000] you want to make sure that, for example,
[04:32.000 --> 04:35.000] you actually raise alerts, or spot trends, in the same moment.
[04:35.000 --> 04:38.000] So, same thing for trends as well.
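As an illustration of the point above about combining live data with historical data, here is a minimal Hazelcast pipeline sketch that enriches a stream by looking up context stored in an IMap. It assumes Hazelcast 5.x; the test source, the map name "history", and the key derivation are hypothetical stand-ins for a real event stream and real reference data.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

public class EnrichmentSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.bootstrappedInstance();

        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.itemStream(10))        // stand-in for a real event source
         .withIngestionTimestamps()
         // Enrich each live event with historical context from the "history" IMap
         // (a missing key simply yields null here).
         .mapUsingIMap("history",
                 event -> event.sequence() % 100,      // hypothetical lookup key
                 (event, historical) -> event + " / " + historical)
         .writeTo(Sinks.logger());

        hz.getJet().newJob(p).join();
    }
}
```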
[04:38.000 --> 04:41.000] So, if you want to know if your application
[04:41.000 --> 04:43.000] is going to crash or not,
[04:43.000 --> 04:47.000] what you want is to kind of have a platform or solution
[04:47.000 --> 04:51.000] where it is easy for you to actually look at the data
[04:51.000 --> 04:54.000] in this moment and say, hey, something is going wrong here.
[04:54.000 --> 04:58.000] I need to basically derive trends out of it
[04:58.000 --> 05:00.000] and raise some alerts.
[05:00.000 --> 05:03.000] Now, doing this manually is kind of painful,
[05:03.000 --> 05:06.000] because you need to loop through the data, for example,
[05:06.000 --> 05:09.000] and you want to make sure that you know how to scale it,
[05:09.000 --> 05:11.000] and also, kind of, you know,
[05:11.000 --> 05:14.000] know exactly where your data is stored,
[05:14.000 --> 05:17.000] because your enemy, when it comes to real-time stream processing,
[05:17.000 --> 05:18.000] is latency.
[05:18.000 --> 05:22.000] So, you want to make sure your application has as low
[05:22.000 --> 05:26.000] a latency as possible when it comes to delay, basically.
[05:26.000 --> 05:28.000] Obviously, scaling is a bottleneck too.
[05:28.000 --> 05:30.000] And now, if you look at platforms,
[05:30.000 --> 05:33.000] you might have heard of some of these.
[05:33.000 --> 05:36.000] So, the easiest way is to split these platforms
[05:36.000 --> 05:38.000] into various categories.
[05:38.000 --> 05:42.000] So, on the vertical axis here, you can see,
[05:42.000 --> 05:44.000] you can have open source solutions,
[05:44.000 --> 05:46.000] or you can have hybrid,
[05:46.000 --> 05:50.000] which is a mix between open source and a managed service.
[05:50.000 --> 05:54.000] And on the horizontal axis, as you can see,
[05:54.000 --> 05:56.000] you need to capture your data.
[05:56.000 --> 05:59.000] Obviously, in real time, you need to do some kind of transport
[05:59.000 --> 06:01.000] as well, as some kind of transformation,
[06:01.000 --> 06:03.000] and processing as well.
[06:03.000 --> 06:06.000] So, you can split it into 12 squares,
[06:06.000 --> 06:08.000] and it becomes, like, you know,
[06:08.000 --> 06:11.000] easier to understand which tool you need to use.
[06:11.000 --> 06:13.000] But obviously, this is still a bit complex,
[06:13.000 --> 06:16.000] because the area of real-time stream processing
[06:16.000 --> 06:19.000] is mainly about two different subjects.
[06:19.000 --> 06:22.000] So, it's not only, you know, capture or transport,
[06:22.000 --> 06:25.000] because you have to do all of these at the same time.
[06:25.000 --> 06:27.000] It's kind of like, if you want,
[06:27.000 --> 06:29.000] you need basically to decide if you're going to use
[06:29.000 --> 06:32.000] stream processing engines on one side,
[06:32.000 --> 06:34.000] or you want to have some kind of fast data storage
[06:34.000 --> 06:36.000] on the other side.
[06:36.000 --> 06:38.000] So, stream processing engines are pretty good
[06:38.000 --> 06:41.000] at handling data coming in real time,
[06:41.000 --> 06:45.000] which is, like, for example, Kafka and so on.
[06:45.000 --> 06:48.000] Or, on the far right side, you can see fast data storage,
[06:48.000 --> 06:51.000] which is, essentially, caching solutions
[06:51.000 --> 06:53.000] for your application.
[06:53.000 --> 06:55.000] So, for example, MongoDB, Redis, and so on.
[06:55.000 --> 06:57.000] So, if you want to apply this solution,
[06:57.000 --> 07:00.000] essentially you need one tool from the left
[07:00.000 --> 07:02.000] and one tool from the right,
[07:02.000 --> 07:04.000] which means that it adds more work on your side.
[07:04.000 --> 07:07.000] So, what you want is to kind of look at it in this way
[07:07.000 --> 07:10.000] and say, hey, I want one solution,
[07:10.000 --> 07:13.000] and that's where Hazelcast comes into place.
[07:13.000 --> 07:15.000] Obviously, I work for Hazelcast, and Hazelcast
[07:15.000 --> 07:19.000] itself is built on top of the Java virtual machine.
[07:19.000 --> 07:21.000] So, it's Java based, and it's open source.
[07:21.000 --> 07:24.000] So, this is the platform here.
[07:25.000 --> 07:29.000] It's kind of like an A-to-Z solution.
[07:29.000 --> 07:32.000] So, what you want is to catch your data,
[07:32.000 --> 07:34.000] capture your data.
[07:34.000 --> 07:36.000] It could be coming from Apache, for example,
[07:36.000 --> 07:39.000] Apache Kafka, and from IoT devices.
[07:39.000 --> 07:41.000] It could be coming from some kind of custom connectors,
[07:41.000 --> 07:43.000] because it's open source,
[07:43.000 --> 07:45.000] which means feel free to contribute to this project,
[07:45.000 --> 07:48.000] or if it comes from a file watcher,
[07:48.000 --> 07:50.000] for example, or from WebSockets.
[07:50.000 --> 07:52.000] So, once you have this data,
[07:52.000 --> 07:55.000] real-time data, ingested into the platform,
[07:55.000 --> 07:58.000] the platform itself has two main components.
[07:58.000 --> 08:00.000] So, the first one is the Jet engine.
[08:00.000 --> 08:02.000] So, this is the engine for stream processing,
[08:02.000 --> 08:05.000] and in the demo I will show you how to use it.
[08:05.000 --> 08:08.000] And also the fast data storage, or fast data management.
[08:08.000 --> 08:10.000] So, this is essentially a component
[08:10.000 --> 08:13.000] which allows you to load your data from external sources,
[08:13.000 --> 08:16.000] and it's optional, obviously.
[08:16.000 --> 08:18.000] So, it's kind of like, I don't know,
[08:18.000 --> 08:21.000] some kind of file system, or database, or stored on the cloud,
[08:21.000 --> 08:23.000] and you load it into memory.
[08:23.000 --> 08:25.000] Why do you need to load it into memory?
[08:25.000 --> 08:27.000] Simply because you want to make sure
[08:27.000 --> 08:29.000] your application is as fast as possible.
[08:29.000 --> 08:31.000] So, we're talking about, for example,
[08:31.000 --> 08:34.000] here, speed where it's sub-milliseconds
[08:34.000 --> 08:35.000] or fractions of a second.
[08:35.000 --> 08:37.000] So, this is very important.
[08:37.000 --> 08:40.000] For example, in a fraud detection scenario,
[08:40.000 --> 08:42.000] if, for example, you're using your card
[08:42.000 --> 08:45.000] and someone else is using your card somewhere else,
[08:45.000 --> 08:48.000] you want to get an alert in this specific moment.
[08:48.000 --> 08:50.000] It doesn't make sense to get an alert,
[08:50.000 --> 08:53.000] I don't know, in the afternoon, for example, or the next day.
[08:53.000 --> 08:56.000] So, for this type of mission-critical solution,
[08:56.000 --> 08:59.000] what you want is to make sure that your data is stored
[08:59.000 --> 09:03.000] in memory, which allows you to access it in really good time.
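A sketch of the capture step described above, reading a Kafka topic into the platform. It assumes Hazelcast 5.x with the hazelcast-jet-kafka module on the classpath; the broker address and topic name are placeholders.

```java
import java.util.Properties;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;

public class KafkaIngestSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();

        Pipeline p = Pipeline.create();
        // Capture: stream records from a Kafka topic into the platform.
        p.readFrom(KafkaSources.<String, String>kafka(props, "transactions")) // placeholder topic
         .withoutTimestamps()
         .writeTo(Sinks.logger()); // replace with a map sink or transforms as needed

        hz.getJet().newJob(p).join();
    }
}
```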
[09:03.000 --> 09:05.000] So, once you have this data,
[09:05.000 --> 09:07.000] you can do some kind of transformation,
[09:07.000 --> 09:09.000] for example, on the data, because remember,
[09:09.000 --> 09:11.000] we're not talking about one single data source.
[09:11.000 --> 09:14.000] For stream processing, we're talking about multiple sources.
[09:14.000 --> 09:16.000] So, it could be, for example, I don't know,
[09:16.000 --> 09:18.000] some transactions coming in on a Kafka topic,
[09:18.000 --> 09:20.000] and some IoT device for,
[09:20.000 --> 09:23.000] say, a weather forecast, for example, coming from another topic,
[09:23.000 --> 09:25.000] sorry, from another source.
[09:25.000 --> 09:28.000] So, you want to make sure that you can also combine them.
[09:28.000 --> 09:31.000] Obviously, because here we're in the Java room,
[09:31.000 --> 09:34.000] you can use the Java client for it, obviously.
[09:34.000 --> 09:38.000] So, essentially, what you need is kind of like a Java jar.
[09:38.000 --> 09:41.000] You need to download it and plug it into your POM file.
[09:41.000 --> 09:44.000] But, for example, if you're a data scientist
[09:44.000 --> 09:46.000] and you're not sure, you know, about programming languages,
[09:46.000 --> 09:48.000] maybe you use a little bit of Python,
[09:48.000 --> 09:50.000] but, you know, programming languages are not
[09:50.000 --> 09:52.000] something you want to invest in,
[09:52.000 --> 09:54.000] what you can do is do
[09:54.000 --> 09:56.000] everything I mentioned today in SQL.
[09:56.000 --> 09:59.000] So, which means you do everything for
[09:59.000 --> 10:01.000] real-time stream processing in terms of alerts,
[10:01.000 --> 10:03.000] for example, or defining trends,
[10:03.000 --> 10:06.000] or even querying your data, using SQL.
[10:06.000 --> 10:08.000] So, once you do this processing,
[10:08.000 --> 10:10.000] you can actually output it to some kind of,
[10:10.000 --> 10:13.000] I don't know, same thing as your input.
[10:13.000 --> 10:15.000] So, a Kafka topic, for example;
[10:15.000 --> 10:17.000] you can use a WebSocket; or you can create your own
[10:17.000 --> 10:20.000] Java application to do some kind of visualization
[10:20.000 --> 10:22.000] for predictions, for example.
[10:22.000 --> 10:25.000] Now, the cool thing about Hazelcast is not only
[10:25.000 --> 10:28.000] the platform and the ease of use,
[10:28.000 --> 10:29.000] but also how it scales.
[10:29.000 --> 10:31.000] Remember what we're talking about here?
[10:31.000 --> 10:33.000] We're not talking about a few thousand.
[10:33.000 --> 10:35.000] We're talking maybe a few million,
[10:35.000 --> 10:37.000] or even billions, of transactions.
[10:37.000 --> 10:39.000] So, when it comes to scaling,
[10:39.000 --> 10:42.000] you want to distinguish between two different topics.
[10:42.000 --> 10:45.000] So, for some, scaling might refer to, I don't know,
[10:45.000 --> 10:46.000] your data.
[10:46.000 --> 10:48.000] So, you want to scale this data, for example.
[10:48.000 --> 10:50.000] And others, who work, for example,
[10:50.000 --> 10:52.000] in programming or development,
[10:52.000 --> 10:54.000] focus mainly on the compute.
[10:54.000 --> 10:56.000] So, you want to make sure, basically,
[10:56.000 --> 11:00.000] to combine data and compute when you want to scale.
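The SQL route mentioned above might look like this from the Java client. A minimal sketch, assuming Hazelcast 5.x and a cluster reachable with default settings; the mapping options mirror the VARCHAR-versus-JSON choice the talk describes.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.sql.SqlResult;
import com.hazelcast.sql.SqlRow;

public class SqlSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = HazelcastClient.newHazelcastClient();

        // Declare how the "logs" map should be exposed to SQL
        // ('varchar' for speed; 'json' if you want structured fields later).
        hz.getSql().execute(
            "CREATE MAPPING IF NOT EXISTS logs "
          + "TYPE IMap "
          + "OPTIONS ('keyFormat' = 'varchar', 'valueFormat' = 'varchar')");

        // Query the live map just like a table.
        try (SqlResult result = hz.getSql().execute(
                "SELECT __key, this FROM logs WHERE this LIKE '%ERROR%'")) {
            for (SqlRow row : result) {
                System.out.println(row.getObject(0) + " -> " + row.getObject(1));
            }
        }
        hz.shutdown();
    }
}
```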
[11:00.000 --> 11:03.000] And the cool thing about it is it's partition aware,
[11:03.000 --> 11:07.000] which means if you have your data stored in multiple places
[11:07.000 --> 11:09.000] around the world, for example, in different data centers,
[11:09.000 --> 11:12.000] what you want is for your compute, or your process,
[11:12.000 --> 11:16.000] or your application, to be placed as close
[11:16.000 --> 11:18.000] as possible to your data.
[11:18.000 --> 11:21.000] So, this will give you some kind of speed,
[11:21.000 --> 11:25.000] and lower latency when it comes to transactions.
[11:25.000 --> 11:27.000] Now, we talked about transactions,
[11:27.000 --> 11:30.000] but how many are we talking about today?
[11:30.000 --> 11:33.000] So, we've done this kind of benchmark, obviously.
[11:33.000 --> 11:35.000] It's a bit outdated now.
[11:35.000 --> 11:39.000] It's probably worth trying to add one more zero to it.
[11:39.000 --> 11:42.000] So, it's one billion transactions per second on 45 nodes.
[11:42.000 --> 11:45.000] And the cool thing about it is not only the latency,
[11:45.000 --> 11:47.000] which is, like, 30 milliseconds,
[11:47.000 --> 11:49.000] but also the linear scaling,
[11:49.000 --> 11:52.000] which means you just need to add more nodes to your application.
[11:52.000 --> 11:54.000] So, with that being said,
[11:54.000 --> 11:57.000] let's just move directly to the demo,
[11:57.000 --> 12:00.000] and I'll show you how you can use Hazelcast
[12:00.000 --> 12:02.000] within your application.
[12:03.000 --> 12:07.000] So, for this demo, what I wanted is, kind of, you know,
[12:07.000 --> 12:09.000] you're writing your application,
[12:09.000 --> 12:11.000] a Java application, obviously,
[12:11.000 --> 12:13.000] and you have some kind of logging mechanism
[12:13.000 --> 12:15.000] within your application.
[12:15.000 --> 12:17.000] And your boss comes the next day and says,
[12:17.000 --> 12:21.000] hey, your log messages, or your solution,
[12:21.000 --> 12:23.000] we need to upgrade it in a way
[12:23.000 --> 12:25.000] where we'll provide some kind of alerts,
[12:25.000 --> 12:28.000] or predict what's going to happen next.
[12:28.000 --> 12:31.000] So, your task, essentially:
[12:31.000 --> 12:36.000] what you want is to kind of take exactly this application
[12:36.000 --> 12:39.000] and make sure you have some kind of scanning mechanism
[12:39.000 --> 12:42.000] and real-time stream processing in it.
[12:42.000 --> 12:44.000] So, obviously, you have two options here.
[12:44.000 --> 12:47.000] So, the first option is to download this jar,
[12:47.000 --> 12:49.000] plug it into your application, and run it.
[12:49.000 --> 12:51.000] So, that's good.
[12:51.000 --> 12:54.000] But the problem with this is that usually logs are stored
[12:54.000 --> 12:56.000] in different places.
[12:56.000 --> 13:00.000] So, what you want is for every machine
[13:00.000 --> 13:03.000] to send its logs to some, you know, central place.
[13:03.000 --> 13:05.000] So, usually, it's a cloud.
[13:05.000 --> 13:08.000] So, the idea of sending logs to the cloud
[13:08.000 --> 13:12.000] is to kind of provide one place
[13:12.000 --> 13:15.000] for every single machine to send logs to.
[13:15.000 --> 13:18.000] Hazelcast runs Hazelcast Viridian,
[13:18.000 --> 13:21.000] which is exactly what we're talking about, on the cloud.
[13:21.000 --> 13:24.000] And this is kind of like what you need to do.
[13:24.000 --> 13:28.000] So, you write your log message in some way.
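A sketch of what connecting a Java client to a managed cloud cluster can look like. The cluster name and discovery token are placeholders you would get from the cloud console, and a real Viridian connection typically also needs TLS settings, omitted here for brevity.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class CloudClientSketch {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.setClusterName("YOUR_CLUSTER_NAME");        // placeholder
        config.getNetworkConfig().getCloudConfig()
              .setEnabled(true)
              .setDiscoveryToken("YOUR_DISCOVERY_TOKEN");  // placeholder

        // Every machine runs a client like this and ships its logs to the
        // same central, cloud-hosted cluster.
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        System.out.println("Connected: " + client.getName());
    }
}
```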
[13:28.000 --> 13:32.000] So, obviously, you need to have context for your message.
[13:32.000 --> 13:37.000] And the idea is to store all logs in memory.
[13:37.000 --> 13:41.000] So, we're not going to store them in a database or file system,
[13:41.000 --> 13:44.000] because that means you're adding input-output latency
[13:44.000 --> 13:45.000] to your application.
[13:45.000 --> 13:47.000] So, we need to minimize this.
[13:47.000 --> 13:50.000] So, the idea is to use some kind of map structure.
[13:50.000 --> 13:52.000] And this map has a key,
[13:52.000 --> 13:54.000] which is, like, where is this data coming from,
[13:54.000 --> 13:56.000] from an IP address and port number,
[13:56.000 --> 13:58.000] and the message, which is the value.
[13:58.000 --> 13:59.000] So, it's your choice now
[13:59.000 --> 14:02.000] whether you use VARCHAR, for example, a string.
[14:02.000 --> 14:03.000] Obviously, it's faster.
[14:03.000 --> 14:06.000] Or you can use some kind of JSON format, for example,
[14:06.000 --> 14:09.000] if you're trying to do some kind of machine learning
[14:09.000 --> 14:12.000] and do, I don't know, maybe some classification on your logs.
[14:12.000 --> 14:14.000] So, once you have your key and value,
[14:14.000 --> 14:17.000] what you can do is proceed to save it into Hazelcast.
[14:17.000 --> 14:19.000] So, remember what we're talking about here?
[14:19.000 --> 14:21.000] So, what you want is to get this map
[14:21.000 --> 14:23.000] into Hazelcast.
[14:23.000 --> 14:27.000] So, the first step is to send your logs to the cloud, obviously,
[14:27.000 --> 14:30.000] because different logs are coming from different machines.
[14:30.000 --> 14:33.000] And the second step is to store them in memory.
[14:33.000 --> 14:37.000] So, this will allow you to access your logs in a much faster way.
[14:37.000 --> 14:39.000] And obviously, you need to do this mapping.
[14:39.000 --> 14:42.000] So, what you see here is kind of like SQL.
[14:42.000 --> 14:44.000] You can write it in SQL.
[14:44.000 --> 14:46.000] And once you have it in SQL,
[14:46.000 --> 14:49.000] you can proceed to create this instance.
[14:49.000 --> 14:51.000] So, in order to run Hazelcast,
[14:51.000 --> 14:54.000] what you need is to have some kind of, I don't know,
[14:54.000 --> 14:58.000] Hazelcast instance, and plug in your pipeline.
[14:58.000 --> 15:02.000] So, the pipeline is passed to the Jet engine, for example, for your processing.
[15:02.000 --> 15:05.000] And in here, what you say is, basically,
[15:05.000 --> 15:08.000] I'm defining: this is the IP address I want to run it on,
[15:08.000 --> 15:11.000] this is my port, and this is my data.
[15:11.000 --> 15:13.000] Now, I mentioned SQL.
[15:13.000 --> 15:16.000] So, the platform itself has Management Center.
[15:17.000 --> 15:22.000] So, this is really cool; it allows you to query your data.
[15:22.000 --> 15:26.000] So, what you see here is, I'm trying to query the data
[15:26.000 --> 15:29.000] of my logs, and really trying to understand what's going on.
[15:29.000 --> 15:31.000] So, on the left, you see the key,
[15:31.000 --> 15:33.000] which is, like, the IP address and port.
[15:33.000 --> 15:36.000] And on the far right side, or bottom right side,
[15:36.000 --> 15:38.000] you see the values.
[15:38.000 --> 15:42.000] So, essentially, what we're doing with logs
[15:42.000 --> 15:45.000] is we're giving a score to each log message.
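A sketch of the map write just described: the key says where the log came from (IP address and port), and the value is the message plus its score, stored as JSON so it stays queryable and usable for machine learning later. The address and payload are invented.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastJsonValue;
import com.hazelcast.map.IMap;

public class LogWriterSketch {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();

        // Key: "ip:port" of the machine producing the log;
        // value: the message and a score, as JSON.
        IMap<String, HazelcastJsonValue> logs = client.getMap("logs");
        logs.put("192.168.1.10:5701",                 // hypothetical source address
                new HazelcastJsonValue(
                        "{\"message\": \"Connection timed out\", \"score\": 80}"));

        client.shutdown();
    }
}
```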
[15:45.000 --> 15:47.000] So, if you're trying to define it, for example,
[15:47.000 --> 15:51.000] I don't know, for example, you have specific categories for logs,
[15:51.000 --> 15:54.000] whether it's information, or warning, or error.
[15:54.000 --> 15:56.000] So, that's good, but if you're trying to predict
[15:56.000 --> 15:58.000] what's going to happen next,
[15:58.000 --> 16:01.000] you probably need to have some kind of linear scale for it.
[16:01.000 --> 16:03.000] So, for example, you run it from 1 to 100,
[16:03.000 --> 16:06.000] and you give a value to your log message,
[16:06.000 --> 16:09.000] and this is the value you see there.
[16:09.000 --> 16:12.000] So, we take this data and we ingest it into memory.
[16:12.000 --> 16:14.000] So, we have the IP address and port number,
[16:14.000 --> 16:17.000] and we have a score for each log message.
[16:17.000 --> 16:20.000] And once we import it, we define a trend.
[16:20.000 --> 16:23.000] So, in my case, I'm taking a window
[16:23.000 --> 16:25.000] on this real-time stream processing,
[16:25.000 --> 16:29.000] and, in my case, it's two minutes, but it could be anything.
[16:29.000 --> 16:32.000] And I'm defining this trend based on the score value
[16:32.000 --> 16:34.000] for each log message.
[16:34.000 --> 16:37.000] And based on this, I try to create a predictive,
[16:37.000 --> 16:40.000] or prediction, map, so another map,
[16:40.000 --> 16:43.000] in order to say: send an alert,
[16:43.000 --> 16:48.000] which has a value of one, or zero, don't send an alert.
[16:48.000 --> 16:52.000] Obviously, you need to do some kind of programming for it.
[16:52.000 --> 16:55.000] So, the idea is to group messages based on the score
[16:55.000 --> 16:57.000] to define the trend, and from there,
[16:57.000 --> 16:59.000] you can use some kind of machine learning,
[16:59.000 --> 17:02.000] so linear regression or classification, for example,
[17:02.000 --> 17:04.000] to do some kind of prediction.
[17:04.000 --> 17:06.000] And you also need to output it,
[17:06.000 --> 17:10.000] which means you send it back to the user.
[17:10.000 --> 17:12.000] Now, this is good. So, this is how it works.
[17:12.000 --> 17:16.000] So, this is your actual code for building the prediction map.
[17:16.000 --> 17:19.000] So, we take the score for each log message,
[17:19.000 --> 17:23.000] we do the linear regression based on the window I defined,
[17:23.000 --> 17:27.000] and I'm simply checking: if the value is greater than 100,
[17:27.000 --> 17:31.000] send an alert; otherwise, don't send an alert.
[17:31.000 --> 17:34.000] And in here, what you need to focus on
[17:34.000 --> 17:37.000] is, kind of, the three ways to proceed from here.
[17:37.000 --> 17:41.000] So, there are three ways to do stream processing,
[17:41.000 --> 17:44.000] and the first one is to use SQL, for example.
[17:44.000 --> 17:47.000] So, you take it from the logs map
[17:47.000 --> 17:51.000] and you store it in another log map, with some filtering.
[17:51.000 --> 17:55.000] And the second option is to use a processor, or pipeline.
[17:55.000 --> 17:58.000] So, you read it from the map, and, as well,
[17:58.000 --> 18:01.000] you do some kind of alerts, for example, or trends.
[18:01.000 --> 18:05.000] So, the first two ways are called batch stream processing,
[18:05.000 --> 18:07.000] so it's not real time.
[18:07.000 --> 18:10.000] And the third option, so this is the main thing you want to do
[18:10.000 --> 18:14.000] if you want to provide some kind of real-time messaging,
[18:14.000 --> 18:17.000] is to look at, kind of, the map journal,
[18:17.000 --> 18:20.000] which is also similar to a map,
[18:20.000 --> 18:22.000] but it's like a ring buffer structure,
[18:22.000 --> 18:25.000] which allows you to start processing your logs
[18:25.000 --> 18:28.000] either from the start or from the end, depending on what you want.
[18:28.000 --> 18:32.000] And the pipeline itself takes this in-memory map,
[18:32.000 --> 18:35.000] the logs, and does some filtering.
[18:35.000 --> 18:38.000] So, if you see some value in this specific moment,
[18:38.000 --> 18:41.000] you can do these alerts, for example,
[18:41.000 --> 18:44.000] or you can send it as a message.
[18:44.000 --> 18:48.000] So, I think, kind of, I wanted to cover everything,
[18:48.000 --> 18:51.000] but because it's just 20 minutes here,
[18:51.000 --> 18:54.000] and that's not enough time to talk about everything,
[18:54.000 --> 18:57.000] let me just give you a summary of what you need to do
[18:57.000 --> 19:00.000] and open the stage for questions.
[19:00.000 --> 19:04.000] So, first of all, you need to store your logs in the cloud.
[19:04.000 --> 19:06.000] So, in my case, I'm using Hazelcast,
[19:06.000 --> 19:09.000] but obviously you can use any other cloud provider,
[19:09.000 --> 19:14.000] but you need to import Hazelcast into that cloud provider.
[19:14.000 --> 19:17.000] So, once you upload all the logs,
[19:17.000 --> 19:21.000] what you want is kind of to have some kind of cloud solution
[19:21.000 --> 19:25.000] for it, where you can use, for example, I don't know,
[19:26.000 --> 19:29.000] VARCHAR, if you want speed, or JSON,
[19:29.000 --> 19:33.000] for example, if you want to apply some kind of machine learning,
[19:33.000 --> 19:36.000] and you need to use some kind of map structure.
[19:36.000 --> 19:39.000] So, the idea is to store logs in memory
[19:39.000 --> 19:43.000] in order to have some kind of random access and rebalancing.
[19:43.000 --> 19:46.000] And obviously you need to configure this map,
[19:46.000 --> 19:49.000] because you have specific size limits on it,
[19:49.000 --> 19:54.000] so you need to have some kind of eviction policy on your data (see the configuration sketch below).
[19:54.000 --> 19:56.000] And obviously you need to consider security.
[19:56.000 --> 19:59.000] So, whatever you send to the cloud, obviously,
[19:59.000 --> 20:02.000] you need to have some kind of security mechanism,
[20:02.000 --> 20:05.000] just to make sure that you don't send sensitive data.
[20:05.000 --> 20:09.000] So, if you're interested in this topic of stream processing,
[20:09.000 --> 20:13.000] we're running an unconference in March, next month,
[20:13.000 --> 20:17.000] and it's free to join, and there is a training workshop as well.
[20:17.000 --> 20:19.000] So, all you need to do is just scan this code
[20:19.000 --> 20:22.000] and register for the real-time stream processing unconference.
[20:22.000 --> 20:24.000] It's community-based, so it's open source.
[20:24.000 --> 20:27.000] You get the training, you get a badge on top of it,
[20:27.000 --> 20:30.000] as well as we have a round table where you can hear from
[20:30.000 --> 20:33.000] industry experts as well as community users
[20:33.000 --> 20:35.000] on how they can contribute to open source projects.
[20:35.000 --> 20:39.000] And, you know, you can basically ask questions if you have any.
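A hedged sketch of the real-time option just described: a pipeline reading the map journal of the logs map, applying a two-minute window per log source, and writing 1 (alert) or 0 (no alert) into a predictions map. For brevity it uses a windowed average of the score as the trend signal rather than the full linear regression from the talk, and it assumes the event journal is enabled for the logs map in the cluster configuration.

```java
import java.util.Map.Entry;
import java.util.concurrent.TimeUnit;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.jet.pipeline.WindowDefinition;

public class TrendAlertSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.bootstrappedInstance();

        Pipeline p = Pipeline.create();
        // The map journal streams every put to the "logs" map as it happens,
        // starting from the oldest or the newest entry, as the talk describes.
        p.readFrom(Sources.<String, Long>mapJournal("logs",
                        JournalInitialPosition.START_FROM_OLDEST))
         .withIngestionTimestamps()
         .groupingKey(Entry::getKey)                            // one trend per log source
         .window(WindowDefinition.tumbling(TimeUnit.MINUTES.toMillis(2)))
         .aggregate(AggregateOperations.averagingLong(Entry::getValue))
         // 1 = raise an alert for this source, 0 = stay quiet.
         .writeTo(Sinks.map("predictions",
                 result -> result.getKey(),
                 result -> result.getValue() > 100 ? 1 : 0));

        hz.getJet().newJob(p).join();
    }
}
```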
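And a sketch of the map configuration the summary calls for: a per-node size cap with an eviction policy, plus a time-to-live, so the in-memory log map cannot grow without bound. The concrete numbers are arbitrary.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MaxSizePolicy;
import com.hazelcast.core.Hazelcast;

public class LogMapConfigSketch {
    public static void main(String[] args) {
        Config config = new Config();

        MapConfig logs = new MapConfig("logs");
        logs.setTimeToLiveSeconds(3600);               // keep each log entry one hour
        logs.getEvictionConfig()
            .setEvictionPolicy(EvictionPolicy.LRU)     // evict least-recently-used first
            .setMaxSizePolicy(MaxSizePolicy.PER_NODE)
            .setSize(100_000);                         // hypothetical per-node cap
        config.addMapConfig(logs);

        Hazelcast.newHazelcastInstance(config);
    }
}
```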
[20:39.000 --> 20:40.000] So, with that being said,
[20:40.000 --> 20:43.000] thanks very much for listening, and I'll open it up for questions.
[20:43.000 --> 20:44.000] Thank you.
[20:44.000 --> 20:45.000] Thank you.
[21:14.000 --> 21:40.000] Yeah, so it is possible to do, like, multiple streams,
[21:40.000 --> 21:42.000] joining multiple sources.
[21:43.000 --> 21:46.000] So, this is possible to do.
[21:46.000 --> 21:49.000] Obviously, you just need to find the configuration
[21:49.000 --> 21:51.000] for this specific case.
[21:51.000 --> 21:52.000] So, I mentioned sources here.
[21:52.000 --> 21:55.000] So, the sources are not only a single source;
[21:55.000 --> 21:56.000] it could be multiple sources,
[21:56.000 --> 22:00.000] and that's where you do join multiple sources.
[22:00.000 --> 22:06.000] So, it is possible to do it with the data.
[22:06.000 --> 22:08.000] Any other questions?
[22:08.000 --> 22:09.000] Yeah?
[22:10.000 --> 22:11.000] Yeah.
[22:11.000 --> 22:13.000] Please take a seat.
[22:13.000 --> 22:16.000] What is the difference?
[22:16.000 --> 22:18.000] So, the question is, what is the difference
[22:18.000 --> 22:20.000] between Hazelcast and Apache Flink?
[22:20.000 --> 22:21.000] So, there was a slide,
[22:21.000 --> 22:23.000] but I decided to remove it from here.
[22:23.000 --> 22:26.000] So, essentially, what you want, kind of,
[22:26.000 --> 22:30.000] for real time, is to look at minimizing latency
[22:30.000 --> 22:31.000] within your application.
[22:31.000 --> 22:35.000] So, we've done a benchmark between Hazelcast and Flink.
[22:35.000 --> 22:37.000] So, this is where things can make a difference.
[22:37.000 --> 22:40.000] So, if you're trying to basically write an application
[22:40.000 --> 22:41.000] for real-time stream processing,
[22:41.000 --> 22:43.000] you want to minimize latency.
[22:43.000 --> 22:44.000] So, the lower, the better,
[22:44.000 --> 22:49.000] and this is where Hazelcast has outperformed Flink
[22:49.000 --> 22:50.000] or Apache Spark.
[22:50.000 --> 22:53.000] So, the latency is the key difference between these two.
[22:53.000 --> 22:55.000] The results are online,
[22:55.000 --> 22:58.000] but I decided not to include them here, just for this session.
[22:58.000 --> 23:02.000] So, basically, it's the latency between different platforms.
[23:02.000 --> 23:03.000] So, that's what you want to focus on,
[23:03.000 --> 23:04.000] whether it's Hazelcast, Flink,
[23:04.000 --> 23:07.000] or any other platform which offers real-time stream processing.
[23:10.000 --> 23:11.000] Thank you.
[23:14.000 --> 23:15.000] Thank you very much, Thomas.
[23:15.000 --> 23:16.000] Thank you.
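On the first audience question, joining multiple streams: a minimal sketch of combining two sources in one pipeline with a simple merge. The test sources stand in for, say, a Kafka topic and an IoT feed; keyed operations such as hashJoin or co-group would be the alternatives when events must be correlated by key.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.StreamStage;
import com.hazelcast.jet.pipeline.test.TestSources;

public class MultiSourceSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.bootstrappedInstance();

        Pipeline p = Pipeline.create();
        // Two independent streams, stand-ins for two real sources.
        StreamStage<String> transactions = p.readFrom(TestSources.itemStream(5))
                .withIngestionTimestamps()
                .map(e -> "txn-" + e.sequence());
        StreamStage<String> weather = p.readFrom(TestSources.itemStream(1))
                .withIngestionTimestamps()
                .map(e -> "weather-" + e.sequence());

        // Combine them into a single stream for downstream processing.
        transactions.merge(weather).writeTo(Sinks.logger());

        hz.getJet().newJob(p).join();
    }
}
```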