So the thing about QuestDB, apart from being open source: we want people to know us because we try to be very performant, but specifically on small machines. Performing very well with 120 CPUs and 200 gigs of RAM is okay; performing very well with 4 CPUs and 16 gigs of RAM is more difficult. That's what we try to optimize for. Actually, in the past we were optimizing for the larger-instance use case, and then we realized not everybody has a super large instance at home, so we try to be better at that.

We also try to be very good with developer experience, so that you get performance out of the box. There are many things you can tweak in QuestDB, as in every other database, every other system: lots of configuration, the memory, the page sizes, the buffers and whatnot, which CPUs do what, and so on. But by default, if you don't touch anything, it will perform well; and then, if you have the expert knowledge, you can fine-tune. We try hard to make the developer experience simple, and that's also why we chose SQL for querying data. Other time series databases make the trade-off: "we want performance, so we need to use a different query language", which is cool, I get it. We chose SQL because we want developers to have an easy way of learning QuestDB. For ingesting data you can use SQL, but we also offer a different protocol, which is faster, and that's why we have client libraries, so you don't have to go low level to be performant. But that's the idea. And we are open source; we are very proud of being open source.

But why are we building another database? There are a lot of databases. If you walk around here, you are going to hear about every type of database out there. Just today I saw MongoDB, I saw ClickHouse, there are talks about Postgres, about MariaDB. Why do you need another database, another open source database? Well, because different data looks different and can have different problems. In our case, we specialize in time series. We don't do anything else. If you try to use QuestDB for full-text search analytics, we are truly the worst database ever for that. If you try to use QuestDB for geospatial queries, we support some geospatial things, kind of, a bit: we have a specific data type for geohashes.
But we are not good for geospatial unless it is part of time series plus geo. That's kind of the idea. So we specialize only in time series analytics: data that is changing over time, where you want to monitor and track those changes. That's the idea. We are not good for anything else. If you try to use QuestDB for everything, boy, what a disappointment we are going to be. But if you try it as a time series database, this will be one of the good ones. That's kind of the idea, and that's why we are building QuestDB, because there is a lot of time series data out there.

And how do you know if you have a time series problem? I hear a lot of definitions, and I just want to read a couple of them. Basically, if most of the time you are reading data on a slice of time: tell me what energy consumption I had over the last minute; tell me how the nuclear reactor is doing over the past 10 microseconds; tell me what the conversion for this user was over the past week; for all my moving vehicles, tell me the last position where I saw each one, and what the sensor reading was at this particular point in time. If you have that, a time series database can be interesting.

So with time series you have all of those types of problems. Data tends to be inserted faster than it is read. Databases, historically, have been optimized for reads: they try every trick in the book to make reads super fast. When you insert data, you define indexes, and they index by many different things, and they keep caches in memory for a lot of things, and so on. Reading is the key thing, because usually you read data much more than you write it. In a time series database we still have to support heavy reads on top of that, but we also need to support heavy inserts and keep the performance while doing that. We don't use indexes. The performance you are going to see today is with no indexes. We don't need them, and we don't want them, because having an index slows down ingestion. It's a luxury we cannot afford. We have some kind of indexing, but not indexes as you know them. That's kind of the idea here; it's slightly different. You have data that you are writing very often, that data is going to grow, and it can grow fast. And you need some way of offloading or deleting that data.
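To make those read patterns concrete, here is roughly what they look like in QuestDB SQL. This is only a sketch: the table and column names (energy_readings, vehicle_positions, consumption, vehicle_id, ts) are made up for illustration, they are not from the demo.

    -- energy consumption over the last minute
    SELECT avg(consumption)
    FROM energy_readings
    WHERE ts > dateadd('m', -1, now());

    -- last known position of every vehicle
    SELECT *
    FROM vehicle_positions
    LATEST ON ts PARTITION BY vehicle_id;

    -- expire old data by dropping whole partitions (more on partitions in a moment)
    ALTER TABLE vehicle_positions
    DROP PARTITION WHERE ts < dateadd('d', -30, now());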
On a traditional database, you just don't say: oh, I'm Amazon and I'm getting users; I already have a million users, a million and one, so I'm going to delete the old users. You don't do that. I mean, sometimes you do, but you don't really do that on your databases. On a time series database, almost all of them have some mechanism to deal with historical data and do something with it. In our case, you can detach partitions, you can move them to cheaper storage, those kinds of things. We have the commands for it, and the database is designed for that kind of thing. There are many other things about how you handle the time series life cycle, but that's the idea.

But better than me just telling you, I'm going to show you some queries on top of demo data sets, so you get a feeling for why a time series database might be interesting, and then we'll go into the details about ingesting data and all those things. Does that sound good so far? Do you have any questions? I'm happy to take them during the talk, by the way, not only at the end.

So we have a live demo, demo.questdb.io, which is running on a large machine on AWS. We don't need all that power, but it is open to the public. We have a few different data sets there. You are in a big data room, so you are surely familiar with the New York City taxi rides data set. The city of New York publishes a data set which is very cool for machine learning and for big data: taxi rides in the city of New York. When the ride started, when it finished, also the coordinates, and a few things like the tip, the amount of the fare, how many people, and so on. So we took that open data set and we put it here on QuestDB, a few years of it. A lot of columns here.

So let me just show you how big this is. Right now, is the font size okay, or do I have to make it a bit bigger first? Okay. So it's 1.6 billion rows, which is not huge. I mean, relational databases today are great, but with 1.6 billion rows it's like: yeah, I could work with that, but I'm not super comfortable. For us, it's cute. It's a data set which is respectable but not really huge: 1.6 billion rows.
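If you want to check this yourself on the demo instance, the row count and the partition layout are one-liners. The table is called trips on the demo as far as I remember, and table_partitions is the metadata function I have in mind, so treat this as a sketch.

    -- how many rows are in the table
    SELECT count() FROM trips;

    -- how the table is partitioned and how big each partition is
    SELECT * FROM table_partitions('trips');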
And now, what if I want to do something like, for example, calculate the average of, whichever, this number for example: I want to average the fare amount over 1.6 billion trips. How long would you expect your database to take to do a full scan over 1.6 billion rows and compute the average, no indexes, no anything? How long would you say, more or less, ballpark, for 1.6 billion rows? No one?

Someone asks what the size is in gigabytes. I don't know for the whole data set; this column is a double, and I really just know it's big. When you download it as CSV, each file is about 600 megabytes and you have several of those. It's large-ish.

But anyway. Well, actually it was slower than I thought. Usually it takes half a second; this time it took 0.6 seconds. I know, it's slow, I know, but there's a reason for that, sort of. I told you: we are a time series database, and we are super slow for other things. This is not a time series query. Did you see any timestamp here? I didn't see anything. This is just a full scan: we parallelize, we read the data, and we are "slow". We take over half a second to go over only 1.6 billion rows. Unforgivable, sort of. Bear with me here; I'm kind of half kidding, but not really.

But wait until I put a time dimension in. Now, yes: I want only one year of data, for example, and I'm going to add another computation on top, a count, because counting alone is super fast, and I'm going to filter only for 2016. And this is better: this is already 100 milliseconds, because we are only going over a few rows. Yeah, it's only 146 million rows; this is much more manageable. Only around 140 million rows, that's better.
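For reference, the two queries I just ran are roughly the following. Column names are as I remember them from the demo table (fare_amount, pickup_datetime), so treat them as illustrative.

    -- full scan: average fare over all 1.6 billion rows
    SELECT avg(fare_amount) FROM trips;

    -- the same, plus a count, but only for one year, using the timestamp interval syntax
    SELECT count(), avg(fare_amount)
    FROM trips
    WHERE pickup_datetime IN '2016';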
So we can actually go very fast on this. And if you keep going down: now I want only one month of data, which is, I don't know, still about 12 million rows, and a month of data is 60 milliseconds. One day of data, of course, is way faster; that's already 50 milliseconds. If I go to one specific hour, or a minute, it should be, you know, not that much faster, because... oh yeah, it's under one millisecond actually, thank you for that. But still: we have partitions, so one thing we do is go only to the partition where the data is stored, so we only attack that part of the data. That's kind of the thing: when you have that time component, we are quite fast. Well, fairly fast. That's kind of the beauty of a time series database.

And we can do other interesting things. If I go to the same table and show you what the data looks like, you can see that for the same second I have many trips, because this is New York, baby, the city that never sleeps, where you can catch a cab on every corner. They tell you that you get rich when you land in New York; I spent a year there, it's not like that. Anyway, in every particular second, even at midnight, you always have at least a few trips.
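That drill-down is all the same interval syntax on the designated timestamp, just narrower each time. Roughly, with the same illustrative column names:

    -- one month
    SELECT count(), avg(fare_amount) FROM trips WHERE pickup_datetime IN '2016-06';

    -- one day
    SELECT count(), avg(fare_amount) FROM trips WHERE pickup_datetime IN '2016-06-21';

    -- one hour of that day
    SELECT count(), avg(fare_amount) FROM trips WHERE pickup_datetime IN '2016-06-21T12';

Because the table is partitioned by time, only the partitions that overlap the interval are touched at all.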
So actually we could do something like: give me the date-time and how many trips are ending, where this date-time is in, for example, June 21st... city? What are you doing there, man? I didn't even know I had a city column here. Okay. So, for example, in this particular minute of one particular day, I want to sample in one-second intervals and know how many trips I have for every particular second. That's another thing you can do in a time series database: rather than grouping by columns, which you can also do, you can group by time. We call this SAMPLE BY. We can sample by anything from a microsecond to a year, I guess: you can group by microsecond, millisecond, second, day, year, whatever. So in this case I'm saying: okay, in this particular second I have six trips, and five trips, and so on. You get the idea.

So, something I wanted to show you, which is another cool one. I have this data set with several trips every second, and I have another data set, also with data from Manhattan, which is the weather data set. Maybe it would be interesting to join those two data sets: it would be cool to know the weather I had for a particular trip, because maybe that gives me some insight, I don't know. The challenge is that this data set, of course, is real life, a different open data set, and it's not at the same resolution. We don't have weather changes every second. In my hometown that sometimes happens, and when I was living in London it was crazy, but in real life we don't store weather changes every second. In this particular data set we have about two or three records every hour.

So now I want to join a data set with sub-second resolution and a data set with sub-hour resolution. If I wanted to do that join in other databases, I could do it: it would take me a while, then I would think I had it and I wouldn't, and then it would be like, yeah, this makes sense, or not really, and a week later I would be crying. So one cool thing we have here, and I'm going to move on to other things really quickly afterwards, but this one I really like: we have a special type of join, which we call an ASOF join. I'm going to select the data from the table I showed you already, for one particular day in time, and then I'm going to do what we call an ASOF JOIN, which basically says: this table has a timestamp, what we call the designated timestamp; you designate which column it is, because you can have several. So we have the designated timestamp in one table and the designated timestamp in the other, and we join each row with the one that is closest in time. In this case, ASOF means the one which is exactly the same or immediately before me, the closest one, what happened before. We also have a variant that takes the one strictly before me, which cannot be the same timestamp.
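A sketch of what that query looks like on the demo data. The weather table name and its columns, like tempF and timestamp, are from memory, so treat them as assumptions and check the actual schema.

    -- each taxi ride joined with the most recent weather reading at or before it
    SELECT trips.pickup_datetime, trips.fare_amount, weather.timestamp, weather.tempF
    FROM trips ASOF JOIN weather
    WHERE pickup_datetime IN '2016-06-21';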
But that's the idea. So in this case, for joining the two different data sets, I can just do that, and I'm also going to add the timestamp from the other table, so it's clear. If I run this query, I can see that for each record in the New York taxi rides I keep getting the same timestamp from the weather data set, because I only have one weather entry every 40 or 45 minutes. If I move to a different point in the same day, say 12:55 instead of 12:00, I should see the time matching a different entry in the weather table. And that's it: I have different resolutions, and I don't care; we join by time, because we are all about time. That's what I'm trying to say. I have more interesting queries, but maybe for a different day. So that's the first thing.

So, I told you: now you get the idea of why time series is kind of interesting and the kind of things we can do. Down-sampling, all those things, is very important for machine learning: you may have data every second and you want to do forecasting, and in many cases it doesn't make sense to train a model with every-second data; maybe you want to down-sample to 15-minute intervals. With this trick you can do it easily. That's kind of the idea.

So, I was speaking about ingesting data. Ingesting over one million rows per second on a single instance is interesting. Actually, just ingesting over one million records per second on a single instance is easy: I could write to a file, appending lines, and that would be it. The interesting bit is being able to ingest data while you are able to query that same data in real time. That's the trick, because with just ingesting, when you think about it, it's like: wait, how long do I have to wait before I can query the data? So the idea is that you can query the data at the same time. All benchmarks are lies, of course; on the same benchmark I'm going to tell you about, other people will tell you the contrary, and I'm totally fine with that. But a couple of years ago we published an article saying: hey, we can now ingest at 1.4 million rows per second. The slides are linked on the first page, by the way, thank you. Our CTO posted about how we were ingesting 1.4 million records per second. These records had about 20 columns: 10 dimensions, which were strings, and 10 metrics, which were numbers. So we could ingest records of 20 columns, with 10 strings and 10 numbers, at 1.4 million records per second while running queries, which is the other bit: we were able to scan over 4 billion records per second at the same time, on relatively small machines.
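For reference, the per-second counting from the demo and the 15-minute down-sampling you would use for a forecasting model look roughly like this, again with illustrative column names:

    -- trips per one-second bucket for a single day
    SELECT pickup_datetime, count()
    FROM trips
    WHERE pickup_datetime IN '2016-06-21'
    SAMPLE BY 1s;

    -- down-sample a month of data to 15-minute buckets, e.g. to train a model
    SELECT pickup_datetime, avg(fare_amount)
    FROM trips
    WHERE pickup_datetime IN '2016-06'
    SAMPLE BY 15m;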
Relatively small. So that's kind of the idea. And this benchmark, we didn't write it. There is a benchmark specifically for time series databases. As I told you earlier, you can load relational data into QuestDB and you can run queries, but if you try to run a conventional benchmark on QuestDB, it's going to be super slow: we are not designed for full-text search, and we are not designed for point operations, reading individual records or updating data. We can do it, but we are not designed for that. So InfluxDB, another open source database, created this benchmark, the TSBS benchmark, which is specifically about time series databases, so the queries and the ingestion patterns match what you would expect from a time series workload. It is now maintained by Timescale, which is another open source time series database built on top of Postgres, and there is an adapter for running it on top of QuestDB. It's with that particular benchmark that we are getting those results. So, you know, your mileage may vary, also depending on the hardware. If you run the benchmark in the cloud it's going to be slower, always, because in the cloud, by default, you use EBS on AWS and attached storage on Google Cloud; it's network storage, it has latency, because those are not local disks. They are super cool, but they are not local, so it's always going to be slower. If you want to get these numbers on Google Cloud or on AWS, you can do it, but you have to use NVMe disks, which are local disks attached to the instance. They disappear when you terminate the instance, but with those disks you will get the same benchmark numbers. So hardware is also important for the benchmark. But that's the idea; that's how we did it.

Before I tell you a bit about the technical decisions, and I won't have much time for that, I want to show you how we do this ingestion. Let me just move this out of the way. This is a script in Go. I don't know any Go at all, but I know how to run it; another developer advocate wrote it, so I couldn't tell you that I know a lot of Go, I have no idea, but Go is a language and I've been told it's pretty cool. We have this library, or package, or whatever they call it in Go, which is our official package, cargo or whatever, I don't know; I'm mixing up my languages here, thank you.
So this is the thing: I'm connecting to localhost, to the default QuestDB port. I'm simulating IoT data, and I'm going to be outputting a device type, which can be red, blue, green or yellow, a duration, latitude, longitude, speed, and a timestamp in nanoseconds. And I'm going to do this in batches of 50,000 records, 200 times. 50,000 records, 200 times: 10 million records. I'm going to be inserting 10 million records into a table that doesn't exist yet; QuestDB will create it automatically when it starts receiving data. So if I run this Go script, with go run... it's ingesting data. It should take less than 10 seconds, because we are ingesting 10 million. And that's finished.

So let me just go to my localhost here and select how many records we ingested. I have to refresh the tables... okay. How many records did I ingest? 10 million records, that's good. Can you tell me the interval, so I can see what happened here? Sampled by one second. And it's telling me: in the first second, only half a million, because we didn't start exactly at the top of a second, but after that, one million, one million, one million... you see the idea. Okay, that's not too bad.

I can do slightly better. I can run this script twice, ingesting at the same time into two different tables. So now, if I refresh, I should see two tables, not only one. I have two tables here, same hardware and everything. If I run the query again, selecting only the last 10 rows so we only see the latest run, you can see it's slower now: I was actually ingesting into two tables, so I'm ingesting only about 700,000 per second here, something like that. But if I look at the same time range in the other table, I can just do a union... if I go to the other table here, you should see that at the same time... oh yeah, I cannot apply a limit here in a union, sorry. So I should see that, even if this one was going slower, the other table was also receiving data. In this format you cannot see it very well, but we can do something I told you about earlier: rather than a union, I can do an ASOF JOIN of the first query with the second. So I should be able to do this. Now I have the first run, where only one instance was sending data, and this one, where I was running two. And you can see that, for this particular second, we were ingesting 700,000 records in one table and 700,000 records in the other at the same time, so about 1.4-something million in total across the two tables, out of the box.
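The checks I ran in the console are along these lines. This is a sketch: the table names depend on what the script creates, so here I just call them iot_data and iot_data2, and I assume the auto-created designated timestamp column is called timestamp.

    -- how many rows landed
    SELECT count() FROM iot_data;

    -- ingestion rate: rows per one-second bucket
    SELECT timestamp, count() FROM iot_data SAMPLE BY 1s;

    -- compare the two runs side by side: ASOF JOIN of the two sampled queries
    SELECT a.timestamp, a.rows_a, b.rows_b
    FROM (SELECT timestamp, count() AS rows_a FROM iot_data SAMPLE BY 1s) a
    ASOF JOIN (SELECT timestamp, count() AS rows_b FROM iot_data2 SAMPLE BY 1s) b;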
If I configure the writers and how many threads I have for processing things, I can get it slightly faster than this, but that's good enough: on a local M1 laptop with an SSD, it's fast. That's the idea. So that's the one million there; I was not lying, I was just telling you things.

I have only a few minutes left, but here is how we got here. First, we can make a lot of assumptions about the data. This is time series, so we know people usually don't want individual rows, they want computations over rows. We know people mostly want to group by things that are in the data, strings like the country name or the device name or the brand or whatever. So instead of storing strings, we have a special type called a symbol: if you give me a string, we convert it into a number and do the lookup automatically. We can make a lot of assumptions because we hyper-specialize in one particular use case. We optimize the storage. We don't use indexes, because we store everything in incremental time order per partition. If we get data out of order, we have to rewrite the partitions, but we don't need indexes, because the data is always physically in order, so we can scan super quickly back and forth. That's kind of the idea. We also parallelize as much as we can, using different things.

This is written in Java, from scratch. You will see some databases which I love, like MongoDB, an excellent database for content, that have a time series module, but they use the same MongoDB collections for time series, so they cannot be as fast, because they are using exactly what they use for content. It's very convenient, you can do everything with it, but it's the same with other engines that are built on top of other things. We don't have any dependencies; everything is built from scratch. We are actually rewriting some of the Java libraries, like strings and loggers and so on, to avoid conversions. There are parts of the standard library we simply don't use: we have our own libraries for strings, our own libraries for memory management, our own libraries for absolutely everything, written in our own versions. We have our own just-in-time compiler, because the JVM's JIT compiler was not performant enough for some of the parallelization we wanted to do in queries. So we wrote everything. Our Java is kind of weird, and Jeremy can tell you more about that, it's super weird Java, but it's still Java. We even wrote our own input/output functions.
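To make the symbol and partition ideas concrete, a table like the one the IoT demo writes into could be declared roughly like this. This is an illustrative sketch with names of my own; in the demo the script actually lets QuestDB create the table automatically.

    CREATE TABLE iot_data (
      device_type SYMBOL,        -- SYMBOL: stored as an integer plus an automatic lookup table,
                                 -- instead of repeating the string on every row
      duration_ms DOUBLE,
      lat DOUBLE,
      lon DOUBLE,
      speed DOUBLE,
      timestamp TIMESTAMP
    ) TIMESTAMP(timestamp)       -- designated timestamp: data is kept in time order
      PARTITION BY DAY;          -- one partition per day, so time filters only touch a few of them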
Why? Because we can shave off nanoseconds. For example, this is log4j, and we don't speak about log4j, but it's awesome; this is the comparison with our own logger, in operations per nanosecond. With log4j, logging an integer, you can do 82 operations; we can do 800. Do you have to go down to the nanosecond? If you are building a CRUD application, probably not. It really depends on what you are building, and that's why we write things from scratch. Basically, the approach of QuestDB to performance is like that line: "I don't know who you are, but I will find you, and I will kill you." That's the approach I see in the QuestDB team: we can always get faster at some obscure thing somewhere. That's kind of the idea.

And we try to be a good team player. Jeremy here has contributed, on his own, the connector for Kafka Connect and the connector for Apache Flink, so we try to integrate with the rest of the ecosystem. We would love it if you try QuestDB. You are open source geeks: you like stars, we like stars, so please contribute, and please star us on GitHub if you like it. We have a Slack channel for contributors and the community, we are quite friendly, we are fast, and we work on interesting problems. If you like interesting problems, if you like weird Java, we would love to have you. So thank you very much, and I can take any questions outside.

Oh, one question from the chat, thank you. Someone was asking whether QuestDB can work with GPS data. Yes, you can work with GPS data. We have doubles that you can use for coordinates. We don't have a lot of geospatial functions, but we have geohashes, which basically allow you to define, at different resolutions, in which square of the world something is. So if you are talking about finding where a point is in the world at a particular point in time, QuestDB is very cool. If you need to do other things, we have some math functions, cosines and all those things, so you can do your own calculations. But yes, it can be used for GPS, and a lot of people are actually doing asset tracking with QuestDB. Thank you.
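For reference, the geohash support mentioned in that answer looks roughly like this. It is a sketch from memory: the table, the column names, and the geohash prefix are made up, and the exact syntax of the within filter is worth checking against the QuestDB docs.

    -- a geohash column at 8-character resolution, plus a designated timestamp
    CREATE TABLE vehicle_positions (
      vehicle SYMBOL,
      g8c GEOHASH(8c),
      ts TIMESTAMP
    ) TIMESTAMP(ts) PARTITION BY DAY;

    -- last known position of each vehicle, restricted to one area (a geohash prefix)
    SELECT *
    FROM vehicle_positions
    WHERE g8c WITHIN (#u10j)
    LATEST ON ts PARTITION BY vehicle;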