So the thing about QuestDB, apart from being open source: we want people to know us because we try to be very performant, but specifically on small machines. Performing very well with 120 CPUs and 200 gigs of RAM is okay; performing very well with 4 CPUs and 16 gigs of RAM is more difficult. That's what we try to optimize for. Actually, in the past we were optimizing for the larger-instance use case, and then we realized not everybody has a super large instance at home, so we try to be better at that.

We also try to be very good with developer experience, so that you get performance out of the box. There are many things you can tweak in QuestDB, as in every other database, every other system: lots of configuration, the memory, the page sizes, the buffers and whatnot, which CPUs do what, and so on. But by default, if you don't touch anything, it will perform well; and then, if you have the expert knowledge, you can fine-tune. We try hard to make the developer experience simple, and that's also why we chose SQL for querying data. Other time series databases make the trade-off: "we want performance, so we need to use a different query language", which is cool, I get it. We chose SQL because we want developers to have an easy way of learning QuestDB. For ingesting data you can use SQL, but we also offer a different protocol, which is faster, and that's why we have client libraries, so you don't have to go low level to be performant. But that's the idea. And we are open source; we are very proud of being open source.

But why are we building another database? There are a lot of databases. If you walk around here, you are going to hear about every type of database out there. Just today I saw MongoDB, I saw ClickHouse, there are talks about Postgres, about MariaDB. Why do you need another database, another open source database? Well, because different data looks different and can have different problems. In our case, we specialize in time series. We don't do anything else. If you try to use QuestDB for full-text search analytics, we are truly the worst database ever for that. If you try to use QuestDB for geospatial queries, we support some geospatial things, kind of, a bit: we have a specific data type for geohashes.
But we are not good for geospatial unless it is part of time series plus geo. That's kind of the idea. So we specialize only in time series analytics: data that is changing over time, where you want to monitor and track those changes. That's the idea. We are not good for anything else. If you try to use QuestDB for everything, boy, what a disappointment we are going to be. But if you try it as a time series database, this will be one of the good ones. That's kind of the idea, and that's why we are building QuestDB, because there is a lot of time series data out there.

And how do you know if you have a time series problem? I hear a lot of definitions, and I just want to read a couple of them. Basically, if most of the time you are reading data on a slice of time: tell me what energy consumption I had over the last minute; tell me how the nuclear reactor is doing over the past 10 microseconds; tell me what the conversion for this user was over the past week; for all my moving vehicles, tell me the last position where I saw each one, and what the sensor reading was at this particular point in time. If you have that, a time series database can be interesting.

So with time series you have all of those types of problems. Data tends to be inserted faster than it is read. Databases, historically, have been optimized for reads: they try every trick in the book to make reads super fast. When you insert data, you define indexes, and they index by many different things, and they keep caches in memory for a lot of things, and so on. Reading is the key thing, because usually you read data much more than you write it. In a time series database we still have to support heavy reads on top of that, but we also need to support heavy inserts and keep the performance while doing that. We don't use indexes. The performance you are going to see today is with no indexes. We don't need them, and we don't want them, because having an index slows down ingestion. It's a luxury we cannot afford. We have some kind of indexing, but not indexes as you know them. That's kind of the idea here; it's slightly different. You have data that you are writing very often, that data is going to grow, and it can grow fast. And you need some way of offloading or deleting that data.
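To make those read patterns concrete, here is roughly what they look like in QuestDB SQL. This is only a sketch: the table and column names (energy_readings, vehicle_positions, consumption, vehicle_id, ts) are made up for illustration, they are not from the demo.

    -- energy consumption over the last minute
    SELECT avg(consumption)
    FROM energy_readings
    WHERE ts > dateadd('m', -1, now());

    -- last known position of every vehicle
    SELECT *
    FROM vehicle_positions
    LATEST ON ts PARTITION BY vehicle_id;

    -- expire old data by dropping whole partitions (more on partitions in a moment)
    ALTER TABLE vehicle_positions
    DROP PARTITION WHERE ts < dateadd('d', -30, now());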
On a traditional database, you just don't say: oh, I'm Amazon and I'm getting users; I already have a million users, a million and one, so I'm going to delete the old users. You don't do that. I mean, sometimes you do, but you don't really do that on your databases. On a time series database, almost all of them have some mechanism to deal with historical data and do something with it. In our case, you can detach partitions, you can move them to cheaper storage, those kinds of things. We have the commands for it, and the database is designed for that kind of thing. There are many other things about how you handle the time series life cycle, but that's the idea.

But better than me just telling you, I'm going to show you some queries on top of demo data sets, so you get a feeling for why a time series database might be interesting, and then we'll go into the details about ingesting data and all those things. Does that sound good so far? Do you have any questions? I'm happy to take them during the talk, by the way, not only at the end.

So we have a live demo, demo.questdb.io, which is running on a large machine on AWS. We don't need all that power, but it is open to the public. We have a few different data sets there. You are in a big data room, so you are surely familiar with the New York City taxi rides data set. The city of New York publishes a data set which is very cool for machine learning and for big data: taxi rides in the city of New York. When the ride started, when it finished, also the coordinates, and a few things like the tip, the amount of the fare, how many people, and so on. So we took that open data set and we put it here on QuestDB, a few years of it. A lot of columns here.

So let me just show you how big this is. Right now, is the font size okay, or do I have to make it a bit bigger first? Okay. So it's 1.6 billion rows, which is not huge. I mean, relational databases today are great, but with 1.6 billion rows it's like: yeah, I could work with that, but I'm not super comfortable. For us, it's cute. It's a data set which is respectable but not really huge: 1.6 billion rows.
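If you want to check this yourself on the demo instance, the row count and the partition layout are one-liners. The table is called trips on the demo as far as I remember, and table_partitions is the metadata function I have in mind, so treat this as a sketch.

    -- how many rows are in the table
    SELECT count() FROM trips;

    -- how the table is partitioned and how big each partition is
    SELECT * FROM table_partitions('trips');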
And now, what if I want to do something like, for example, calculate the average of, whichever, this number for example: I want to average the fare amount over 1.6 billion trips. How long would you expect your database to take to do a full scan over 1.6 billion rows and compute the average, no indexes, no anything? How long would you say, more or less, ballpark, for 1.6 billion rows? No one?

Someone asks what the size is in gigabytes. I don't know for the whole data set; this column is a double, and I really just know it's big. When you download it as CSV, each file is about 600 megabytes and you have several of those. It's large-ish.

But anyway. Well, actually it was slower than I thought. Usually it takes half a second; this time it took 0.6 seconds. I know, it's slow, I know, but there's a reason for that, sort of. I told you: we are a time series database, and we are super slow for other things. This is not a time series query. Did you see any timestamp here? I didn't see anything. This is just a full scan: we parallelize, we read the data, and we are "slow". We take over half a second to go over only 1.6 billion rows. Unforgivable, sort of. Bear with me here; I'm kind of half kidding, but not really.

But wait until I put a time dimension in. Now, yes: I want only one year of data, for example, and I'm going to add another computation on top, a count, because counting alone is super fast, and I'm going to filter only for 2016. And this is better: this is already 100 milliseconds, because we are only going over a few rows. Yeah, it's only 146 million rows; this is much more manageable. Only around 140 million rows, that's better.
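For reference, the two queries I just ran are roughly the following. Column names are as I remember them from the demo table (fare_amount, pickup_datetime), so treat them as illustrative.

    -- full scan: average fare over all 1.6 billion rows
    SELECT avg(fare_amount) FROM trips;

    -- the same, plus a count, but only for one year, using the timestamp interval syntax
    SELECT count(), avg(fare_amount)
    FROM trips
    WHERE pickup_datetime IN '2016';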
So we can actually go very fast on this. And if you keep going down: now I want only one month of data, which is, I don't know, still about 12 million rows, and a month of data is 60 milliseconds. One day of data, of course, is way faster; that's already 50 milliseconds. If I go to one specific hour, or a minute, it should be, you know, not that much faster, because... oh yeah, it's under one millisecond actually, thank you for that. But still: we have partitions, so one thing we do is go only to the partition where the data is stored, so we only attack that part of the data. That's kind of the thing: when you have that time component, we are quite fast. Well, fairly fast. That's kind of the beauty of a time series database.

And we can do other interesting things. If I go to the same table and show you what the data looks like, you can see that for the same second I have many trips, because this is New York, baby, the city that never sleeps, where you can catch a cab on every corner. They tell you that you get rich when you land in New York; I spent a year there, it's not like that. Anyway, in every particular second, even at midnight, you always have at least a few trips.
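That drill-down is all the same interval syntax on the designated timestamp, just narrower each time. Roughly, with the same illustrative column names:

    -- one month
    SELECT count(), avg(fare_amount) FROM trips WHERE pickup_datetime IN '2016-06';

    -- one day
    SELECT count(), avg(fare_amount) FROM trips WHERE pickup_datetime IN '2016-06-21';

    -- one hour of that day
    SELECT count(), avg(fare_amount) FROM trips WHERE pickup_datetime IN '2016-06-21T12';

Because the table is partitioned by time, only the partitions that overlap the interval are touched at all.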
So actually we could do something like: give me the date-time and how many trips are ending, where this date-time is in, for example, June 21st... city? What are you doing there, man? I didn't even know I had a city column here. Okay. So, for example, in this particular minute of one particular day, I want to sample in one-second intervals and know how many trips I have for every particular second. That's another thing you can do in a time series database: rather than grouping by columns, which you can also do, you can group by time. We call this SAMPLE BY. We can sample by anything from a microsecond to a year, I guess: you can group by microsecond, millisecond, second, day, year, whatever. So in this case I'm saying: okay, in this particular second I have six trips, and five trips, and so on. You get the idea.

So, something I wanted to show you, which is another cool one. I have this data set with several trips every second, and I have another data set, also with data from Manhattan, which is the weather data set. Maybe it would be interesting to join those two data sets: it would be cool to know the weather I had for a particular trip, because maybe that gives me some insight, I don't know. The challenge is that this data set, of course, is real life, a different open data set, and it's not at the same resolution. We don't have weather changes every second. In my hometown that sometimes happens, and when I was living in London it was crazy, but in real life we don't store weather changes every second. In this particular data set we have about two or three records every hour.

So now I want to join a data set with sub-second resolution and a data set with sub-hour resolution. If I wanted to do that join in other databases, I could do it: it would take me a while, then I would think I had it and I wouldn't, and then it would be like, yeah, this makes sense, or not really, and a week later I would be crying. So one cool thing we have here, and I'm going to move on to other things really quickly afterwards, but this one I really like: we have a special type of join, which we call an ASOF join. I'm going to select the data from the table I showed you already, for one particular day in time, and then I'm going to do what we call an ASOF JOIN, which basically says: this table has a timestamp, what we call the designated timestamp; you designate which column it is, because you can have several. So we have the designated timestamp in one table and the designated timestamp in the other, and we join each row with the one that is closest in time. In this case, ASOF means the one which is exactly the same or immediately before me, the closest one, what happened before. We also have a variant that takes the one strictly before me, which cannot be the same timestamp.
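A sketch of what that query looks like on the demo data. The weather table name and its columns, like tempF and timestamp, are from memory, so treat them as assumptions and check the actual schema.

    -- each taxi ride joined with the most recent weather reading at or before it
    SELECT trips.pickup_datetime, trips.fare_amount, weather.timestamp, weather.tempF
    FROM trips ASOF JOIN weather
    WHERE pickup_datetime IN '2016-06-21';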
But that's the idea. So in this case, for joining the two different data sets, I can just do that, and I'm also going to add the timestamp from the other table, so it's clear. If I run this query, I can see that for each record in the New York taxi rides I keep getting the same timestamp from the weather data set, because I only have one weather entry every 40 or 45 minutes. If I move to a different point in the same day, say 12:55 instead of 12:00, I should see the time matching a different entry in the weather table. And that's it: I have different resolutions, and I don't care; we join by time, because we are all about time. That's what I'm trying to say. I have more interesting queries, but maybe for a different day. So that's the first thing.

So, I told you: now you get the idea of why time series is kind of interesting and the kind of things we can do. Down-sampling, all those things, is very important for machine learning: you may have data every second and you want to do forecasting, and in many cases it doesn't make sense to train a model with every-second data; maybe you want to down-sample to 15-minute intervals. With this trick you can do it easily. That's kind of the idea.

So, I was speaking about ingesting data. Ingesting over one million rows per second on a single instance is interesting. Actually, just ingesting over one million records per second on a single instance is easy: I could write to a file, appending lines, and that would be it. The interesting bit is being able to ingest data while you are able to query that same data in real time. That's the trick, because with just ingesting, when you think about it, it's like: wait, how long do I have to wait before I can query the data? So the idea is that you can query the data at the same time. All benchmarks are lies, of course; on the same benchmark I'm going to tell you about, other people will tell you the contrary, and I'm totally fine with that. But a couple of years ago we published an article saying: hey, we can now ingest at 1.4 million rows per second. The slides are linked on the first page, by the way, thank you. Our CTO posted about how we were ingesting 1.4 million records per second. These records had about 20 columns: 10 dimensions, which were strings, and 10 metrics, which were numbers. So we could ingest records of 20 columns, with 10 strings and 10 numbers, at 1.4 million records per second while running queries, which is the other bit: we were able to scan over 4 billion records per second at the same time, on relatively small machines.
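For reference, the per-second counting from the demo and the 15-minute down-sampling you would use for a forecasting model look roughly like this, again with illustrative column names:

    -- trips per one-second bucket for a single day
    SELECT pickup_datetime, count()
    FROM trips
    WHERE pickup_datetime IN '2016-06-21'
    SAMPLE BY 1s;

    -- down-sample a month of data to 15-minute buckets, e.g. to train a model
    SELECT pickup_datetime, avg(fare_amount)
    FROM trips
    WHERE pickup_datetime IN '2016-06'
    SAMPLE BY 15m;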
Relatively small. So that's kind of the idea. And this benchmark, we didn't write it. There is a benchmark specifically for time series databases. As I told you earlier, you can load relational data into QuestDB and you can run queries, but if you try to run a conventional benchmark on QuestDB, it's going to be super slow: we are not designed for full-text search, and we are not designed for point operations, reading individual records or updating data. We can do it, but we are not designed for that. So InfluxDB, another open source database, created this benchmark, the TSBS benchmark, which is specifically about time series databases, so the queries and the ingestion patterns match what you would expect from a time series workload. It is now maintained by Timescale, which is another open source time series database built on top of Postgres, and there is an adapter for running it on top of QuestDB. It's with that particular benchmark that we are getting those results. So, you know, your mileage may vary, also depending on the hardware. If you run the benchmark in the cloud it's going to be slower, always, because in the cloud, by default, you use EBS on AWS and attached storage on Google Cloud; it's network storage, it has latency, because those are not local disks. They are super cool, but they are not local, so it's always going to be slower. If you want to get these numbers on Google Cloud or on AWS, you can do it, but you have to use NVMe disks, which are local disks attached to the instance. They disappear when you terminate the instance, but with those disks you will get the same benchmark numbers. So hardware is also important for the benchmark. But that's the idea; that's how we did it.

Before I tell you a bit about the technical decisions, and I won't have much time for that, I want to show you how we do this ingestion. Let me just move this out of the way. This is a script in Go. I don't know any Go at all, but I know how to run it; another developer advocate wrote it, so I couldn't tell you that I know a lot of Go, I have no idea, but Go is a language and I've been told it's pretty cool. We have this library, or package, or whatever they call it in Go, which is our official package, cargo or whatever, I don't know; I'm mixing up my languages here, thank you.
So this is the thing: I'm connecting to localhost, to the default QuestDB port. I'm simulating IoT data, and I'm going to be outputting a device type, which can be red, blue, green or yellow, a duration, latitude, longitude, speed, and a timestamp in nanoseconds. And I'm going to do this in batches of 50,000 records, 200 times. 50,000 records, 200 times: 10 million records. I'm going to be inserting 10 million records into a table that doesn't exist yet; QuestDB will create it automatically when it starts receiving data. So if I run this Go script, with go run... it's ingesting data. It should take less than 10 seconds, because we are ingesting 10 million. And that's finished.

So let me just go to my localhost here and select how many records we ingested. I have to refresh the tables... okay. How many records did I ingest? 10 million records, that's good. Can you tell me the interval, so I can see what happened here? Sampled by one second. And it's telling me: in the first second, only half a million, because we didn't start exactly at the top of a second, but after that, one million, one million, one million... you see the idea. Okay, that's not too bad.

I can do slightly better. I can run this script twice, ingesting at the same time into two different tables. So now, if I refresh, I should see two tables, not only one. I have two tables here, same hardware and everything. If I run the query again, selecting only the last 10 rows so we only see the latest run, you can see it's slower now: I was actually ingesting into two tables, so I'm ingesting only about 700,000 per second here, something like that. But if I look at the same time range in the other table, I can just do a union... if I go to the other table here, you should see that at the same time... oh yeah, I cannot apply a limit here in a union, sorry. So I should see that, even if this one was going slower, the other table was also receiving data. In this format you cannot see it very well, but we can do something I told you about earlier: rather than a union, I can do an ASOF JOIN of the first query with the second. So I should be able to do this. Now I have the first run, where only one instance was sending data, and this one, where I was running two. And you can see that, for this particular second, we were ingesting 700,000 records in one table and 700,000 records in the other at the same time, so about 1.4-something million in total across the two tables, out of the box.
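The checks I ran in the console are along these lines. This is a sketch: the table names depend on what the script creates, so here I just call them iot_data and iot_data2, and I assume the auto-created designated timestamp column is called timestamp.

    -- how many rows landed
    SELECT count() FROM iot_data;

    -- ingestion rate: rows per one-second bucket
    SELECT timestamp, count() FROM iot_data SAMPLE BY 1s;

    -- compare the two runs side by side: ASOF JOIN of the two sampled queries
    SELECT a.timestamp, a.rows_a, b.rows_b
    FROM (SELECT timestamp, count() AS rows_a FROM iot_data SAMPLE BY 1s) a
    ASOF JOIN (SELECT timestamp, count() AS rows_b FROM iot_data2 SAMPLE BY 1s) b;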
If I configure the writers and how many threads I have for processing things, I can get it slightly faster than this, but that's good enough: on a local M1 laptop with an SSD, it's fast. That's the idea. So that's the one million there; I was not lying, I was just telling you things.

I have only a few minutes left, but here is how we got here. First, we can make a lot of assumptions about the data. This is time series, so we know people usually don't want individual rows, they want computations over rows. We know people mostly want to group by things that are in the data, strings like the country name or the device name or the brand or whatever. So instead of storing strings, we have a special type called a symbol: if you give me a string, we convert it into a number and do the lookup automatically. We can make a lot of assumptions because we hyper-specialize in one particular use case. We optimize the storage. We don't use indexes, because we store everything in incremental time order per partition. If we get data out of order, we have to rewrite the partitions, but we don't need indexes, because the data is always physically in order, so we can scan super quickly back and forth. That's kind of the idea. We also parallelize as much as we can, using different things.

This is written in Java, from scratch. You will see some databases which I love, like MongoDB, an excellent database for content, that have a time series module, but they use the same MongoDB collections for time series, so they cannot be as fast, because they are using exactly what they use for content. It's very convenient, you can do everything with it, but it's the same with other engines that are built on top of other things. We don't have any dependencies; everything is built from scratch. We are actually rewriting some of the Java libraries, like strings and loggers and so on, to avoid conversions. There are parts of the standard library we simply don't use: we have our own libraries for strings, our own libraries for memory management, our own libraries for absolutely everything, written in our own versions. We have our own just-in-time compiler, because the JVM's JIT compiler was not performant enough for some of the parallelization we wanted to do in queries. So we wrote everything. Our Java is kind of weird, and Jeremy can tell you more about that, it's super weird Java, but it's still Java. We even wrote our own input/output functions.
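To make the symbol and partition ideas concrete, a table like the one the IoT demo writes into could be declared roughly like this. This is an illustrative sketch with names of my own; in the demo the script actually lets QuestDB create the table automatically.

    CREATE TABLE iot_data (
      device_type SYMBOL,        -- SYMBOL: stored as an integer plus an automatic lookup table,
                                 -- instead of repeating the string on every row
      duration_ms DOUBLE,
      lat DOUBLE,
      lon DOUBLE,
      speed DOUBLE,
      timestamp TIMESTAMP
    ) TIMESTAMP(timestamp)       -- designated timestamp: data is kept in time order
      PARTITION BY DAY;          -- one partition per day, so time filters only touch a few of them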
Why? Because we can shave off nanoseconds. For example, this is log4j, and we don't speak about log4j, but it's awesome; this is the comparison with our own logger, in operations per nanosecond. With log4j, logging an integer, you can do 82 operations; we can do 800. Do you have to go down to the nanosecond? If you are building a CRUD application, probably not. It really depends on what you are building, and that's why we write things from scratch. Basically, the approach of QuestDB to performance is like that line: "I don't know who you are, but I will find you, and I will kill you." That's the approach I see in the QuestDB team: we can always get faster at some obscure thing somewhere. That's kind of the idea.

And we try to be a good team player. Jeremy here has contributed, on his own, the connector for Kafka Connect and the connector for Apache Flink, so we try to integrate with the rest of the ecosystem. We would love it if you try QuestDB. You are open source geeks: you like stars, we like stars, so please contribute, and please star us on GitHub if you like it. We have a Slack channel for contributors and the community, we are quite friendly, we are fast, and we work on interesting problems. If you like interesting problems, if you like weird Java, we would love to have you. So thank you very much, and I can take any questions outside.

Oh, one question from the chat, thank you. Someone was asking whether QuestDB can work with GPS data. Yes, you can work with GPS data. We have doubles that you can use for coordinates. We don't have a lot of geospatial functions, but we have geohashes, which basically allow you to define, at different resolutions, in which square of the world something is. So if you are talking about finding where a point is in the world at a particular point in time, QuestDB is very cool. If you need to do other things, we have some math functions, cosines and all those things, so you can do your own calculations. But yes, it can be used for GPS, and a lot of people are actually doing asset tracking with QuestDB. Thank you.
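For reference, the geohash support mentioned in that answer looks roughly like this. It is a sketch from memory: the table, the column names, and the geohash prefix are made up, and the exact syntax of the within filter is worth checking against the QuestDB docs.

    -- a geohash column at 8-character resolution, plus a designated timestamp
    CREATE TABLE vehicle_positions (
      vehicle SYMBOL,
      g8c GEOHASH(8c),
      ts TIMESTAMP
    ) TIMESTAMP(ts) PARTITION BY DAY;

    -- last known position of each vehicle, restricted to one area (a geohash prefix)
    SELECT *
    FROM vehicle_positions
    WHERE g8c WITHIN (#u10j)
    LATEST ON ts PARTITION BY vehicle;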