[00:00.000 --> 00:32.640] It's four o'clock, so let's move on to our, sorry, next talk. I have done some things in Go myself, but building a database is something I honestly have strong respect for. So next up is Etienne, who is going to tell us all about how crazy it is to build a database in Go.

[00:32.640 --> 00:53.840] Thank you, thank you. Yeah, welcome to our mad journey of building a database in Go. And yeah, it's pretty mad to build a database at all; it may be even madder to build a database in Go, when most are built in C or C++.

[00:53.840 --> 01:03.920] Let me start over in case you didn't hear it: hi, my name is Etienne, and welcome to our mad journey of building a vector database in Go.

[01:03.920 --> 01:26.800] So building a database at all could already be pretty mad; doing it in Go, when most are built in C or C++, could be even madder, or even more exciting. We definitely encountered a couple of unique problems that led us to some creative solutions. There are lots of shout-outs in this talk, and also a couple of wish list items, since Go 1.20 was just released, and of course the occasional madness.

[01:26.800 --> 02:19.640] So let's get one question out of the way right away: why does the world even need yet another database? There are so many out there already. Well, you've probably seen this thing called ChatGPT, because it was pretty much everywhere and it's kind of hard to hide from it. ChatGPT is a large language model, and it's really good at putting together text that sounds sophisticated and nice, and is sometimes completely wrong. In this case we're asking it: is it mad to write a database in Go? I might disagree with its answer, but either way, we're now in a situation where on the one hand we have these machine learning models that can do all this cool stuff, interactively and on the fly, and on the other side we have traditional databases. And those traditional databases have the facts, because that's kind of what databases are for, right? So wouldn't it be cool if we could somehow combine those two?

[02:19.640 --> 02:56.440] For example, on the query side: if I ask Wikipedia "why can airplanes fly?", the kind of passage that I want, the one with the answer in it, is titled "The physics of flight". That is difficult for a traditional search engine, because if you look at the keyword overlap, there's almost none. But a vector search engine can use machine learning models that can tell you these two things mean the same, and searching through that at scale is a big problem. Then there's the ChatGPT side, where you don't just want to search through the data, but maybe you also want to say: take those results, summarize them, and also translate them to German.
[02:56.440 --> 03:26.200] So basically: not just return exactly what's in the database, but do something with it and generate more data from it. And that is exactly where Weaviate comes in. Weaviate is a vector search engine which helps us solve this searching by meaning instead of by keywords, without losing what we've done in 20-plus years of search engine research. And most recently, you can also interact with models such as ChatGPT, GPT-3, and of course the open source versions of them.

[03:26.200 --> 04:00.920] So Weaviate is written in Go. Is that a good idea? Is that a bad idea? Or have we just gone plain mad? Well, we're not alone, that's good. You probably recognize these; they're all bigger brands at the moment than Weaviate, though Weaviate is growing fast. And some of those vendors have really great blog posts where you see some of the optimization topics and some of the crazy stuff they have to do. So if you've contributed to one of those, some of the things I'm going to say might sound familiar. If not, buckle up: it's going to get mad.

[04:00.920 --> 04:19.800] So, first stop on our mad journey: memory allocations, which also brings us to our friend the garbage collector. For any high-performance Go application, sooner or later you're going to talk about memory allocations, and you should definitely consider a database a high-performance application, or at least consider Weaviate one.

[04:19.800 --> 04:50.960] If you think of what databases do: in essence, you have something on disk and you want to serve it to the user. That's one of the most important user journeys in a database. Here it's represented by just a number; we went for a uint32, so that's just four bytes on disk, and you can see those four bytes. If you parse them into Go, they would have the value 16 in that uint32. This is a very much simplified version of something a database needs to do, and it needs to do it over and over again.

[04:50.960 --> 05:20.920] The standard library gives us the encoding/binary package, and there we have this binary.Read function, which I think looks really cool. To me it looks like idiomatic Go, because it takes the io.Reader interface, everyone's favorite interface, and you can put all of that stuff in there. If you run this code and there's no error, you get exactly what you want: you've turned those four bytes that were somewhere on disk into our in-memory representation of that uint32.
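As a minimal sketch of what that slide might have shown (variable names are my own, not necessarily the slide's):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	// Four little-endian bytes, as they might sit on disk.
	buf := []byte{0x10, 0x00, 0x00, 0x00}

	// binary.Read wraps the bytes in an io.Reader and fills the
	// pointed-to value; it looks idiomatic, but it is not allocation-free.
	var num uint32
	if err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &num); err != nil {
		panic(err)
	}
	fmt.Println(num) // prints 16
}
```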
[05:20.920 --> 05:32.600] So, is this a good idea? Well, if you do it once, or maybe twice, it could be. If you do it a billion times, this is what happens.

[05:32.600 --> 06:13.120] For those of you who are new to CPU profiles in Go: this is madness. This is pretty bad. First of all, you can see in the center that parsing those one billion numbers took 26 seconds, and 26 seconds is not the kind of time that we ever have in a database. But worse than that, if you look at the profile, we have things like runtime.mallocgc, runtime.memmove, runtime.madvise. All of these are related to memory allocations or to garbage collection. What they're not related to is parsing data, which is what we wanted to do. So how much of those 26 seconds did we spend on what we actually wanted to do? We don't know. It doesn't even show up in the profile.

[06:13.120 --> 06:54.080] To understand why that is, we need to quickly talk about the stack and the heap. You can think of the stack as your function stack: you call one function that calls another function, and at some point you go back through the stack. Stack memory is very short-lived, and it's cheap and fast to allocate. Why is it cheap? Because you know exactly the lifetime of your variables, so you don't even need to involve the garbage collector. No garbage collector: cheap and fast. On the other side you have the heap, which is the long-lived kind of memory, and that's expensive and slow to allocate, and also to deallocate. Why? Because it involves the garbage collector.

[06:54.080 --> 07:21.440] So if the stack is so much cheaper, can we just always allocate on the stack? Warning: this is not real Go, please do not do this. This is a fictional example of allocating a buffer of size 8 and then saying "please put this on the stack". That is not how it works, and most of you would probably say it's pretty good that it doesn't work that way, because why would you want to deal with that? But for me, trying to build a database in Go, sometimes something like this might be good. Or maybe not.

[07:21.440 --> 07:34.280] So how does it actually work? Go does something called escape analysis. If you compile your code with -gcflags=-m, Go annotates your code and tells you what's happening.
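For illustration, a hedged sketch of what that report can look like for the reader-based version (the file name and line numbers are made up, but the message wording matches what the compiler typically prints):

```go
package parse

import (
	"bytes"
	"encoding/binary"
)

// Build with `go build -gcflags=-m` and the compiler reports its
// escape analysis decisions, along the lines of:
//
//	./parse.go:12:6: moved to heap: num
//	./parse.go:13:25: &num escapes to heap
//	./parse.go:13:14: bytes.NewReader(buf) escapes to heap
func readUint32(buf []byte) (uint32, error) {
	var num uint32
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &num)
	return num, err
}
```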
[07:34.280 --> 07:57.000] Here you can see in the second line that this num variable we used was moved to the heap, and in the next line you see that the bytes.Reader, which backs our io.Reader, escaped to the heap. So twice we see that something went to the heap. We don't know exactly what happened yet, but at least there's proof that we have this kind of allocation problem.

[07:57.000 --> 08:24.000] So what can we do? Well, we can simplify a bit. It turns out that the encoding/binary package also has another function, binary.LittleEndian.Uint32, and it does pretty much the same thing. On the one side you put in the raw buffer (no reader this time) with the position offset, and on the other side you get the number out. And the crazy thing is: this one line needs no memory allocations.

[08:24.000 --> 08:54.120] If we do that again, our one billion numbers that took 26 seconds before now take 600 milliseconds, and we're starting to get into a range that is acceptable for a database. More importantly, the profile is so much simpler now. There's basically just this one function in there, and that is what we wanted to do. Admittedly, we're not doing much other than parsing the data at the moment, but at least we got rid of all the noise, and you can see the speed-up.

[08:54.120 --> 09:13.620] Okay, quick recap. If a database is essentially nothing but reading data, parsing it, and serving it to the user, over and over again, then we need to take care of memory allocations. And the fix in this case was super simple: we changed two lines of code and went from 26 seconds to 600 milliseconds.

[09:13.620 --> 09:42.160] But why we had to do that wasn't very intuitive or obvious. In fact, I haven't even told you yet why that binary.Read call escaped to the heap. In this case it's because we passed in a pointer, and we passed it in as an interface, and that's a hint that something might escape to the heap. So here's what I would wish for: no, this is not a topic you need every day you write Go, but if you do need it, it would be cool if there was better education around it.
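Pulling the two-line fix together, a hedged sketch of the allocation-free variant (the wrapper function is mine):

```go
package parse

import "encoding/binary"

// readUint32At decodes four little-endian bytes starting at offset.
// binary.LittleEndian.Uint32 works directly on the byte slice, so
// nothing is passed as a pointer or interface and nothing escapes.
func readUint32At(buf []byte, offset int) uint32 {
	return binary.LittleEndian.Uint32(buf[offset : offset+4])
}
```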
[09:42.160 --> 10:00.520] Okay, second stop: delay the decoding. This is the idea that we don't want to do the same work twice. We're sticking with our example of serving data from disk, but the single number example was a bit too simple, so let's make it slightly more complex.

[10:00.520 --> 10:36.400] We have this nested slice here, a slice of slices of uint64, and that now stands in for a more complex object in your database. Of course, in reality you'd have string props and other kinds of things, but this is just to show that there's more going on than a single number. And let's say we have 80 million numbers: 10 million entries in the outer slice, and eight elements in each inner slice. Our task is just to sum them up; we want to know the sum of those 80 million numbers. That is actually a fairly realistic task for an OLAP kind of database.

[10:36.400 --> 11:00.440] But we need to somehow represent that data on disk, and we're looking at two ways to do it. The first is a JSON representation, the second is a binary encoding, and then there'll be more. JSON is basically just here for completeness' sake; we can rule it out immediately. When you're building a database, you're probably not using JSON to store things on disk, unless it's a JSON database.

[11:00.440 --> 11:39.840] Why? Because it's space-inefficient. If you want to represent those numbers on disk, JSON basically uses strings for them, and then you have all these control characters, your curly braces and your quotes and your colons, and everything takes up space. In our fictional example that takes up 1.6 gigabytes, and you'll see soon that we can do better. It's also slow, partly again because of memory allocations, but also because the parsing itself just takes time. In our example, it took 14 seconds to sum up those 80 million numbers, and as I said before, you just don't have double-digit seconds in a database.

[11:39.840 --> 12:19.040] So we can do something a bit smarter, which is called length encoding. We encode the data as binary, and we spend one byte, a uint8, as a length indicator. When we're reading this from disk, that tells us what's coming up. In this case it says we have eight elements coming up, and we know that each element in this example is a uint64, so that's eight bytes each. So the next 64 bytes we read are the eight elements of our inner slice, and then we just continue: we read the next length indicator, and this way we can encode everything in one contiguous blob.
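A minimal sketch of that layout, assuming a one-byte length prefix followed by little-endian uint64 elements (the function and package names are mine):

```go
package lengthenc

import "encoding/binary"

// Encode writes each inner slice as a uint8 length indicator followed
// by its elements as little-endian uint64 values, all in one
// contiguous buffer.
func Encode(rows [][]uint64) []byte {
	buf := make([]byte, 0, len(rows)*(1+8*8)) // rough pre-sizing for 8-element rows
	for _, row := range rows {
		buf = append(buf, byte(len(row)))
		for _, v := range row {
			buf = binary.LittleEndian.AppendUint64(buf, v)
		}
	}
	return buf
}
```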
[12:19.040 --> 12:54.760] Then of course we have to decode it somehow, and we can do that with what we learned from the previous example: we're not going to use binary.Read, we're doing this in an allocation-free way, which you can see in the line that reads the length. Our goal is to take that data and put it into our nested Go slice of slices of uint64. In the code you can see that we read the length, then increase our offset so we know where to read from next, and then we repeat this for the inner slice, which is hinted at here by the decodeInner function.

[12:54.760 --> 13:21.560] So what happens when we do this? First, the good news: 660 megabytes, which is way less than our 1.6 gigabytes before. Just by using a more space-efficient way to represent the data, we've reduced the size. It's also much, much faster: we were at 14 seconds before, and now we're down to 260 milliseconds.

[13:21.560 --> 13:45.040] But this is our mad journey of building a database, so we're not done yet, because there's some hidden madness: we actually spend 250 milliseconds decoding, and only 10 milliseconds summing up those 80 million numbers. So again we're in that situation where we're spending our time on something we never really set out to do.

[13:45.040 --> 14:17.400] Where does that come from? The first problem is that what we set out to do was flawed from the get-go, because we said we want to decode. We were thinking the same way we were thinking with JSON: we said we want to decode this entire thing into this Go data structure. But that means we need to allocate this massive outer slice, and for each inner slice we need to allocate again. So we're allocating over and over, when our task was not to allocate; our task was to sum up numbers.

[14:17.400 --> 15:08.040] So we can simplify this a bit: we can just not decode it. We're looping over that data anyway, so instead of storing it in a slice, we can just do with it what we planned to do, which in this case is summing it up. Getting rid of that decoding step makes this way faster: now we're at 46 milliseconds. Of course, the footprint of the data on disk hasn't changed, because it's the same data; we're just reading it in a more efficient way. But we don't have to allocate slices, and because we don't have nested slices pointing to other slices, we also get better memory locality. And 46 milliseconds is a timeframe that can be acceptable for a database.
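A hedged sketch of that final version: instead of a decode step that materializes the nested slices (the decodeInner route), we sum the elements directly while scanning the buffer:

```go
package lengthenc

import "encoding/binary"

// Sum walks the length-encoded buffer and adds up every element
// without ever allocating a [][]uint64 along the way.
func Sum(buf []byte) uint64 {
	var sum uint64
	for offset := 0; offset < len(buf); {
		n := int(buf[offset]) // uint8 length indicator
		offset++
		for i := 0; i < n; i++ {
			sum += binary.LittleEndian.Uint64(buf[offset : offset+8])
			offset += 8
		}
	}
	return sum
}
```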
[15:08.040 --> 15:35.040] Okay, quick recap. We immediately ruled out JSON because it just wasn't space-efficient, and we knew we needed something more space-efficient and also way faster. Binary encoding already made it much faster, which is great, but by decoding it all upfront we still lost a lot of time. In these kinds of high-performance situations it can be worth it to delay the decoding as late as possible, until you really need it, or to not do it at all, or to do it only in the small parts where you need it.

[15:35.040 --> 16:07.360] No wish list item here, but an honorable mention: Go 1.20 has support for memory arenas; they actually removed it from the release notes because it's so experimental. The idea of memory arenas is that you can bypass the garbage collector and manually free data. If you have data that you know shares the same lifecycle, you can put it in an arena and, at the end, free the entire arena, which bypasses the garbage collector. That could also be a solution in this case, if it ever makes it in. Right now it's super experimental, and they basically tell you "we might just remove it, so don't use it".
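For flavor, a hedged sketch of that experimental API as it shipped behind GOEXPERIMENT=arenas in Go 1.20 (and, as the talk warns, it may change or disappear):

```go
//go:build goexperiment.arenas

package main

import (
	"arena"
	"fmt"
)

func main() {
	a := arena.NewArena()

	// Allocate a slice inside the arena rather than on the GC-managed heap.
	nums := arena.MakeSlice[uint64](a, 8, 8)
	for i := range nums {
		nums[i] = uint64(i)
	}
	fmt.Println(nums)

	// Free everything in the arena at once, bypassing the garbage collector.
	a.Free()
}
```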
[16:07.360 --> 16:30.960] Our third stop is something that, when I first heard about it, almost sounded too good to be true: something called SIMD. We'll get to what that is in a second, but first a question to the audience: who here remembers this thing? Raise your hands. Okay, cool, so you're just as old as I am.

[16:30.960 --> 17:04.880] This is the Intel Pentium II processor. It came out in the late 90s, 1997 I think, and was sold for a couple of years. Back then I did not build databases, definitely not in Go, because Go didn't exist yet. What I would do was try to play 3D video games, and I would urge my parents to get one of those new computers with an Intel Pentium II processor. One of the arguments I could have used in that discussion was: hey, it comes with MMX technology. Of course I had no idea what that was, and it probably took me ten or so more years to find out, but MMX is the first in a long list of SIMD instruction sets.

[17:04.880 --> 17:30.000] Some of those, especially the ones in the top row, aren't really used anymore these days, but the ones in the bottom row, like AVX2 and AVX-512, you may have heard of. In fact, many open source projects just slap that label in the README, "has AVX2 optimizations", and that signals: we care about speed, this is low-level optimized. Weaviate does the exact same thing, by the way.

[17:30.000 --> 18:13.840] To understand how we could make use of that, I quickly need to talk about vector embeddings, because I said before that Weaviate doesn't search through data by keywords but rather by its meaning, and it uses vector embeddings as the tool for that. A vector embedding is basically just a long list of numbers, in this case floats. A machine learning model takes your input and produces this vector, and if you do this for all your objects, you can compare the vectors: you do a vector similarity comparison, and that tells you whether two things are close to one another or not, for example the query and the object we had before.

[18:13.840 --> 18:52.000] Without any SIMD, we can use something called the dot product. The dot product is a simple calculation where you multiply each element of the first vector with the corresponding element of the second vector, and then you sum up all of those products. We can think of this multiplying and summing as two instructions. A first shout-out here to the Compiler Explorer, a super cool tool to see what your Go code compiles to: there we can see that this indeed turns into two instructions. That's a bit of a lie, because there's more going on, since it's in a loop and so on, but let's just pretend that we really have these two instructions, one to multiply and one to add.

[18:52.000 --> 19:29.600] So how could we possibly optimize this even further, if we're already at such a low level? Well, we can, because this is our mad journey: all we have to do is introduce some madness. What we do now is a practice called loop unrolling. The idea is that instead of processing one element per loop iteration, we now process eight elements per iteration. But we've gained nothing yet: we're still doing the same amount of work, 16 instructions in a single iteration now, just with fewer iterations. So at this point, nothing gained. Why would we do that?

[19:29.600 --> 20:06.480] Well, here comes the part where I thought it was too good to be true: what if we could do those 16 operations for the cost of just two instructions? Sounds crazy, right? Well, no, because SIMD. I'm finally revealing what the acronym stands for: Single Instruction, Multiple Data, and that is exactly what we're doing here. We want to do the same thing over and over again, multiplications and then additions, and this is exactly what these SIMD instructions provide. In this case we can multiply eight floats with eight other floats, and then add them up. So, is everything perfect here? Maybe not, because of course there's a catch. It's our mad journey.
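A hedged sketch of the two scalar stages described above; the unrolled version assumes the vector length is a multiple of eight:

```go
package vector

// Dot is the plain dot product: one multiply and one add per element.
func Dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// DotUnrolled does the same work with eight elements per iteration.
// Nothing is gained yet, but the loop body now maps one-to-one onto
// an 8-wide SIMD multiply and add such as AVX2 provides.
func DotUnrolled(a, b []float32) float32 {
	var sum float32
	for i := 0; i+8 <= len(a); i += 8 {
		sum += a[i]*b[i] + a[i+1]*b[i+1] + a[i+2]*b[i+2] + a[i+3]*b[i+3] +
			a[i+4]*b[i+4] + a[i+5]*b[i+5] + a[i+6]*b[i+6] + a[i+7]*b[i+7]
	}
	return sum
}
```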
[20:06.480 --> 20:47.520] So how do you tell Go to use these AVX2 instructions? You don't. You write assembly code, because Go has no way to do that directly. The good part is that assembly integrates really nicely into Go, and it's used over and over again in the standard library, so it's kind of a standard practice. And there is tooling here, so a shout-out to avo, a really cool tool that helps you: with avo you're still writing assembly, but you're writing it in Go, and it then generates the assembly for you. You still need to know what you're doing, but it protects you a bit. It definitely helped us a lot.
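To give a flavor of avo, here's a minimal generator in the style of avo's own documentation example (a trivial add, not our actual AVX2 dot product kernel):

```go
//go:build ignore

// Running `go run asm.go -out add.s -stubs stub.go` emits Go assembly
// plus matching Go declarations.
package main

import . "github.com/mmcloughlin/avo/build"

func main() {
	TEXT("Add", NOSPLIT, "func(x, y uint64) uint64")
	Doc("Add returns x + y.")
	x := Load(Param("x"), GP64())
	y := Load(Param("y"), GP64())
	ADDQ(x, y) // a real SIMD kernel would use AVX2 instructions here instead
	Store(y, ReturnIndex(0))
	RET()
	Generate()
}
```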
[20:47.520 --> 21:07.680] So, SIMD recap: using AVX or other SIMD instructions, you can basically trick your CPU into doing more work for free, but you also need to trick Go into it by using assembly. With tooling such as avo it gets better, but it would be even nicer if the language had some sort of support for it. And you may be saying now: okay, this is just that mad guy on stage who wants to build a database, nobody else needs that.

[21:07.680 --> 21:35.360] But there's this issue, which was opened recently and unfortunately also closed recently, because no consensus could be reached, and it keeps coming back: Go users saying, hey, we want something in the language, such as intrinsics. Intrinsics are the idea of having high-level language constructs for these AVX or SIMD instructions, and C and C++ have that, for example.

[21:35.360 --> 22:09.120] And maybe you're wondering: if you have such a performance hot path, why don't you just write that part in C and use cgo, or write it in Rust or something like that? That sounds good in theory, but the problem is that the overhead of calling into C or C++ is so high that you have to outsource quite a bit of your code for it to pay off. And if you do that, you end up writing more and more and more in that language, and then you're not writing Go anymore. So that's not always a great idea; it can be in some cases, but not always.

[22:09.120 --> 22:43.360] So, demo time. This was going to be a live demo, and maybe it still is: I had prepared this running nicely in a Docker container, and then my Docker network just broke everything and it didn't work. But I rebuilt it without Docker, and I think it might work; if not, I have screenshots as a backup. Example query here: I'm a big wine nerd, so what I did is put wine reviews into Weaviate, and now I want to search them. One way to show that you don't need a keyword match but can search by meaning is, for example, to go for an "affordable Italian wine". Let's see if the internet connection works... it does.

[22:43.360 --> 23:07.760] What we got back is this wine review that I wrote about a Barolo that I recently drank, and you can see it doesn't say "Italy" anywhere, and it doesn't say "affordable"; what it says is "without breaking the bank". So this is a vector search that happened in the background.

[23:07.760 --> 23:49.520] We can take this one step further by using the generative side; this is basically the ChatGPT part. We can now ask our database, based on the review, which is what I wrote: when is this wine going to be ready to drink? Let's see. You saw before that this was the failed query when the internet didn't work; now it's actually working, which is nice. In this case it's using OpenAI, but you can plug in other tools, including open source versions. It's using OpenAI here because it's nicely hosted as a service, so I don't have to run the machine learning model on my laptop. You can see it tells you this wine is not ready to drink yet, we will need at least five more years, which is a good summary of the review, and that another wine is ready to drink right now, it's in the perfect drinking window.

[23:49.520 --> 24:33.920] For the final demo, let's combine those two: a semantic search to identify something, and then an AI generation on top. In this case we're saying: find me an aged classic Riesling (best wine in the world, Riesling), and, based on the review, would you consider this wine to be a fruit bomb? So let's get an opinion from the machine learning model in there. Here we got one of my favorite wines, and the model says: no, I would not consider this a fruit bomb; while it does have some fruity notes, it is balanced by the minerality and acidity, which keeps it from being overly sweet or fruity. If you read the review text, this is nowhere in there, so it's kind of cool that the model was able to do this.

[24:33.920 --> 24:47.120] Okay, so let's go back; that was the demo. By the way, there's a GitHub repo with this example, so you can run it and try it out yourself.

[24:47.120 --> 25:23.680] So this was our mad journey. Are we mad at Go? Were we mad to do this? I would pretty much say no, because yes, there were a couple of parts where we had to get really creative and do some rather unique stuff, but that was also basically the highlight reel of building a database. I didn't even show the parts that went great, like the concurrency handling and the powerful standard library, and of course all of you, basically representing the Gopher community, which is super helpful. This was my way of giving back to all of you. So if you ever want to build a database, or run into other kinds of high-performance problems, then maybe some of those lessons will come in handy.