[00:00.000 --> 00:32.640] It's four o'clock, so let's move on to our, sorry, next talk. I have done some things in Go myself, but building a database is something I honestly have strong respect for. So next up is Etienne, who is going to tell us all about how crazy it is to build a database in Go.

[00:32.640 --> 00:53.840] Thank you, thank you. Yeah, welcome to our mad journey of building a database in Go. And yeah, it's pretty mad to build a database at all; it may be even madder to build a database in Go, when most are built in C or C++.

[00:53.840 --> 01:03.920] Let me start over in case you didn't hear it: hi, my name is Etienne, and welcome to our mad journey of building a vector database in Go.

[01:03.920 --> 01:26.800] So building a database at all could already be pretty mad; doing it in Go, when most are built in C or C++, could be even madder, or even more exciting. We definitely encountered a couple of unique problems that led us to some creative solutions. There are lots of shout-outs in this talk, and also a couple of wish list items, since Go 1.20 was just released, and of course the occasional madness.

[01:26.800 --> 02:19.640] So let's get one question out of the way right away: why does the world even need yet another database? There are so many out there already. Well, you've probably seen this thing called ChatGPT, because it was pretty much everywhere and it's kind of hard to hide from it. ChatGPT is a large language model, and it's really good at putting together text that sounds sophisticated and nice, and is sometimes completely wrong. In this case we're asking it: is it mad to write a database in Go? I might disagree with its answer, but either way, we're now in a situation where on the one hand we have these machine learning models that can do all this cool stuff, interactively and on the fly, and on the other side we have traditional databases. And those traditional databases have the facts, because that's kind of what databases are for, right? So wouldn't it be cool if we could somehow combine those two?

[02:19.640 --> 02:56.440] For example, on the query side: if I ask Wikipedia "why can airplanes fly?", the kind of passage that I want, the one with the answer in it, is titled "The physics of flight". That is difficult for a traditional search engine, because if you look at the keyword overlap, there's almost none. But a vector search engine can use machine learning models that can tell you these two things mean the same, and searching through that at scale is a big problem. Then there's the ChatGPT side, where you don't just want to search through the data, but maybe you also want to say: take those results, summarize them, and also translate them to German.
[02:56.440 --> 03:26.200] So basically: not just return exactly what's in the database, but do something with it and generate more data from it. And that is exactly where Weaviate comes in. Weaviate is a vector search engine which helps us solve this searching by meaning instead of by keywords, without losing what we've done in 20-plus years of search engine research. And most recently, you can also interact with models such as ChatGPT, GPT-3, and of course the open source versions of them.

[03:26.200 --> 04:00.920] So Weaviate is written in Go. Is that a good idea? Is that a bad idea? Or have we just gone plain mad? Well, we're not alone, that's good. You probably recognize these; they're all bigger brands at the moment than Weaviate, though Weaviate is growing fast. And some of those vendors have really great blog posts where you see some of the optimization topics and some of the crazy stuff they have to do. So if you've contributed to one of those, some of the things I'm going to say might sound familiar. If not, buckle up: it's going to get mad.

[04:00.920 --> 04:19.800] So, first stop on our mad journey: memory allocations, which also brings us to our friend the garbage collector. For any high-performance Go application, sooner or later you're going to talk about memory allocations, and you should definitely consider a database a high-performance application, or at least consider Weaviate one.

[04:19.800 --> 04:50.960] If you think of what databases do: in essence, you have something on disk and you want to serve it to the user. That's one of the most important user journeys in a database. Here it's represented by just a number; we went for a uint32, so that's just four bytes on disk, and you can see those four bytes. If you parse them into Go, they would have the value 16 in that uint32. This is a very much simplified version of something a database needs to do, and it needs to do it over and over again.

[04:50.960 --> 05:20.920] The standard library gives us the encoding/binary package, and there we have this binary.Read function, which I think looks really cool. To me it looks like idiomatic Go, because it takes the io.Reader interface, everyone's favorite interface, and you can put all of that stuff in there. If you run this code and there's no error, you get exactly what you want: you've turned those four bytes that were somewhere on disk into our in-memory representation of that uint32.
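As a minimal sketch of what that slide might have shown (variable names are my own, not necessarily the slide's):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	// Four little-endian bytes, as they might sit on disk.
	buf := []byte{0x10, 0x00, 0x00, 0x00}

	// binary.Read wraps the bytes in an io.Reader and fills the
	// pointed-to value; it looks idiomatic, but it is not allocation-free.
	var num uint32
	if err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &num); err != nil {
		panic(err)
	}
	fmt.Println(num) // prints 16
}
```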
[05:20.920 --> 05:32.600] So, is this a good idea? Well, if you do it once, or maybe twice, it could be. If you do it a billion times, this is what happens.

[05:32.600 --> 06:13.120] For those of you who are new to CPU profiles in Go: this is madness. This is pretty bad. First of all, you can see in the center that parsing those one billion numbers took 26 seconds, and 26 seconds is not the kind of time that we ever have in a database. But worse than that, if you look at the profile, we have things like runtime.mallocgc, runtime.memmove, runtime.madvise. All of these are related to memory allocations or to garbage collection. What they're not related to is parsing data, which is what we wanted to do. So how much of those 26 seconds did we spend on what we actually wanted to do? We don't know. It doesn't even show up in the profile.

[06:13.120 --> 06:54.080] To understand why that is, we need to quickly talk about the stack and the heap. You can think of the stack as your function stack: you call one function that calls another function, and at some point you go back through the stack. Stack memory is very short-lived, and it's cheap and fast to allocate. Why is it cheap? Because you know exactly the lifetime of your variables, so you don't even need to involve the garbage collector. No garbage collector: cheap and fast. On the other side you have the heap, which is the long-lived kind of memory, and that's expensive and slow to allocate, and also to deallocate. Why? Because it involves the garbage collector.

[06:54.080 --> 07:21.440] So if the stack is so much cheaper, can we just always allocate on the stack? Warning: this is not real Go, please do not do this. This is a fictional example of allocating a buffer of size 8 and then saying "please put this on the stack". That is not how it works, and most of you would probably say it's pretty good that it doesn't work that way, because why would you want to deal with that? But for me, trying to build a database in Go, sometimes something like this might be good. Or maybe not.

[07:21.440 --> 07:34.280] So how does it actually work? Go does something called escape analysis. If you compile your code with -gcflags=-m, Go annotates your code and tells you what's happening.
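For illustration, a hedged sketch of what that report can look like for the reader-based version (the file name and line numbers are made up, but the message wording matches what the compiler typically prints):

```go
package parse

import (
	"bytes"
	"encoding/binary"
)

// Build with `go build -gcflags=-m` and the compiler reports its
// escape analysis decisions, along the lines of:
//
//	./parse.go:12:6: moved to heap: num
//	./parse.go:13:25: &num escapes to heap
//	./parse.go:13:14: bytes.NewReader(buf) escapes to heap
func readUint32(buf []byte) (uint32, error) {
	var num uint32
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &num)
	return num, err
}
```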
[07:34.280 --> 07:57.000] Here you can see in the second line that this num variable we used was moved to the heap, and in the next line you see that the bytes.Reader, which backs our io.Reader, escaped to the heap. So twice we see that something went to the heap. We don't know exactly what happened yet, but at least there's proof that we have this kind of allocation problem.

[07:57.000 --> 08:24.000] So what can we do? Well, we can simplify a bit. It turns out that the encoding/binary package also has another function, binary.LittleEndian.Uint32, and it does pretty much the same thing. On the one side you put in the raw buffer (no reader this time) with the position offset, and on the other side you get the number out. And the crazy thing is: this one line needs no memory allocations.

[08:24.000 --> 08:54.120] If we do that again, our one billion numbers that took 26 seconds before now take 600 milliseconds, and we're starting to get into a range that is acceptable for a database. More importantly, the profile is so much simpler now. There's basically just this one function in there, and that is what we wanted to do. Admittedly, we're not doing much other than parsing the data at the moment, but at least we got rid of all the noise, and you can see the speed-up.

[08:54.120 --> 09:13.620] Okay, quick recap. If a database is essentially nothing but reading data, parsing it, and serving it to the user, over and over again, then we need to take care of memory allocations. And the fix in this case was super simple: we changed two lines of code and went from 26 seconds to 600 milliseconds.

[09:13.620 --> 09:42.160] But why we had to do that wasn't very intuitive or obvious. In fact, I haven't even told you yet why that binary.Read call escaped to the heap. In this case it's because we passed in a pointer, and we passed it in as an interface, and that's a hint that something might escape to the heap. So here's what I would wish for: no, this is not a topic you need every day you write Go, but if you do need it, it would be cool if there was better education around it.
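Pulling the two-line fix together, a hedged sketch of the allocation-free variant (the wrapper function is mine):

```go
package parse

import "encoding/binary"

// readUint32At decodes four little-endian bytes starting at offset.
// binary.LittleEndian.Uint32 works directly on the byte slice, so
// nothing is passed as a pointer or interface and nothing escapes.
func readUint32At(buf []byte, offset int) uint32 {
	return binary.LittleEndian.Uint32(buf[offset : offset+4])
}
```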
[09:42.160 --> 10:00.520] Okay, second stop: delay the decoding. This is the idea that we don't want to do the same work twice. We're sticking with our example of serving data from disk, but the single number example was a bit too simple, so let's make it slightly more complex.

[10:00.520 --> 10:36.400] We have this nested slice here, a slice of slices of uint64, and that now stands in for a more complex object in your database. Of course, in reality you'd have string props and other kinds of things, but this is just to show that there's more going on than a single number. And let's say we have 80 million numbers: 10 million entries in the outer slice, and eight elements in each inner slice. Our task is just to sum them up; we want to know the sum of those 80 million numbers. That is actually a fairly realistic task for an OLAP kind of database.

[10:36.400 --> 11:00.440] But we need to somehow represent that data on disk, and we're looking at two ways to do it. The first is a JSON representation, the second is a binary encoding, and then there'll be more. JSON is basically just here for completeness' sake; we can rule it out immediately. When you're building a database, you're probably not using JSON to store things on disk, unless it's a JSON database.

[11:00.440 --> 11:39.840] Why? Because it's space-inefficient. If you want to represent those numbers on disk, JSON basically uses strings for them, and then you have all these control characters, your curly braces and your quotes and your colons, and everything takes up space. In our fictional example that takes up 1.6 gigabytes, and you'll see soon that we can do better. It's also slow, partly again because of memory allocations, but also because the parsing itself just takes time. In our example, it took 14 seconds to sum up those 80 million numbers, and as I said before, you just don't have double-digit seconds in a database.

[11:39.840 --> 12:19.040] So we can do something a bit smarter, which is called length encoding. We encode the data as binary, and we spend one byte, a uint8, as a length indicator. When we're reading this from disk, that tells us what's coming up. In this case it says we have eight elements coming up, and we know that each element in this example is a uint64, so that's eight bytes each. So the next 64 bytes we read are the eight elements of our inner slice, and then we just continue: we read the next length indicator, and this way we can encode everything in one contiguous blob.
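A minimal sketch of that layout, assuming a one-byte length prefix followed by little-endian uint64 elements (the function and package names are mine):

```go
package lengthenc

import "encoding/binary"

// Encode writes each inner slice as a uint8 length indicator followed
// by its elements as little-endian uint64 values, all in one
// contiguous buffer.
func Encode(rows [][]uint64) []byte {
	buf := make([]byte, 0, len(rows)*(1+8*8)) // rough pre-sizing for 8-element rows
	for _, row := range rows {
		buf = append(buf, byte(len(row)))
		for _, v := range row {
			buf = binary.LittleEndian.AppendUint64(buf, v)
		}
	}
	return buf
}
```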
[12:19.040 --> 12:54.760] Then of course we have to decode it somehow, and we can do that with what we learned from the previous example: we're not going to use binary.Read, we're doing this in an allocation-free way, which you can see in the line that reads the length. Our goal is to take that data and put it into our nested Go slice of slices of uint64. In the code you can see that we read the length, then increase our offset so we know where to read from next, and then we repeat this for the inner slice, which is hinted at here by the decodeInner function.

[12:54.760 --> 13:21.560] So what happens when we do this? First, the good news: 660 megabytes, which is way less than our 1.6 gigabytes before. Just by using a more space-efficient way to represent the data, we've reduced the size. It's also much, much faster: we were at 14 seconds before, and now we're down to 260 milliseconds.

[13:21.560 --> 13:45.040] But this is our mad journey of building a database, so we're not done yet, because there's some hidden madness: we actually spend 250 milliseconds decoding, and only 10 milliseconds summing up those 80 million numbers. So again we're in that situation where we're spending our time on something we never really set out to do.

[13:45.040 --> 14:17.400] Where does that come from? The first problem is that what we set out to do was flawed from the get-go, because we said we want to decode. We were thinking the same way we were thinking with JSON: we said we want to decode this entire thing into this Go data structure. But that means we need to allocate this massive outer slice, and for each inner slice we need to allocate again. So we're allocating over and over, when our task was not to allocate; our task was to sum up numbers.

[14:17.400 --> 15:08.040] So we can simplify this a bit: we can just not decode it. We're looping over that data anyway, so instead of storing it in a slice, we can just do with it what we planned to do, which in this case is summing it up. Getting rid of that decoding step makes this way faster: now we're at 46 milliseconds. Of course, the footprint of the data on disk hasn't changed, because it's the same data; we're just reading it in a more efficient way. But we don't have to allocate slices, and because we don't have nested slices pointing to other slices, we also get better memory locality. And 46 milliseconds is a timeframe that can be acceptable for a database.
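A hedged sketch of that final version: instead of a decode step that materializes the nested slices (the decodeInner route), we sum the elements directly while scanning the buffer:

```go
package lengthenc

import "encoding/binary"

// Sum walks the length-encoded buffer and adds up every element
// without ever allocating a [][]uint64 along the way.
func Sum(buf []byte) uint64 {
	var sum uint64
	for offset := 0; offset < len(buf); {
		n := int(buf[offset]) // uint8 length indicator
		offset++
		for i := 0; i < n; i++ {
			sum += binary.LittleEndian.Uint64(buf[offset : offset+8])
			offset += 8
		}
	}
	return sum
}
```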
[15:08.040 --> 15:35.040] Okay, quick recap. We immediately ruled out JSON because it just wasn't space-efficient, and we knew we needed something more space-efficient and also way faster. Binary encoding already made it much faster, which is great, but by decoding it all upfront we still lost a lot of time. In these kinds of high-performance situations it can be worth it to delay the decoding as late as possible, until you really need it, or to not do it at all, or to do it only in the small parts where you need it.

[15:35.040 --> 16:07.360] No wish list item here, but an honorable mention: Go 1.20 has support for memory arenas; they actually removed it from the release notes because it's so experimental. The idea of memory arenas is that you can bypass the garbage collector and manually free data. If you have data that you know shares the same lifecycle, you can put it in an arena and, at the end, free the entire arena, which bypasses the garbage collector. That could also be a solution in this case, if it ever makes it in. Right now it's super experimental, and they basically tell you "we might just remove it, so don't use it".
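For flavor, a hedged sketch of that experimental API as it shipped behind GOEXPERIMENT=arenas in Go 1.20 (and, as the talk warns, it may change or disappear):

```go
//go:build goexperiment.arenas

package main

import (
	"arena"
	"fmt"
)

func main() {
	a := arena.NewArena()

	// Allocate a slice inside the arena rather than on the GC-managed heap.
	nums := arena.MakeSlice[uint64](a, 8, 8)
	for i := range nums {
		nums[i] = uint64(i)
	}
	fmt.Println(nums)

	// Free everything in the arena at once, bypassing the garbage collector.
	a.Free()
}
```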
[16:07.360 --> 16:30.960] Our third stop is something that, when I first heard about it, almost sounded too good to be true: something called SIMD. We'll get to what that is in a second, but first a question to the audience: who here remembers this thing? Raise your hands. Okay, cool, so you're just as old as I am.

[16:30.960 --> 17:04.880] This is the Intel Pentium II processor. It came out in the late 90s, 1997 I think, and was sold for a couple of years. Back then I did not build databases, definitely not in Go, because Go didn't exist yet. What I would do was try to play 3D video games, and I would urge my parents to get one of those new computers with an Intel Pentium II processor. One of the arguments I could have used in that discussion was: hey, it comes with MMX technology. Of course I had no idea what that was, and it probably took me ten or so more years to find out, but MMX is the first in a long list of SIMD instruction sets.

[17:04.880 --> 17:30.000] Some of those, especially the ones in the top row, aren't really used anymore these days, but the ones in the bottom row, like AVX2 and AVX-512, you may have heard of. In fact, many open source projects just slap that label in the README, "has AVX2 optimizations", and that signals: we care about speed, this is low-level optimized. Weaviate does the exact same thing, by the way.

[17:30.000 --> 18:13.840] To understand how we could make use of that, I quickly need to talk about vector embeddings, because I said before that Weaviate doesn't search through data by keywords but rather by its meaning, and it uses vector embeddings as the tool for that. A vector embedding is basically just a long list of numbers, in this case floats. A machine learning model takes your input and produces this vector, and if you do this for all your objects, you can compare the vectors: you do a vector similarity comparison, and that tells you whether two things are close to one another or not, for example the query and the object we had before.

[18:13.840 --> 18:52.000] Without any SIMD, we can use something called the dot product. The dot product is a simple calculation where you multiply each element of the first vector with the corresponding element of the second vector, and then you sum up all of those products. We can think of this multiplying and summing as two instructions. A first shout-out here to the Compiler Explorer, a super cool tool to see what your Go code compiles to: there we can see that this indeed turns into two instructions. That's a bit of a lie, because there's more going on, since it's in a loop and so on, but let's just pretend that we really have these two instructions, one to multiply and one to add.

[18:52.000 --> 19:29.600] So how could we possibly optimize this even further, if we're already at such a low level? Well, we can, because this is our mad journey: all we have to do is introduce some madness. What we do now is a practice called loop unrolling. The idea is that instead of processing one element per loop iteration, we now process eight elements per iteration. But we've gained nothing yet: we're still doing the same amount of work, 16 instructions in a single iteration now, just with fewer iterations. So at this point, nothing gained. Why would we do that?

[19:29.600 --> 20:06.480] Well, here comes the part where I thought it was too good to be true: what if we could do those 16 operations for the cost of just two instructions? Sounds crazy, right? Well, no, because SIMD. I'm finally revealing what the acronym stands for: Single Instruction, Multiple Data, and that is exactly what we're doing here. We want to do the same thing over and over again, multiplications and then additions, and this is exactly what these SIMD instructions provide. In this case we can multiply eight floats with eight other floats, and then add them up. So, is everything perfect here? Maybe not, because of course there's a catch. It's our mad journey.
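A hedged sketch of the two scalar stages described above; the unrolled version assumes the vector length is a multiple of eight:

```go
package vector

// Dot is the plain dot product: one multiply and one add per element.
func Dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// DotUnrolled does the same work with eight elements per iteration.
// Nothing is gained yet, but the loop body now maps one-to-one onto
// an 8-wide SIMD multiply and add such as AVX2 provides.
func DotUnrolled(a, b []float32) float32 {
	var sum float32
	for i := 0; i+8 <= len(a); i += 8 {
		sum += a[i]*b[i] + a[i+1]*b[i+1] + a[i+2]*b[i+2] + a[i+3]*b[i+3] +
			a[i+4]*b[i+4] + a[i+5]*b[i+5] + a[i+6]*b[i+6] + a[i+7]*b[i+7]
	}
	return sum
}
```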
[20:06.480 --> 20:47.520] So how do you tell Go to use these AVX2 instructions? You don't. You write assembly code, because Go has no way to do that directly. The good part is that assembly integrates really nicely into Go, and it's used over and over again in the standard library, so it's kind of a standard practice. And there is tooling here, so a shout-out to avo, a really cool tool that helps you: with avo you're still writing assembly, but you're writing it in Go, and it then generates the assembly for you. You still need to know what you're doing, but it protects you a bit. It definitely helped us a lot.
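To give a flavor of avo, here's a minimal generator in the style of avo's own documentation example (a trivial add, not our actual AVX2 dot product kernel):

```go
//go:build ignore

// Running `go run asm.go -out add.s -stubs stub.go` emits Go assembly
// plus matching Go declarations.
package main

import . "github.com/mmcloughlin/avo/build"

func main() {
	TEXT("Add", NOSPLIT, "func(x, y uint64) uint64")
	Doc("Add returns x + y.")
	x := Load(Param("x"), GP64())
	y := Load(Param("y"), GP64())
	ADDQ(x, y) // a real SIMD kernel would use AVX2 instructions here instead
	Store(y, ReturnIndex(0))
	RET()
	Generate()
}
```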
[20:47.520 --> 21:07.680] So, SIMD recap: using AVX or other SIMD instructions, you can basically trick your CPU into doing more work for free, but you also need to trick Go into it by using assembly. With tooling such as avo it gets better, but it would be even nicer if the language had some sort of support for it. And you may be saying now: okay, this is just that mad guy on stage who wants to build a database, nobody else needs that.

[21:07.680 --> 21:35.360] But there's this issue, which was opened recently and unfortunately also closed recently, because no consensus could be reached, and it keeps coming back: Go users saying, hey, we want something in the language, such as intrinsics. Intrinsics are the idea of having high-level language constructs for these AVX or SIMD instructions, and C and C++ have that, for example.

[21:35.360 --> 22:09.120] And maybe you're wondering: if you have such a performance hot path, why don't you just write that part in C and use cgo, or write it in Rust or something like that? That sounds good in theory, but the problem is that the overhead of calling into C or C++ is so high that you have to outsource quite a bit of your code for it to pay off. And if you do that, you end up writing more and more and more in that language, and then you're not writing Go anymore. So that's not always a great idea; it can be in some cases, but not always.

[22:09.120 --> 22:43.360] So, demo time. This was going to be a live demo, and maybe it still is: I had prepared this running nicely in a Docker container, and then my Docker network just broke everything and it didn't work. But I rebuilt it without Docker, and I think it might work; if not, I have screenshots as a backup. Example query here: I'm a big wine nerd, so what I did is put wine reviews into Weaviate, and now I want to search them. One way to show that you don't need a keyword match but can search by meaning is, for example, to go for an "affordable Italian wine". Let's see if the internet connection works... it does.

[22:43.360 --> 23:07.760] What we got back is this wine review that I wrote about a Barolo that I recently drank, and you can see it doesn't say "Italy" anywhere, and it doesn't say "affordable"; what it says is "without breaking the bank". So this is a vector search that happened in the background.

[23:07.760 --> 23:49.520] We can take this one step further by using the generative side; this is basically the ChatGPT part. We can now ask our database, based on the review, which is what I wrote: when is this wine going to be ready to drink? Let's see. You saw before that this was the failed query when the internet didn't work; now it's actually working, which is nice. In this case it's using OpenAI, but you can plug in other tools, including open source versions. It's using OpenAI here because it's nicely hosted as a service, so I don't have to run the machine learning model on my laptop. You can see it tells you this wine is not ready to drink yet, we will need at least five more years, which is a good summary of the review, and that another wine is ready to drink right now, it's in the perfect drinking window.

[23:49.520 --> 24:33.920] For the final demo, let's combine those two: a semantic search to identify something, and then an AI generation on top. In this case we're saying: find me an aged classic Riesling (best wine in the world, Riesling), and, based on the review, would you consider this wine to be a fruit bomb? So let's get an opinion from the machine learning model in there. Here we got one of my favorite wines, and the model says: no, I would not consider this a fruit bomb; while it does have some fruity notes, it is balanced by the minerality and acidity, which keeps it from being overly sweet or fruity. If you read the review text, this is nowhere in there, so it's kind of cool that the model was able to do this.

[24:33.920 --> 24:47.120] Okay, so let's go back; that was the demo. By the way, there's a GitHub repo with this example, so you can run it and try it out yourself.

[24:47.120 --> 25:23.680] So this was our mad journey. Are we mad at Go? Were we mad to do this? I would pretty much say no, because yes, there were a couple of parts where we had to get really creative and do some rather unique stuff, but that was also basically the highlight reel of building a database. I didn't even show the parts that went great, like the concurrency handling and the powerful standard library, and of course all of you, basically representing the Gopher community, which is super helpful. This was my way of giving back to all of you. So if you ever want to build a database, or run into other kinds of high-performance problems, then maybe some of those lessons will come in handy.