[00:00.000 --> 00:13.240] Okay, our next speaker is going to talk about something we all use in Go, which is strings. [00:13.240 --> 00:16.680] If you didn't ever use it in Go, what are you doing here? [00:16.680 --> 00:23.600] So let's give a round of applause for Matej. [00:23.600 --> 00:24.600] Thank you, everyone. [00:24.600 --> 00:25.600] Thank you. [00:25.600 --> 00:29.800] Excited to be here, excited to see so many faces, excited to speak for the first time at [00:29.800 --> 00:35.720] FOSDEM, also a bit intimidating, but hopefully I can show you a thing or two about string [00:35.720 --> 00:38.920] optimization in Go. [00:38.920 --> 00:41.000] About me, my name is Matej Gera. [00:41.000 --> 00:45.000] I work as a software engineer at a company called Coralogix, where we're building an [00:45.000 --> 00:46.800] observability platform. [00:46.800 --> 00:52.000] Apart from that, I'm active in different open source communities, mostly within the Cloud [00:52.000 --> 00:58.160] Native Computing Foundation, specifically in the observability area. [00:58.160 --> 01:03.280] I work a lot with metrics, and I'm a maintainer of the Thanos project, which I will also talk [01:03.280 --> 01:06.480] a bit about during my presentation. [01:06.480 --> 01:12.000] And apart from that, I contribute to a couple of different projects, most interestingly Open [01:12.000 --> 01:13.800] Telemetry. [01:13.800 --> 01:15.640] And yeah, these are my handles. [01:15.640 --> 01:21.360] I'm not that active on social media; it's best to reach me directly on GitHub, in issues [01:21.360 --> 01:25.080] or PRs. And let's get into it. [01:25.080 --> 01:32.000] So if nothing else, I'd like you to take at least three things from this presentation today. [01:32.000 --> 01:37.160] First of all, I'd like you to understand how strings work behind the scenes in Go. [01:37.160 --> 01:42.000] This might be old news for many people who are more experienced with Go, or it might be [01:42.000 --> 01:44.360] new knowledge for newbies. [01:44.360 --> 01:50.320] But I want to set a kind of common ground from which we can talk about the optimization. [01:50.320 --> 01:55.800] Secondly, I want to tell you about the use cases in the context of which I have been thinking [01:55.800 --> 02:00.160] about string optimization, and where I think the presented strategies can be useful. [02:00.160 --> 02:05.640] And lastly, I want to tell you about the actual optimization strategies and show some examples [02:05.640 --> 02:09.800] of how they can be applied or where they have been applied. [02:09.800 --> 02:15.840] I won't be talking much today about stack versus heap, although a lot of this has to [02:15.840 --> 02:18.200] do with memory. [02:18.200 --> 02:21.800] For this presentation, I assume we'll be talking more about the heap and the kind of [02:21.800 --> 02:30.680] long-term storage of strings in memory. I'm also not going into encoding or related types [02:30.680 --> 02:35.920] like runes and chars; it's all kind of related, but it's outside of the scope [02:35.920 --> 02:38.000] for today. [02:38.000 --> 02:41.920] So let me first tell you what brought me to this topic, what was the inspiration [02:41.920 --> 02:42.920] behind this talk.
[02:42.920 --> 02:47.840] As I already said, I work primarily in the observability landscape with metrics, and [02:47.880 --> 02:53.520] over the past almost two years, I was working a lot on the Thanos project, which I mentioned [02:53.520 --> 02:58.080] and which you can, for simplicity here, imagine as a distributed database for storing time [02:58.080 --> 02:59.600] series. [02:59.600 --> 03:06.280] And in line with those goals, it's intended to store millions of time series, even up to or more [03:06.280 --> 03:11.400] than a billion series; we have also heard about deployments like that. [03:11.400 --> 03:16.600] And as I was working with Thanos and learning about its various aspects and components, [03:16.600 --> 03:20.440] one particular issue that has been standing out to me was the amount of memory needed [03:20.440 --> 03:24.560] for certain Thanos components to operate. [03:24.560 --> 03:31.640] And this is partly due to the fact that the time series data is stored in memory in a [03:31.640 --> 03:33.880] time series database. [03:33.880 --> 03:39.440] And this is where I decided to focus my attention, where I started to explore what are some possible [03:39.440 --> 03:44.240] avenues where we could optimize the performance here. [03:44.240 --> 03:47.840] A big role here was played by doing this in a data-driven way. [03:47.840 --> 03:54.840] So I started looking at different data points from Thanos, like metrics, profiles, benchmarks. [03:54.840 --> 03:59.800] And this is a small side note, because I consider data-driven performance optimization to be [03:59.800 --> 04:04.520] the most important thing when you're improving the efficiency of your program. [04:04.520 --> 04:09.040] So I don't want to diverge here, but I highly recommend you check out the talk by Bartek [04:09.040 --> 04:12.120] Płotka, who I think is in the room here. [04:12.120 --> 04:17.200] He's talking a couple of slots after me, and he's dedicating a lot of his [04:17.200 --> 04:21.680] time to this data-driven approach to efficiency in the ecosystem. [04:21.680 --> 04:25.800] I don't have it on the slide, but also the presentation that's after me, which has to [04:25.800 --> 04:28.640] do with squeezing Go functions, seems interesting. [04:28.640 --> 04:34.520] So a lot of optimization talks today, which I love to see. [04:34.520 --> 04:41.720] And you might also ask why strings specifically, what makes them so interesting or so optimization-worthy. [04:41.720 --> 04:47.680] And although I've been looking at Thanos for some time, something clicked after I'd [04:47.680 --> 04:50.440] seen this particular image at a different presentation. [04:50.440 --> 04:55.480] So this was a presentation from Bryan Boreham, who I know should also be somewhere around [04:55.480 --> 05:02.280] FOSDEM, who is working on a kind of neighboring project called Prometheus, which is the time [05:02.280 --> 05:05.120] series database on which Thanos is built. [05:05.120 --> 05:10.440] So Thanos is kind of a distributed version of Prometheus; we reuse a lot of the code [05:10.440 --> 05:16.440] from Prometheus, and also the actual time series database code. [05:16.440 --> 05:21.840] So he shows, based on the profile and on the icicle graph that you see here, that the labels [05:21.840 --> 05:25.840] take up most of the memory in Prometheus, and that was around one-third.
[05:25.840 --> 05:30.120] And when I thought about it, the result was rather surprising to me, because the labels [05:30.120 --> 05:36.640] of the time series, we could think of them as some kind of metadata or some kind of contextual [05:36.640 --> 05:41.360] data about the actual data points, about the samples, as we call them, and these were [05:41.360 --> 05:46.680] taking up more space than those actual data points, those actual samples themselves. [05:46.680 --> 05:51.320] So there's been a lot of thought and work put into optimization and compression of the [05:51.320 --> 05:56.360] samples, of the actual time series data, but Bryan's finding indicated that there is [05:56.360 --> 05:59.120] more that can be squeezed out of labels. [05:59.120 --> 06:01.180] And what actually are labels? [06:01.180 --> 06:06.860] Labels are key-value pairs attached to a given time series to kind of characterize it. [06:06.860 --> 06:10.860] So in principle, they are nothing more than pairs of strings. [06:10.860 --> 06:13.940] So this is what brought me in the end to the strings. [06:13.940 --> 06:17.700] And it inspired me to talk about this topic to a large audience. [06:17.700 --> 06:23.260] I thought it might be useful to look at this from a kind of more general perspective; even [06:23.260 --> 06:28.900] though we're dealing with this problem in the limited space of observability, I think [06:28.940 --> 06:33.860] some learnings from this can be gained and used also in [06:33.860 --> 06:37.420] other types of programs. [06:37.420 --> 06:42.060] So first let's lay the foundations for our talk by taking a look at what a string actually is [06:42.060 --> 06:43.060] in Go. [06:43.060 --> 06:46.340] Most of you are probably familiar with the different properties of strings. [06:46.340 --> 06:47.700] They are immutable. [06:47.700 --> 06:52.420] They can be converted easily into slices of bytes, can be concatenated, sliced, et cetera, [06:52.420 --> 06:53.420] et cetera. [06:53.420 --> 06:57.260] However, talking about the qualities of strings does not answer the question of what strings [06:57.260 --> 06:58.260] really are. [06:58.500 --> 07:02.740] And if you look at the source code of Go, you'll see that strings are actually represented [07:02.740 --> 07:05.260] by the stringStruct struct. [07:05.260 --> 07:08.900] So strings are structs, shocking, right? [07:08.900 --> 07:13.180] You can also get the runtime representation of this from the reflect package, which contains [07:13.180 --> 07:15.380] the StringHeader type. [07:15.380 --> 07:19.980] So based on these two types, we see that a string consists of a pointer to the actual [07:19.980 --> 07:25.180] string data in memory, and an integer which gives information about the size of the string. [07:25.180 --> 07:28.780] When Go creates a string, it allocates storage corresponding to the provided string size and [07:28.780 --> 07:32.820] then sets the string content as a slice of bytes. [07:32.820 --> 07:36.180] As you've seen, the string data is stored as a contiguous slice of bytes in memory. [07:36.180 --> 07:41.100] The size of the string stays the same during its lifetime, since, as I mentioned previously, [07:41.100 --> 07:42.100] the string is immutable. [07:42.100 --> 07:45.860] And this also means that the size and the capacity of the backing slice of bytes stay [07:45.860 --> 07:46.860] the same.
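To make that two-word layout concrete, here is a minimal sketch (not from the talk's slides) that peeks at it. The stringHeader type below is a hand-written mirror of the runtime representation described above, assumed to match on 64-bit platforms and shown for illustration only:

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringHeader mirrors the two-word runtime representation of a string:
// a pointer to the backing bytes and the length in bytes.
// (reflect.StringHeader describes the same layout.)
type stringHeader struct {
	data unsafe.Pointer
	len  int
}

func main() {
	s := "FOSDEM"

	// Reinterpret the string variable as its header; illustration only.
	hdr := (*stringHeader)(unsafe.Pointer(&s))
	fmt.Printf("data at %p, length %d\n", hdr.data, hdr.len)

	// The header itself is 16 bytes on 64-bit platforms:
	// 8 for the data pointer plus 8 for the length.
	fmt.Println("header size:", unsafe.Sizeof(s))
}
```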
[07:46.860 --> 07:51.340] When you put this all together, the total size of the string will consist of the overhead [07:51.340 --> 07:55.940] of the string header, which is equal to 16 bytes, and I'll show in a bit why, and the byte [07:55.940 --> 07:57.660] length of the string. [07:57.660 --> 08:03.260] We can break this down with this small example of a string I created: FOSDEM, space, [08:03.260 --> 08:04.820] waving hand emoji. [08:04.820 --> 08:05.980] So this is just a snippet. [08:05.980 --> 08:12.860] I don't think this code would compile, but for brevity, I decided to show these three [08:12.860 --> 08:14.700] small lines. [08:14.700 --> 08:19.460] And by calling the Size method on the string type from the reflect package, you would see [08:19.460 --> 08:22.180] it return the number 16. [08:22.180 --> 08:23.180] Don't be fooled. [08:23.180 --> 08:28.140] The Size method returns only the size of the type, not the size of the whole [08:28.140 --> 08:29.140] string. [08:29.140 --> 08:33.340] Therefore, it correctly tells us it's 16 bytes: 8 bytes for the pointer pointing to the string data [08:33.340 --> 08:37.540] in memory, and 8 bytes for keeping the string length information. [08:37.540 --> 08:41.900] To get the size of the actual string data, we have to use the good old len function. [08:41.900 --> 08:44.220] This tells us it's 11 bytes. [08:44.220 --> 08:45.380] The string literal [08:45.380 --> 08:47.300] here is UTF-8 encoded. [08:47.300 --> 08:52.420] We count one byte for each letter and the space, and we actually need four bytes to encode [08:52.420 --> 08:54.340] the waving hand emoji. [08:54.340 --> 08:58.140] And this brings our total to 27 bytes. [08:58.140 --> 09:02.580] Interestingly, for such a short string, the overhead of storing it is bigger than the string [09:02.580 --> 09:05.700] data itself. [09:05.700 --> 09:09.540] It's also important to realize what happens if we declare a new string variable that is [09:09.540 --> 09:11.060] copying an existing string. [09:11.300 --> 09:16.020] In this case, Go creates what we can consider a shallow copy, meaning the data the string [09:16.020 --> 09:18.780] refers to is shared between the variables. [09:18.780 --> 09:21.380] Let's break it down again on the example of our FOSDEM string. [09:21.380 --> 09:27.060] So we declare a new string literal, FOSDEM waving hand emoji, and then create newStr, [09:27.060 --> 09:32.540] a new string variable, and set its value equal to str. [09:32.540 --> 09:34.140] What happens behind the scenes? [09:34.140 --> 09:37.780] If you looked at the pointer values of each of the string variables, you would see different [09:37.780 --> 09:38.780] addresses, [09:38.780 --> 09:43.740] making it obvious that these are, strictly speaking, two different strings; but looking [09:43.740 --> 09:48.340] at their headers, we would see identical information, the same pointer to the string data [09:48.340 --> 09:49.340] and the same length. [09:49.340 --> 09:50.340] But because... [09:50.340 --> 09:55.180] Excuse me, sir, can we turn the light at the front off first? [09:55.180 --> 09:56.180] I cannot. [09:56.180 --> 09:57.180] Sorry. [09:57.180 --> 09:58.180] Okay. [09:58.180 --> 09:59.180] Sorry. [09:59.180 --> 10:05.140] Yeah, it's a bit light, right, sorry. [10:05.140 --> 10:10.820] But anyway, these are two different strings strictly speaking, and looking at the header [10:10.820 --> 10:16.780] information, we would see that they point to the same string data and have the same length.
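A small compilable sketch of both the size measurement and the shallow copy follows; it is not the exact snippet from the slides, and it assumes Go 1.20+ for unsafe.StringData:

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	str := "FOSDEM 👋"

	// Size of the string type itself, i.e. the header: 16 bytes.
	fmt.Println(reflect.TypeOf(str).Size()) // 16

	// Byte length of the string data: 7 bytes for "FOSDEM " plus
	// 4 bytes for the UTF-8 encoded waving hand emoji = 11.
	fmt.Println(len(str)) // 11

	// Copying a string only copies the 16-byte header (a shallow copy).
	newStr := str

	// The two variables live at different addresses...
	fmt.Println(&str == &newStr) // false

	// ...but their headers point at the same backing bytes.
	fmt.Println(unsafe.StringData(str) == unsafe.StringData(newStr)) // true
}
```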
[10:16.780 --> 10:20.700] Because they are two different strings, we need to be mindful of the fact that [10:20.700 --> 10:23.060] newStr comes with a brand new string header. [10:23.060 --> 10:28.500] So the bottom line is, when we do this copying, even though the data is shared, [10:28.500 --> 10:32.660] the overhead of 16 bytes is still there. [10:32.660 --> 10:36.500] So I briefly talked about my inspiration for this talk, but I also wanted to expand a bit [10:36.500 --> 10:42.100] on the context of the problems where I think the string optimization strategies can be [10:42.100 --> 10:43.100] useful. [10:43.100 --> 10:48.740] I think in general, many programs with the characteristics of in-memory stores may face the performance issues [10:48.740 --> 10:52.340] I will talk about on this slide. [10:52.340 --> 10:57.180] I already mentioned numerous times the time series database, but also DNS resolvers, or any other [10:57.180 --> 11:02.100] kind of key-value store, where we come with the assumption that these are long-running [11:02.100 --> 11:09.820] programs, and over the runtime of the program, the number of strings we keep will [11:09.820 --> 11:12.180] keep accumulating. [11:12.180 --> 11:15.180] So we can be talking potentially billions of strings. [11:15.180 --> 11:19.180] There's also potential for repetition of strings, since many of these stored values [11:19.180 --> 11:21.060] may repeat themselves. [11:21.060 --> 11:25.700] So for example, if we associate each of our entries with a label denoting which cluster [11:25.700 --> 11:30.540] it belongs to, we are guaranteed to have repeated values, since we have a finite and [11:30.540 --> 11:32.660] often small number of clusters. [11:32.660 --> 11:38.820] So the cluster string will be stored as many times as there are entries in our database. [11:38.820 --> 11:42.740] There are also certain caveats when it comes to handling of incoming data. [11:42.740 --> 11:50.460] Data will often come in the form of requests through HTTP or gRPC or any other protocol, [11:50.460 --> 11:56.500] and usually we handle this data in our program by unmarshaling it into a struct, and then [11:56.500 --> 12:03.740] we might want to store some information, some string from this struct, in memory for [12:03.740 --> 12:04.740] future use. [12:04.740 --> 12:09.580] However, the side effect of this is that the whole struct will be prevented from being [12:09.580 --> 12:14.340] garbage collected, because as long as the string, or as a matter of fact any other field [12:14.340 --> 12:21.060] from the struct, is being referenced by our database in memory, garbage collection [12:21.060 --> 12:25.580] won't kick in, and eventually this will lead to bloat in the memory consumption. [12:25.580 --> 12:32.260] I think the second, kind of different, type of program where string optimization can [12:32.260 --> 12:39.380] be useful is one-off data processing situations, as opposed to the long-running [12:39.380 --> 12:40.380] programs. [12:40.380 --> 12:47.020] So we can take the example of handling some large JSON file, perhaps a data [12:47.020 --> 12:51.900] set from a study or some health data, which I think were some good examples I've seen [12:51.900 --> 12:57.100] out in the wild, and such processing will require a larger amount of memory to decode [12:57.100 --> 12:58.860] the data during processing.
[12:58.860 --> 13:03.020] So even though we might be processing the same strings that repeat themselves over and over [13:03.020 --> 13:07.300] again, such as the keys in the JSON document, we're having to allocate such strings [13:07.300 --> 13:09.300] anew each time. [13:09.300 --> 13:15.940] So now that we have a better understanding of the problem zones, let's look at the actual [13:15.940 --> 13:18.860] optimization strategies. [13:18.860 --> 13:25.620] So the first strategy is related to the issue I mentioned a couple of slides before, where [13:25.620 --> 13:33.620] we are wasting memory by keeping whole structs in memory when we only need the part of the struct [13:33.620 --> 13:35.660] that is represented by the string. [13:35.660 --> 13:40.100] So what we want to do here is to have a mechanism that will allow us to, quote unquote, detach [13:40.100 --> 13:44.500] the string from the struct so that the rest of the struct can be garbage collected. [13:45.060 --> 13:48.940] Previously this was also possible to achieve with some unsafe manipulation of strings, [13:48.940 --> 13:55.060] but since Go 1.18 there's a new function called Clone in the strings standard library that [13:55.060 --> 13:57.460] makes it quite straightforward. [13:57.460 --> 14:01.300] So Clone creates a fresh copy of the string, and this decouples the string from the [14:01.300 --> 14:06.300] struct, meaning the struct can be garbage collected, and in the long term we will retain [14:06.300 --> 14:08.620] only the new copy of the string. [14:08.620 --> 14:13.060] So remember, previously I showed that when we copy strings we create shallow copies; here [14:13.100 --> 14:17.700] we want to achieve the opposite, we want to truly copy the string and create a fresh copy [14:17.700 --> 14:22.020] of the underlying string data so the original string can be garbage collected together [14:22.020 --> 14:28.180] with the struct it's part of. This we can refer to as deep copying. [14:28.180 --> 14:32.580] The next, most interesting, and I'd say one of the most widely used strategies in software [14:32.580 --> 14:35.060] in general is string interning. [14:35.060 --> 14:38.820] String interning is a technique which makes it possible to store only a single copy of [14:38.820 --> 14:43.180] each distinct string, and subsequently we keep referencing the same underlying string [14:43.180 --> 14:44.620] in memory. [14:44.620 --> 14:49.420] This concept is somewhat more common in other languages such as Java or Python, but it can be [14:49.420 --> 14:54.060] implemented effortlessly in Go as well, and there are even some ready-made solutions out [14:54.060 --> 14:56.580] in the open that you can use. [14:56.580 --> 15:03.380] At its simplest, you could achieve this by having a simple map[string]string, and you can keep [15:03.380 --> 15:08.740] the references to the strings in this map, which we can call our interning map or cache [15:08.740 --> 15:13.220] or anything like that. [15:13.220 --> 15:18.540] The first complication comes with concurrency, right, because we need a mechanism to prevent [15:18.540 --> 15:23.380] concurrent writes and reads to our interning map, so the obvious choice would be to use a mutex, [15:23.380 --> 15:27.260] which will incur a performance penalty, but so be it. [15:27.260 --> 15:31.780] Or a concurrency-safe map version from the sync standard library.
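A rough sketch of these two ideas, deep copying with strings.Clone and the simplest mutex-guarded interning map, is shown below. It is not the code from the slides or from Thanos; the Request type and its fields are made up for illustration:

```go
package stringopt

import (
	"strings"
	"sync"
)

// Request stands in for a large unmarshaled struct from which we only want
// to keep one string long-term. (Hypothetical type, for illustration only.)
type Request struct {
	Cluster string
	Payload []byte
}

// keepCluster deep-copies the field with strings.Clone (Go 1.18+),
// so the rest of the Request can be garbage collected.
func keepCluster(r *Request) string {
	return strings.Clone(r.Cluster)
}

// Interner is the simplest interning cache: a map guarded by a mutex that
// hands out one shared, canonical copy per distinct string.
type Interner struct {
	mu sync.Mutex
	m  map[string]string
}

func NewInterner() *Interner {
	return &Interner{m: make(map[string]string)}
}

// Intern returns the canonical copy of s, cloning and storing it the first
// time a value is seen, so all callers reference the same backing bytes.
func (i *Interner) Intern(s string) string {
	i.mu.Lock()
	defer i.mu.Unlock()
	if c, ok := i.m[s]; ok {
		return c
	}
	c := strings.Clone(s)
	i.m[c] = c
	return c
}
```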
[15:31.780 --> 15:36.260] The second complication, or the noteworthy fact, is that with each new string reference [15:36.260 --> 15:41.380] we are incurring the 16-byte overhead, as I explained a couple of slides back. [15:41.380 --> 15:47.900] So even though we're saving on the actual string data, we're still incurring [15:47.900 --> 15:55.620] the header overhead, so with millions of strings, 16 bytes for every string, it's a non-trivial [15:55.620 --> 15:57.620] amount. [15:57.620 --> 16:02.220] The third complication comes from the unknown lifetime of the string in our interning map. [16:02.220 --> 16:07.020] At some point in the lifetime of the program there might be no more references to a particular [16:07.020 --> 16:09.620] string, so it can be safely dropped. [16:09.620 --> 16:12.780] But how do we know when these conditions are met? [16:12.780 --> 16:18.100] Ideally we don't want to be keeping unused strings, as in an extreme case this can be [16:18.100 --> 16:25.500] a denial-of-service vector leading to exhaustion of memory if we allow the map to grow unbounded. [16:25.500 --> 16:29.540] One option could be to periodically clear the map, or give the entries a certain time [16:29.540 --> 16:34.540] to live, so after a given period the map or the given entries are dropped from the map, [16:34.540 --> 16:39.540] and if a string reappears after such a deletion we simply recreate the entry in the interning [16:39.540 --> 16:45.700] map, so kind of like a cache. Naturally this can lead to some unnecessary churn [16:45.700 --> 16:49.700] and unnecessary allocations, because we don't know exactly which strings are no longer needed [16:49.700 --> 16:54.140] or referenced, but we might still be dropping them. [16:54.140 --> 16:59.940] A second and more elaborate way to do this is to keep counting the number of references to [16:59.940 --> 17:05.540] the used strings, and this naturally requires a more involved and complex implementation, [17:05.540 --> 17:10.660] but you can see here I linked work done in the Prometheus project, which I think is a good [17:10.660 --> 17:17.700] example of how this can be implemented with reference counting. [17:17.700 --> 17:22.500] We can take this even to the next level: as I recently learned, there is an implementation [17:22.500 --> 17:27.900] of an interning library that is capable of automatically dropping unused references. [17:27.900 --> 17:34.020] The go4.org/intern library is capable of doing this thanks to the somewhat controversial concept [17:34.020 --> 17:37.860] of finalizers in the Go runtime. [17:37.860 --> 17:42.380] Finalizers, put very plainly, make it possible to attach a function that will be called on [17:42.380 --> 17:47.460] a variable that is deemed to be garbage-collection ready by the garbage collector. [17:47.460 --> 17:52.380] At that point this library checks the sentinel boolean on the reference value, and if it finds [17:52.380 --> 17:57.460] this is the last reference to that value, it drops it from the map. [17:57.460 --> 18:01.700] The library also cleverly boxes the string header down to a single pointer, which brings [18:01.700 --> 18:06.060] the overhead down to 8 bytes instead of 16. [18:06.060 --> 18:10.740] So as fascinating as this implementation is to me, it makes use of some potentially unsafe [18:10.740 --> 18:15.740] code behavior, hence the dark arts reference in the slide title.
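For reference, a minimal usage sketch of that library, based on my reading of its API (GetByString and Value.Get); double-check the package documentation before relying on it:

```go
package main

import (
	"fmt"

	"go4.org/intern"
)

func main() {
	// GetByString boxes the string behind a single pointer-sized *intern.Value.
	// Identical strings yield the same Value while any reference is alive,
	// and the entry is dropped automatically (via a finalizer) once the last
	// reference becomes unreachable.
	a := intern.GetByString("cluster-eu-west-1")
	b := intern.GetByString("cluster-eu-west-1")

	fmt.Println(a == b)           // true: same boxed value
	fmt.Println(a.Get().(string)) // "cluster-eu-west-1"
}
```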
[18:15.740 --> 18:19.540] However, the library is deemed stable and mature enough, and it has been created by some well-known [18:19.540 --> 18:21.380] names in the Go community. [18:21.380 --> 18:26.980] So if you're interested, I encourage you to study and look at the code; it's just one file, [18:26.980 --> 18:33.860] but it's quite interesting, and you're sure to learn a thing or two about some lesser-known [18:33.860 --> 18:37.220] parts of Go. [18:37.220 --> 18:43.500] And as an example, as the last bullet point shows, I recently tried this library in the Thanos project; [18:43.500 --> 18:48.860] again, I linked the PR with the usage, with the implementation, which I think is rather [18:48.860 --> 18:50.820] straightforward. [18:50.820 --> 18:59.700] And we ran some synthetic benchmarks on this version with interning, and this was the result. [18:59.700 --> 19:05.060] On the left side you can see, probably not very clearly unfortunately, a graph [19:05.060 --> 19:13.460] showing metrics both reported by the Go runtime, how many bytes we have in the heap, [19:13.460 --> 19:22.500] and metrics reported by the container itself, and you can see the differences between the [19:22.500 --> 19:28.700] green and yellow line and the blue and red line. It came up to roughly two to three [19:28.700 --> 19:35.940] gigabytes of improvement per instance; this is averaged, I think, across six or nine [19:35.940 --> 19:41.100] instances, so per instance this was around two to three gigabytes, so we can count the overall [19:41.100 --> 19:46.780] improvement at around ten to twelve gigabytes. But more interestingly, on the right side of [19:46.780 --> 19:52.700] the slide there is another graph to kind of confirm that the interning is doing something, [19:52.700 --> 19:59.980] that it's working. We're following again a metric reported by the Go runtime, [19:59.980 --> 20:07.060] and we're looking at the number of objects held in memory, and we can see that it dropped [20:07.060 --> 20:12.100] almost by half when we look at the average.
[20:12.100 --> 20:15.620] Finally, there's string interning with a slightly different flavor, I would say, which [20:15.620 --> 20:20.820] I refer to as string interning with symbol tables. In this alternative, instead of [20:20.820 --> 20:26.260] keeping a reference to the string, we replace it with another referring symbol, such as, for example, [20:26.260 --> 20:30.900] an integer, so the integer 1 will correspond to the string apple, the integer 2 will [20:30.900 --> 20:35.540] correspond to the string banana, and so on. This can be beneficial in scenarios with [20:35.540 --> 20:40.980] a lot of duplicated strings. Again, this brings me to my home field and to the time series [20:40.980 --> 20:46.660] databases, where there is generally a high probability of the labels, so also the strings, [20:46.660 --> 20:53.140] being repeated, and especially when such strings are being sent over the wire. So instead of [20:53.140 --> 20:58.700] sending all the duplicated strings, we can send a symbol table in their place, and we [20:58.700 --> 21:04.220] can replace the strings with references into this table. Where this idea came from, [21:04.260 --> 21:10.220] or where I got inspired for this, was also in Thanos, but this was by one of my fellow [21:10.220 --> 21:15.900] maintainers, so you can look at that PR, who implemented this for series data being sent [21:15.900 --> 21:22.340] over the network between Thanos components. So instead of sending all the long and duplicated [21:22.340 --> 21:27.940] label keys and values, instead of sending all of these strings, we build a symbol table [21:27.940 --> 21:35.380] that we send together with the deduplicated label data, which contains only [21:35.380 --> 21:40.100] references instead of the strings. So all we have to do on the other side, once [21:40.100 --> 21:44.740] we receive the data, is to replace the references by the actual strings based on the symbol [21:44.740 --> 21:50.460] table, which saves us on one hand the cost on the network, since the requests are smaller, [21:50.460 --> 21:56.460] and also the allocations once we're dealing with the data on the receiving side. [21:57.260 --> 22:02.940] Lastly, you could try putting all of the strings into one big structure, into one big string, [22:02.940 --> 22:07.340] and this can be useful to decrease the total overhead of the strings, as this eliminates [22:07.340 --> 22:17.540] the already mentioned overhead of the string header. Since the size of a string is always 16 bytes [22:17.540 --> 22:22.700] plus the byte length of the string, by putting [22:22.700 --> 22:30.060] all the strings into one we can effectively decrease the overhead of those string headers. [22:30.060 --> 22:34.140] Of course, this is not without added complexity, because now we have to deal with how to look [22:34.140 --> 22:41.540] up those substrings, those smaller strings, within the bigger structure, so you need [22:41.540 --> 22:46.820] a mechanism for this, because you cannot simply look them up in a map or symbol table, and obviously [22:46.820 --> 22:52.260] other already mentioned complications, such as concurrent access, you also have to deal [22:52.260 --> 22:57.340] with. I think a particularly interesting attempt at this is going on in the Prometheus [22:57.340 --> 23:04.100] project, which again is done by Bryan Boreham, whom I mentioned on the previous slides, [23:04.100 --> 23:11.460] so if you're interested, feel free to check out this PR.
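To illustrate the symbol-table flavor, here is a small sketch of the idea only; it is not the Thanos implementation from the linked PR, and the type and method names are made up:

```go
package symtab

// SymbolTable assigns a small integer reference to each distinct string,
// so repeated strings can be stored or sent as integers plus one table.
type SymbolTable struct {
	refs    map[string]uint32
	symbols []string
}

func New() *SymbolTable {
	return &SymbolTable{refs: make(map[string]uint32)}
}

// Ref returns the reference for s, adding s to the table on first sight.
func (t *SymbolTable) Ref(s string) uint32 {
	if r, ok := t.refs[s]; ok {
		return r
	}
	r := uint32(len(t.symbols))
	t.refs[s] = r
	t.symbols = append(t.symbols, s)
	return r
}

// Lookup resolves a reference back to the original string, e.g. on the
// receiving side after the table has been shipped with the request.
func (t *SymbolTable) Lookup(r uint32) string {
	return t.symbols[r]
}

// Symbols exposes the table itself so it can be sent alongside the
// reference-encoded label data.
func (t *SymbolTable) Symbols() []string {
	return t.symbols
}
```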
[23:11.460 --> 23:17.780] So I will conclude with a few words of caution. I have shown you some optimization techniques [23:17.780 --> 23:22.220] that I found particularly interesting when I was doing my research, but let's not be naive: [23:22.220 --> 23:26.420] these are not magic wands that will make your program suddenly work faster and with fewer [23:26.420 --> 23:31.780] resources. This is still a balancing exercise, so many of the presented techniques can save [23:31.780 --> 23:36.460] memory but will actually increase the time it takes to retrieve a string. So when I say [23:36.460 --> 23:40.980] optimization, this is mostly in a situation where we want to decrease the expensive memory [23:40.980 --> 23:47.020] footprint of our application while sacrificing a bit more CPU, a trade-off that I believe is [23:47.020 --> 23:49.700] reasonable in such a setting. [23:49.700 --> 23:54.660] I'm also not making any concrete claims about the performance improvements of the various techniques, [23:54.660 --> 24:00.420] as you have seen, and I think this nicely ties into the introduction of my talk, where I talked [24:00.420 --> 24:05.820] about the need for data-driven optimization. I believe there are still more data points [24:05.820 --> 24:10.980] needed to show how well these techniques work in practice, how well they can work in your [24:10.980 --> 24:16.620] specific use case, how they compare with each other when it comes to performance, and whether [24:16.620 --> 24:22.540] there are some other real-world implications, or maybe properties of Go or the compiler or the [24:22.540 --> 24:30.220] runtime, that might render them not useful in practice, or the performance gain might [24:30.220 --> 24:38.700] be negligible. So just to say that your mileage may vary, but I think these ideas are worth [24:38.700 --> 24:55.940] exploring and can be interesting. And that is all from my side, thank you for your attention. [24:55.940 --> 25:00.700] I also included a couple more resources for those who are interested; you can find the slides [25:00.700 --> 25:02.580] in Pentabarf.