[00:00.000 --> 02:46.480] Okay, welcome back. So while you all have been walking in, I've been quickly reading this book, Efficient Go — it reads very quickly — and now Bartek has made sure that my code is ten times quicker, so tell us everything about it. Thank you.

Thank you very much, everybody. So welcome. I hope your travels went well. Mine were eventful — flight cancelled, change of route — so I had lots of adventures, but generally I'm super happy I made it, and we are at FOSDEM. In this talk I would like to invite you to learn more about the efficiency of our Go programs. There have already been two talks today with "optimization" in their name, generally about how to make software more efficient, and I wonder why this topic is suddenly so popular — is it because everybody wants to save money? That might be a reason. I'm super happy we are really uncovering this for Go, because Go alone might be fast, but that doesn't mean we shouldn't care about making our programs better and more careful with resources when we execute them, right? So let's learn about that — it turns out you can save literally millions of dollars if you optimize some code in production, long term, so it really matters.

But before we start, a short introduction. My name is Bartłomiej Płotka, I'm an engineer at Google; normally I work on Google Cloud's managed Prometheus service, but generally I'm an open-source person — I love Go, distributed systems, and observability topics. I maintain Thanos, which is an open-source, scalable Prometheus-based system, I maintain Prometheus as well, and generally lots of things in open source. I mentor a lot, and I suggest you try mentoring others too — it's super important to bring a new generation of people up to speed in open source — and I'm active in the CNCF. And recently, as you see, I published a book. I think it's kind of unique — everybody's doing TikToks and YouTube now, and I thought, let's be old school, because you need to be unique sometimes. I really enjoyed it, I learned a lot while writing it, and I would love you to learn as well, so I'm summarizing some of the concepts from my book here in this talk. Let's go.
[02:46.480 --> 05:35.840] I would like to start with a story — apparently the best talks have to start with a story — and this is something that maybe triggered me to write the book. Imagine this: around five years ago we had just started the open-source project called Thanos. It doesn't really matter what it does; what matters is that it has, I think, six different microservices written in Go, you deploy it on Kubernetes or any other cloud, and it's a distributed database. One part of this database is the compactor — a component — and again, what it does doesn't matter much; what matters is that it touches object storage and processes sometimes gigabytes or terabytes of metrics data daily. What happened is that at the very beginning of the implementation, as you can imagine, we implemented an MVP. It functionally worked, but of course the implementation was naive and definitely not optimized — we didn't even run any benchmark, other than just running it in production and seeing that it kind of works. And you're laughing, but this is usually what development at high velocity looks like. It was working very well — until, of course, more people put load on it, and we had issues like OOMs. One user pointed us to graphs with an incredibly high spike of memory usage on the heap, the Go heap, and you can see it drops, which means there was a restart or someone killed the process. The numbers were not small — around 15 gigabytes. For a large data set maybe that's fine, but it was problematic. And it was really interesting to see what different feedback and suggestions the community gave us — and by community I mean everybody: users, other developers, maybe product managers; we sometimes don't even know what their role is — because, probably depending on their background, the proposals were totally different. I would like you to check whether you've had the same situations in your own experience, because this is a very common, ongoing problem, and I'd like to showcase it. The first suggestion was: "Can you give me a configuration that doesn't OOM?" And it's like — what? Do you expect a very new project to have flags like "do not OOM" or "use less memory"? It's not as simple as that, yet many, many users are asking us this question — and in your projects you've probably heard it too:
[05:35.840 --> 07:20.520] "What configuration should I use so it uses less memory? How can I optimize this through configuration?" It's just not that simple. Maybe in Java, on the JVM, you have lots of performance flags you can tune and sometimes things get better, but Go is relatively low level — you usually need to do more than that. Another interesting approach, and actually a very good one in some ways, is: "Okay, I will just put this process on a bigger machine." That's a totally valid solution, maybe short term, maybe sometimes it's enough, but in our case it was not sustainable: you can't keep growing vertically forever, and even if you found a machine big enough for your data set, you would obviously be overpaying a lot if the code is naive and wasting memory. Then, finally, the most fun approach: "Let's split this one microservice into a scheduler and workers, replicate it across my super nice Kubernetes cluster, and it will just scale horizontally, so I can use hundreds of small machines." It will work, yes, but you are putting so much complexity onto a small microservice that it becomes more expensive overall: distributed systems add network costs, you have to replicate data, and so on, so you overpay more and more, and you are distributing non-optimized code to more and more places. That's not always the solution.
[07:20.520 --> 10:17.320] Sometimes the code cannot be optimized further, and then we probably should scale horizontally — but not at the very beginning of the project, and yet that was one of the first suggestions from the community. Of course, you can also just switch from Thanos to something else; that's a solution too, but if that's your approach you'll probably just keep jumping between projects, which is not super efficient — maybe some parts of a project are better and some are worse, but it's an option people suggest. Another suggestion, of course: pay a vendor — "they will solve the problem for me, for real money." That's not always a good solution either; it's a bit like giving up, and there's the migration of data, the huge cost of learning new tools, and so on. And all of this while, in the code, we had waste that could have been avoided in super easy ways. That example was malloc in C++; in Go we don't have malloc as such, but memory overhead and memory leaks like that are very common — just imagine how many goroutines you sometimes create: you forget to close some abstraction, a goroutine leaks, and you're leaking memory just like with that malloc. And what was the actual solution? A contributor finally came and investigated this as an efficiency problem at the code and algorithm level, and rewrote a small part of the compactor to stream data: instead of building the resulting object in memory, as the compactor was doing, it streamed it to the file system as soon as possible. Generally an easy change — yet there were lots of discussions, lots of stress, lots of weird ideas, and over time I found it amusing how this story kept repeating in many, many cases. And it's not only my experience — there are so many nice examples where a small change, a two-character change, brings a huge improvement to a large system. Sometimes there are very easy wins that we can just pick up and do — but we need to know how. So there are two learnings from this story. One is that software efficiency at the code and algorithm level — changing the code — matters, and learning how to do it can be useful. The second learning is that there's a common pitfall these days: in the past we had premature optimization —
[10:17.320 --> 12:19.800] everybody was playing with the code and trying to over-optimize things. I think now we are lazy and more into DevOps, into changing configuration, into horizontal scaling, because we have that power, we have the cloud, and that is usually the solution people choose rather than actually checking the code. I call it closed-box thinking, and I think it's a bit of a threat in our ecosystem. We should acknowledge that there are different levels: we can sometimes scale out, we can sometimes use a bigger machine, we can sometimes rewrite to Rust if that makes sense — but that shouldn't be the first solution that comes to mind, right? Okay, before we go forward: I have five copies of the book to share, and I will show a link to a quiz at the end. It will be super simple, but pay attention, because there may be some questions about the talk; you can send me your answers by email and I will pick five lucky people to get my book. So, pay attention. All right — five steps for making progress on efficiency. One thing I want to mention: I don't know if you were at the previous talk, or the one before, but the speaker explained a lot of optimization ideas — string optimizations with interning, something around allocations, struct padding, and generally ideas of that kind. That's fine, but that is the "optimizing stuff" itself; it's not like looking things up in a dictionary of tricks from the past — it's more fuzzy, more involved. So what I would like you to focus on is not the particular optimization in the example I'll show, because it's super simple and trivial, but how we get there: how we found what to optimize, and how we found out whether we should even optimize at all. Okay, so focus on that.
[12:19.800 --> 14:08.760] So, first step — the first suggestion I would have, and this is from the book: I defined this name, TFBO, which is essentially a flow for efficiency-aware development that worked for me, and I see other professionals doing it a lot as well. Test, fix, benchmark, optimize. Essentially it's TDD plus something more. TDD you are probably familiar with — test-driven development: you test first, and only then you implement or fix until the test is passing. I would like to do the same for optimizations, so we have benchmark-driven optimization: we benchmark first, then we optimize, and then we profile — I will tell you later why — and all of this is a closed loop, so after optimizing we have to test again. It sounds complex, but we will make one loop through it — actually maybe two — during this talk on a simple piece of code, so let's do it. Let's introduce a simple function, super simple, even a bit silly: we are creating a slice with one million elements, and each element is just a constant string. It's the first iteration of the program we want to write. So what do we do, following TFBO? First we test — we have some code and we want to improve it, but test-driven development comes first. Let's assume I already had the test; it could look like this, and I make sure it's passing, so there is nothing functional I have to fix. So what's next?
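To make the example concrete, here is roughly what such a function and its unit test could look like — a minimal sketch; the function name, the constant string, and the exact assertions are my stand-ins for what was on the slides:

```go
// create.go — the naive first iteration described above.
package create

// create returns a slice of n elements, each set to the same constant string.
func create(n int) []string {
	var ret []string
	for i := 0; i < n; i++ {
		ret = append(ret, "const-string")
	}
	return ret
}
```

```go
// create_test.go — the functional unit test we run first (the "T" and "F" in TFBO).
package create

import "testing"

func TestCreate(t *testing.T) {
	ret := create(1_000_000)
	if len(ret) != 1_000_000 {
		t.Fatalf("expected 1000000 elements, got %d", len(ret))
	}
	if ret[0] != "const-string" || ret[len(ret)-1] != "const-string" {
		t.Fatalf("unexpected element content")
	}
}
```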
[14:08.760 --> 16:25.920] So next is the measurement — a benchmark. This was already covered in an earlier talk, but I have some additions and extensions that you might find helpful. Something I want to mention: we are talking about micro-benchmarks here, and the same levels apply as for testing behaviour. For a small function like our create, a unit test is totally enough — that's the micro level — but for a bigger system you need something on the macro level: integration tests, end-to-end tests, whatever is bigger. The same happens with benchmarking. This is a micro-benchmark, a kind of unit benchmark; there are also macro benchmarks, which I cover in my book, where you need a more sophisticated setup with load testing, maybe some automation, and some observability — Prometheus, maybe — that measures resources over time. Here we have a simple create function, so we can keep it simple with a micro-benchmark. As was mentioned before, there is a special signature you have to put in a test file, and then there are optional helpers. Two that I like to put almost everywhere: ReportAllocs, which makes sure this benchmark measures allocations as well, and ResetTimer, which is super useful because it resets the measurement — anything you allocate or spend time on before that point is discarded from the benchmark result, so the benchmark only measures what happens inside the loop. And then this for loop: don't try to change it, always copy it as-is — it's boilerplate that has to be there, because it lets Go check the repeatability of your benchmark by running it many, many times. Okay, so how do we execute it? Again, this was already mentioned, and this is how I do it when I want to focus on one benchmark — but the defaults are not enough, in my opinion. By default it runs the benchmark once, for one second. I recommend explicitly setting some parameters.
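Before looking at how to run it, here is roughly what such a micro-benchmark looks like with the two helpers mentioned — a sketch meant to live in the same create_test.go as the unit test above; the setup comment only marks what ResetTimer would discard:

```go
func BenchmarkCreate(b *testing.B) {
	b.ReportAllocs() // also report allocations per operation

	// Any expensive setup would go here; ResetTimer discards the time
	// and allocations spent before this point.
	b.ResetTimer()

	// The boilerplate b.N loop: Go chooses N and reruns the loop until
	// the measurement is long and stable enough.
	for i := 0; i < b.N; i++ {
		_ = create(1_000_000)
	}
}
```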
[16:25.920 --> 19:02.800] And I have a one-liner in bash that I often use. Essentially I set a variable so I can reference the result later — v1, for example — so this creates a v1.txt file locally and runs the benchmark for a time I explicitly specify. That alone is super useful, because otherwise you end up with a v1 file and no idea what produced it, and you have to dig through your bash history: okay, that one ran for one second, and that one was something else. And then this is crucial — something I don't know why I didn't learn at the very beginning: -count. It runs the same benchmark several times — six times here, so one second, six times — and this is super important because then further tools, which you will see in a moment, can check how reliable your results are; they essentially calculate the variance between the timings, and if the variance is too big, your environment is not stable. Then I pin the number of CPUs. Pinning in general is super important — not necessarily to one; just pick something that works for your concurrency, maybe something similar to what runs in production — but never change it between tests. I also recommend picking fewer than the total number of CPUs, because your operating system has to run on something too. These things matter. Also, don't run this on a laptop without power connected, because you will be CPU-throttled. There are lots of small things where you think "oh, it doesn't matter" — no, it matters, because otherwise you cannot rely on your results. So take this a little bit seriously, and at least don't benchmark with the laptop on your lap in bed, because it will overheat. Small things, but they matter — and I was doing all of them, by the way. So, the results look like this — you can see many of them — but this is not how you're supposed to read them. There is an amazing tool called benchstat, which presents them in a more human-readable way: it aggregates the runs, gives averages, and tells you the variance as a percentage. For example, for the time, the latency, there is a variance of 1%, which is tolerable, and you can customize exactly how it calculates this variance and so on.
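I can't reproduce the exact one-liner from the slide, but a sketch along the lines described — a named result file, an explicit bench time, six repetitions, a pinned CPU count — could look like this (benchstat comes from golang.org/x/perf/cmd/benchstat):

```bash
export bench=v1 && \
  go test -run '^$' -bench '^BenchmarkCreate$' \
    -benchtime 1s -count 6 -cpu 4 \
  | tee "${bench}.txt"

# Summarize the six runs and the variance between them:
benchstat v1.txt
```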
[19:02.800 --> 21:20.920] So we can more or less trust it — it's within 1%; how much you can trust it depends on what you do, but generally it's not too bad. Allocations, fortunately, are super stable. So we benchmarked, we measured, we know our function has these numbers — what's next? Everybody says: let's make it faster, let's make it faster. But wait — why should we make it faster? Okay, maybe 100 megabytes per create invocation is a lot, but maybe it's fine, right? This is where I think we are usually missing a lot of experience. You have to set some expectations — to what point are you optimizing? — and usually we don't have any. Even from product management we maybe get functional requirements, but almost never concrete performance requirements. So we don't know what to aim for, and honestly, if you just ignore requirements — "I don't have any, I just want to make it faster" — then it is premature optimization every time, because it's always premature: it's a random goal you don't really understand. "Just make it fast" is also very fuzzy, obviously, and not very helpful. So what is helpful? I know it's super hard, I know it's a bit uncomfortable, but I suggest writing some kind of efficiency requirements spec, as simple as possible — I call it a RAER, a resource-aware efficiency requirement. What it means is essentially trying to find some kind of function — some notion of complexity, but nothing formal; just a more concrete estimation of the complexity based on the inputs. For a simple function like ours, we can estimate roughly what we think should happen. For runtime, we know we do something a million times; we don't know how many nanoseconds each iteration should take, so let's pick 30 — that's actually pretty big for a single append, but just pick some number. You can iterate on this number later, but if you don't know where you're going, how can you make any decisions?
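As a rough sketch, such a back-of-the-envelope RAER for this function could be written down like this — the 30 ns per element is just the guess from above, and the 16 bytes per element anticipates the string-header sizing explained next (assuming a 64-bit platform):

```go
// Back-of-the-envelope efficiency requirements for create(1_000_000).
const (
	elements = 1_000_000

	nsPerElement    = 30                          // guessed budget for a single append
	maxLatencyNanos = elements * nsPerElement     // ≈ 30 ms per call

	bytesPerElement   = 16                        // string header: pointer + length
	maxAllocatedBytes = elements * bytesPerElement // ≈ 15 MiB per call
)
```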
[21:20.920 --> 24:23.160] For allocations it's a little bit easier, because we expect a slice of one million string elements, and as we learned in the earlier talk, every string has two parts: the string header, which is 16 bytes and holds a pointer and a length (strings don't carry a capacity), and the underlying bytes, which live on the heap. Here every element refers to the same constant string, so we can assume 16 bytes per element, and then we just multiply — that's the function we expect. With this we can expect every invocation of create to allocate roughly 15 megabytes — but what we see is that we allocate 80 megabytes. So already we can tell there may be easy wins, or something about this program that we don't understand, and this is what leads us to spotting those easy wins and to deciding whether we need to do anything at all. In terms of time, latency, it's also more than we expected — but that was more of a guess; I just guessed those 30 nanoseconds. Okay, so now we know we are not fast enough and we are over-allocating, so then we profile: we know we have a problem, now let's find out what's going on. On the micro level we can use profiling very easily by just adding two flags, which gather memory and CPU profiles into files, like v1.mem.pprof. On the macro level there are other ways of gathering profiles, but you can use the same format and the same tools — there are even open-source continuous-profiling tools, like Parca (parca.dev), which I really recommend — and it's then super easy to gather those profiles over time. What we really want to learn is what causes the problem. This is a CPU profile, and we can read it like this: the wider a box, the more CPU cycles it spends; the depth doesn't matter, it's just how many functions are on the stack. We can see that create is, of course, one of the biggest contributors — but also growslice. Why do we spend so many cycles growing the slice? Ideally, I know how many elements I will have, so why doesn't it grow once? And by the way, you can use go tool pprof -http locally on such a file — I use it a lot — to expose an interactive UI, and you can do the same for the memory profile.
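For reference, gathering and opening these profiles for the micro-benchmark could look roughly like this — a sketch; the file names and the port are just examples:

```bash
# Re-run the benchmark, additionally writing CPU and heap profiles:
go test -run '^$' -bench '^BenchmarkCreate$' -benchtime 1s \
  -cpuprofile v1.cpu.pprof -memprofile v1.mem.pprof .

# Explore a profile in the interactive web UI (flame graph, top, source view):
go tool pprof -http=:8080 v1.cpu.pprof
```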
[24:23.160 --> 27:22.720] Honestly, though, the memory profile is not that useful here, because append is a standard-library function and those frames are not very well exposed — they're hidden — so the CPU profile was actually more helpful: it pointed us to growslice. If you just Google that, you will notice it comes from append, and then you can go to the documentation of append and learn what it actually does. As you are probably aware — this should be a trivial case — append resizes the underlying array whenever it is full, and resizing is not free: it has to create a new, bigger array and copy everything over, and the old array becomes garbage, but garbage collection doesn't reclaim it immediately, so it all adds up as extra allocations. So that's what happens, and the fix is to just preallocate: when you create the slice, say up front how much capacity you want to reserve for it — see the sketch below. So what do we do now? In our TFBO we optimized, so now we test, before we even measure — because if you don't test whether the code is still correct, you might be happy that things got faster while they are functionally broken. So always test, don't be lazy, run those unit tests; it's easy. Once they are passing, you can comfortably measure again. I just change the variable to v2 to write to another file, and then I can run benchstat on v1.txt and v2.txt — actually I could pass a hundred of those files and it would compare all of them, but here we compare two — and we get not only the absolute values of those measurements but also a diff. You can see we improved a lot, and if we check the absolute values against our efficiency requirements, we roughly met the threshold we estimated: 15 megabytes expected, 15 megabytes measured, and it's faster than our goal, so now we are good to release it. That's the whole loop, and you repeat it until you're happy with your results. So this is it. Learnings — again, five of them: follow TFBO — test, fix, benchmark, optimize. Use benchmarks; they are built into Go — go test -bench — and they are amazing. Set clear goals; goals are super important here. Then profile: Go uses pprof, which you can Google as well — an amazing format and set of tools, integrated with clouds and so on — and I use it every day whenever I have to optimize something. And finally, the key is to try to understand what happens: what you expected, and what's wrong; reading documentation and reading code is what you sometimes have to do.
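For reference, the preallocation fix described above boils down to one extra argument to make — a sketch, using the same hypothetical create as before:

```go
// create, optimized: the slice starts with enough capacity for all elements,
// so append never has to grow (re-allocate and copy) the backing array.
func create(n int) []string {
	ret := make([]string, 0, n)
	for i := 0; i < n; i++ {
		ret = append(ret, "const-string")
	}
	return ret
}
```

After this change, re-running the benchmark as v2 and comparing with `benchstat v1.txt v2.txt` should show allocations close to the roughly 15 MB estimated in the requirements, as described above.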
[27:22.720 --> 27:58.640] And a general tip: whenever you want to optimize something super carefully in some bottleneck part of your code, avoid standard-library functions, because they are built for generic functionality — they do a lot of things and handle edge cases that you might not have. A lot of times I have just implemented my own integer-parsing function and it was much faster. So that's a general tip that always works — but again, do it only when you really need it, because you might introduce bugs in that code. So that's it, thank you — you have a link here, bwplotka.dev. Thank you.