[00:00.000 --> 03:44.160] Yeah, hi everyone. My name is Christian Simon and I'm going to be talking about continuous profiling. We have heard a lot about observability today already, and I want to introduce maybe an additional signal there. Quickly about me: I'm a software engineer at Grafana Labs and I've worked on our databases for observability. I worked on Cortex, now Mimir, I worked on Loki, and most recently I switched to the Phlare team. I'm 50% of the Phlare team, and we work on a continuous profiling database. There's not going to be a particular focus on Phlare today, though. What I want to do today is introduce how profiles are measured and what you can achieve with them, and then maybe next time, as I learn more, I can go into more detail and look at very specific languages.

So when we talk about observability, what are our common goals? Obviously we want to ensure that the journeys of our users are successful, and that we can maybe even find problems proactively before a user notices them. When problems do happen, we want to resolve them as quickly as possible, so that fewer of those user journeys are disrupted. Observability gives us an objective way of getting insight into the state of our system in production. And even after an incident has happened, when we've found the right fix, rebooted and everything is up again, it can help us understand what exactly happened when we want to figure out the root cause.

One of the easiest and probably oldest observability signals is logs. It starts with a print "hello" somewhere in your code, and I probably don't need a show of hands for who's using it; everyone somehow uses logs, or is asleep if they don't raise a hand. Your application doesn't need any specific SDK, it can probably just log using the standard library of your language. One of the challenges is that the format is usually quite varied; even just for timestamps, it can be really hard to get a common understanding of your log lines. And when you then want to aggregate them it gets quite costly, because the values are strings that you need to convert to integers or floats and so on. Another thing that can happen is that you have so many logs that you can't find the ones you're actually interested in. Grepping for "error", for example, might be helpful, but there might just be too many errors and you lose the important ones. If you want to learn more about logs, my colleagues Owen and Kavi are going to speak about Loki.
[03:44.160 --> 06:53.040] So definitely stay in the room. I'm going to move on to the next signal: metrics. I'm also assuming pretty much everyone has come across them and is using them. In this case you avoid the problem I mentioned before: you have actual numbers exposed, and you know which of those numbers you care about. To get a metric you usually have to define the list of metrics you care about up front, and then you can collect them. So you might be having an outage and find you don't have the metric you care about, and you need to constantly improve the exposure of your metrics. Prometheus is obviously the main tool in that space. Very often we are thinking about web services with these applications, and the RED method — get the rate of your requests, the error rate of your requests, and the duration (latency) of your requests — already covers quite a lot of cases. And since metrics are just integers or floats, you can aggregate them quite efficiently across a multitude of pods or a really huge set of services.

Then, if you get into the kind of microservices architecture that has evolved over the last couple of years, you will find yourself with a really complex composition of services involved in answering a request, and you might struggle to understand what is slowing you down, where an error is coming from, or why you have a timeout here. Distributed tracing can help you a lot with understanding what your service is doing. It might also show that a service is actually doing way too much, calculating things over and over again. So it is super helpful for seeing the flow of data through your system. The challenge is that you might have a lot of requests, and while tracing is somewhat cheap, you might not cover all of them. For example, in our production system — someone correct me if I'm wrong — when we receive logs and metric data in Grafana Cloud we only cover about 1% of those requests with traces, while we cover 100% of our queries. You basically need to make a selective decision about where it's worth investing. Log and metric ingestion looks pretty much the same every second, so we see more value in tracing all of the queries, where there is complex caching and all sorts of systems interacting; that allows us to look a bit deeper and find the one service that is maybe the bottleneck.
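[Editor's note: as a concrete illustration of the RED method mentioned above, here is a minimal sketch using the Prometheus Go client. The metric names, the /checkout handler and the port are my own hypothetical choices, not something from the talk.]

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Rate and Errors: one counter, labelled by status code (hypothetical name).
	requests = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "checkout_requests_total",
		Help: "HTTP requests handled, by status code.",
	}, []string{"code"})

	// Duration: a histogram of request latencies (hypothetical name).
	latency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "checkout_request_duration_seconds",
		Help:    "Request latency in seconds.",
		Buckets: prometheus.DefBuckets,
	})
)

func checkout(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	defer func() { latency.Observe(time.Since(start).Seconds()) }()

	// ... the real checkout work would happen here ...
	requests.WithLabelValues("200").Inc()
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/checkout", checkout)
	http.Handle("/metrics", promhttp.Handler()) // endpoint scraped by Prometheus
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A PromQL query such as `rate(checkout_requests_total[5m])` would then give the request and error rates the speaker describes, aggregated cheaply across many pods.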
[06:53.040 --> 10:08.840] Let's look at a bit of a real problem. I'm running an online shop, I'm selling socks, and a user is complaining about getting a timeout when trying to check out. That's obviously not great, because I'm not selling the socks, but at least the user got a trace ID and is complaining to our customer service. Starting from there, I figure out that it was actually the location service that caused the timeout in the end. Looking at the metrics of the location service, I might find that 5% of its requests are timing out, so maybe 5% of my users are not able to buy their monthly socks, or whatever. So what are the next steps? Scaling up is always good; maybe the service is just overloaded, the person who wrote it left years ago, so we have no idea, so we just scale it up. Obviously that comes with a cost. The first thing should always be stopping the bleeding and making sure there are no more timeouts, so scaling up is definitely the right option here. But if you keep doing that over years, you might suddenly find yourself with a lot of extra cost attached to that location service. And that's where I think we need another signal, and I think that signal should be profiling.

I guess most people have come across profiling. It basically measures your code: how long it executes, for example, or how many bytes it allocates in memory. It helps you understand your own program even better, or someone else's program, as in the location service case. That can eventually translate into cost savings: if you find out where the problem lies, you can maybe fix it or at least get some ideas, and you can also look at the fix afterwards and see whether it actually got better or worse. It gives you a better understanding of how your code behaves.

So now the question is: what is actually measured in a profile? I created a small program — I hope everyone can see it. It has a main function which then calls other functions: there's a doALot and a doLittle function, and both of them call prepare. In the comments there's some work going on, and that work could be allocating memory, using the CPU, something like that. Let's say it's CPU. When the program starts, we first do something within main; let's say we spend three CPU cycles, which is not a lot, but that gets recorded: it took us three CPU cycles in main. We then go into the prepare method through doALot and spend another five CPU cycles there.
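[Editor's note: the program itself is not reproduced in the transcript. Here is a minimal reconstruction based purely on the call structure described above — main calling doALot and doLittle, both calling prepare — with placeholder bodies; the exact names and work are assumptions.]

```go
package main

// Reconstruction of the talk's example: main calls doALot and doLittle,
// and both of them call prepare. The comments stand in for whatever
// burns CPU cycles or allocates memory.

func prepare() {
	// ... some shared setup work (a few CPU cycles) ...
}

func doALot() {
	prepare()
	// ... a lot of work (where most of the CPU time goes) ...
}

func doLittle() {
	prepare()
	// ... a little work ...
}

func main() {
	// ... a bit of work directly in main ...
	doALot()
	doLittle()
}
```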
[10:08.840 --> 13:21.000] Those stack traces are then recorded in the profile, and going through the program we end up with that kind of measurement of stack traces. While this works with ten lines of code — you can probably spot where the problem is, there's the 20 in doALot — it definitely breaks down when you're talking about a lot of services, or a large code base that happens to be hot and actively used. So there are a couple of ways of visualizing profiles.

One of the first things you'll find is a top table. In such a table you can order by different values. This example is from pprof, the Go tool, and you can clearly see that doALot is the method that comes out on top. There are different ways to look at the value: you have the flat value, which is the function itself only — the 20 CPU cycles we had before — and you have the cumulative value, which also includes prepare, since it gets called from doALot. So we can already see that we spend about 52% of our program in doALot. Maybe we can stop looking at the rest of the table and just look at doALot, because if we fix it or get rid of it, we need half the CPU. That is represented by the sum column, which changes depending on how you order the table; in this particular example we reach 100% at row number four, because we only have four functions.

To get a more visual sense of what's going on, there are the so-called flame graphs. The most confusing thing about them, for me, was the coloring. Red is usually not great — should we go look at main? No, we shouldn't. The coloring is random, or uses some kind of hashing; it's only meant to make the graph look like a flame, so the red here doesn't mean anything. If you're colorblind, that's perfect for flame graphs. What we actually want to look at is the leaf end: that's where the program actually spends its time. You can see the three CPU cycles of main here, with nothing below it — main accounts for 100% through the methods it calls, but this small block with nothing beyond it is time spent in main itself. In the same way you can see the five in doLittle, the 20 in doALot, which is quite wide, and the prepares with five each. Across a really huge program, this lets you spot quite quickly what's going on.
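[Editor's note: for readers who have not seen a pprof top table, the output for a program like the one above would look roughly like this. The numbers are illustrative, reconstructed from the cycle counts mentioned in the talk, not copied from the speaker's slide.]

```
(pprof) top
Showing nodes accounting for 38, 100% of 38 total
      flat  flat%   sum%        cum   cum%
        20 52.63% 52.63%         25 65.79%  main.doALot
        10 26.32% 78.95%         10 26.32%  main.prepare
         5 13.16% 92.11%         10 26.32%  main.doLittle
         3  7.89%   100%         38   100%  main.main
```

Here "flat" is time in the function itself, "cum" includes everything it calls (doALot's 20 plus prepare's 5), and "sum%" is the running total of flat% in the current ordering.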
[13:21.000 --> 16:45.200] And if you have something like root and main at the bottom, you can basically ignore those, but the graph helps you locate which component of your program you want to look at. Maybe you're not good at naming and you call everything prepare in a util package; the flame graph would still tell you roughly where it gets called and how the program gets there.

So how do we get that profile? That can come with quite a lot of challenges. I would say there are roughly two ways. Either your ecosystem supports profiling fairly natively, and you instrument the application — maybe you add a library or an SDK — and the runtime in your environment exposes the information. That's not available for all languages, though there's a lot of work going on to make it more and more available. The other approach is more agent-based, and eBPF has been quite hyped there. I'm not very familiar with eBPF myself — I've used the agents but haven't written any code — but basically it takes an outside view: you don't need to change the running binary at all, you just look at the information you get from the Linux kernel. You hook in often enough while the CPU is running to find out what is currently executing. Different languages behave differently here. In a compiled language you have a lot more information: the memory addresses are stable and you can use the symbol table to figure out where your program is and what is currently running. In an interpreted language like Ruby or Python this is a bit harder, and that information might not be accessible to the kernel without further work. Also, when you compile you might strip those symbol tables, so you really need to prepare your application a bit for this.

Now I want to look at the prime example I'm most familiar with. I've been mostly a Go developer over the last couple of years, and Go has quite a mature set of tools in this area. The standard library allows you to expose that information, and it supports CPU and memory profiles. Especially in garbage-collected languages — and in non-garbage-collected languages too — memory is really important to understand. I have a quick example of a program where you basically just expose an HTTP port from which you can download a profile whenever you want, and then you have the pprof tool that you can point at it.
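[Editor's note: the example program is not reproduced in the transcript. The usual way to do what is described — exposing profiles over HTTP with the Go standard library — is a couple of lines with net/http/pprof, roughly as below; port 6060 is just a common convention, not something stated in the talk.]

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	go func() {
		// Profiles (CPU, heap, goroutines, ...) become downloadable from
		// http://localhost:6060/debug/pprof/ while the application runs.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the actual application work would go here ...
	select {} // block forever so the demo server keeps serving
}
```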
[16:45.200 --> 19:58.880] The first pprof command in my example just takes a two-second-long CPU profile: it looks at the CPU for two seconds, records whatever is running and for how long, and then you get a file, and pprof lets you visualize it, through that top table for example. Something I forgot to mention: you can also go to that URL later and look at the profile I used as an example. In the same way you can get the memory allocations, and pprof also allows you to launch an HTTP server to be a bit more interactive and select certain code paths. There is quite a lot about profiling in the Go docs on go.dev, so I'll leave it at that; if you're a Go developer, use it and play around yourself.

Now I want to talk about why profiling can actually be quite difficult. In my example I had three CPU cycles, and if you think about it, that's not very much. Just recording what the program was doing in those three CPU cycles probably takes, I have no idea, thousands of CPU cycles. So you really want to be careful about what you record. If you really recorded all of it, your program would have a massive overhead, it would slow down, it would behave completely differently because of the profiling, and you would also have a lot more data to store and analyze. And if you think about microservices with a replica count of 500, you'd get a lot of data that isn't actually that useful to you — do you really care about three CPU cycles? Probably not.

Because of that, to allow continuous profiling — doing this in production across a wide set of deployments — I think Google were the first to do it, and they started to sample the profiles. Instead of looking at every piece of code that runs, Go for example looks 100 times a second at what is currently running on the CPU and records that. A tiny integer addition will probably never be on the CPU at a sample point unless you run it all the time, so you still get an accurate representation of what is really taking your CPU time. You also need to be aware that your program might not be on the CPU at all because it is waiting for I/O, so when you collect a profile and it doesn't account for that many seconds, you really need to ask whether this is what you actually want to optimize, or whether you're simply not seeing what you want to see.
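[Editor's note: the concrete pprof invocations are not in the transcript; against the kind of endpoint sketched earlier they would look roughly like this. The host and port are assumptions carried over from the previous sketch.]

```
# two-second CPU profile, explored in the interactive console ("top", "list", ...)
go tool pprof -seconds 2 http://localhost:6060/debug/pprof/profile

# memory allocations (heap profile)
go tool pprof http://localhost:6060/debug/pprof/heap

# launch a local web UI with flame graph, top table and source view
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
```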
[19:58.880 --> 23:23.960] With that kind of statistical approach — I don't have a source to cite, but people generally say it adds around two to three percent overhead on top of your program's execution — it's a lot more reasonable than the full approach of recording everything. So what you would generally do: first you ship your application somewhere and run it, and then you can look at the profiles and think about them. Maybe you're the owner of that code and have a bit more understanding, and those profiles give you a better idea of what you're actually doing there, or how the system reacts to your code. For that green box, multiple solutions exist. I'm obviously a bit biased, and I also have to say our project is fairly young and evolving. For example there's CNCF Pixie, which is eBPF-based, there's Polar Signals' Parca — people from that project are in the room — there's Pyroscope, and there's our solution. I think they're all great; you can start using any of them and exploring, maybe just with your benchmarks for a start, and as you get more familiar you'll discover more and more of the value.

I'm going to use Phlare now for my quick demo. Let me just see. I guess most of you are familiar with Grafana and Explore. (Why is it so huge?) This is the entry point you see in Explore: you have the typical time range selection — let's say we want the last 15 minutes — and here we can see the profile types collected. This is just a Docker Compose setup running locally on my laptop, hopefully still running since I started the talk. For example, we can look at the CPU here, the nanoseconds spent on CPU, and you can see the flame graph from earlier — and maybe some bug, I don't know, it looks a bit bigger than it usually should be. We can see the top table, and we see the aggregation across all of the services; I'm running five pods or so, in different languages. For example, this one here is a Python main module that is computing some prime numbers. The first thing I want to do here is break down by label, and that's really the only functionality we have in terms of querying: here we look at the different instances and see the CPU time spent by each. There's a Rust pod in there, and they're both blue so I don't know which is which, but I guess Phlare is doing more, so that's probably the Phlare one.
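[Editor's note: as an aside to the sampling discussion above, Go's sampling profiler can also be driven programmatically, which makes the roughly-100-samples-per-second behaviour and the memory sampling rate visible in code. A minimal sketch; the file name is my own choice.]

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// The CPU profiler samples the running goroutines roughly 100 times per
	// second by default, rather than tracing every instruction.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Heap allocations are sampled too: by default about one sample per
	// 512 KiB allocated, controlled by runtime.MemProfileRate.
	fmt.Println("memory profile rate (bytes):", runtime.MemProfileRate)

	// ... the work you actually want to profile goes here ...
}
```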
[23:23.960 --> 27:03.480] For my example I now want to look at a small program I wrote to show this aspect. Here we can see the timeline: a profile gets collected every 15 seconds, I think, and each of those is a dot, and the flame graph and the top table below just aggregate all of that — there is no time component in them, which is important to understand. Now I'm going to switch to memory, the allocated space. (And... oh no.) Here we have the label selection that you might be familiar with, and for this random pod you can see the allocations: the amount of memory allocated is around six megabytes, but roughly every five minutes you can see a peak. If you look at the flame graph, there's already a big red box — and again, the colors don't mean anything — and you can see that this piece of code is doing the majority of the allocations. You could even zoom in here if you really want to figure it out, and then it gets bigger and you can see more of what's going on. And if you actually want to look at the code for this: once Phlare is maybe at version 0.6 we could even see the exact line of code to look at; for now, it basically points us at the allocations on line 21. I guess most of you can see what this program is doing: every five minutes it has a peak of allocations. And you only see that because you have the time component you can select, and then the flame graph aggregation. Cool.

Yeah, that was almost my talk; I have one more slide I quickly want to mention. In the latest Go version, 1.20, there is profile-guided optimization, and I think that might become a really big topic. It takes your profile — which can come from production, from benchmarks, from wherever — and tries to make better decisions at compile time about what to do with your code. I think the only thing it does right now is inlining decisions: if it sees that a method is called a lot and is in the hot path, it will decide to inline it; if it's a bit colder, it won't. You can be a lot more accurate as a compiler if you have that kind of data and know whether a method is in the hot path or not. Okay, that was it. Thank you.

Thanks a lot, that was awesome. Questions?

Thank you for the talk. I'm just wondering how profiling would work with very multi-threaded code. Is there the ability to drill down to that level?
[27:03.480 --> 30:26.880] Yeah — in terms of multi-threading: in that example we only have the main method, so you see root and then main at 100%. If it's multi-threaded you would maybe have more stacks, but it's really only the stack trace that gets recorded; you would not see the connections, where a thread is spawned off and things like that. You just get the call stacks.

Have you looked into any profiling ingestion formats other than pprof? I know OpenTelemetry has been doing some work on a profiling format that people can standardize on, but I don't know if you've looked at that at all.

Sorry, can you repeat that? I struggled to hear.

I was wondering if you've looked at any profiling ingestion formats other than pprof.

No — right now we use the pprof format with Phlare. I think there are a lot of improvements to be had over that format, and as far as I know there is active work around OpenTelemetry to come to a better format, in the sense of not sending symbols over and over again and reducing ingest. But "no" is the accurate and short answer.

Okay, thank you for the talk. My question is: looking at the Phlare architecture, it's currently a pull model, so the Phlare agent is scraping the profiling data from the applications you configure it to scrape. Is there an eventual plan to also add a push gateway or a similar API for applications where that might be more suitable?

Yeah, definitely. I can see the push use case, for example if you want to get your micro-benchmarks from CI/CD in. The API in theory allows it, but the tooling is missing. I definitely think push is a valid use case as well; in terms of scalability I think pull will be better, but yeah, I agree.

Thanks for the talk, I have a small question. Did you try to use this tooling at the end of the CI/CD pipeline, for continuous optimization?

No, we're not using it for that yet. I think it's definitely a super useful thing, because you want to see how a pull request behaves, maybe how your application allocates more or less in different parts and whether the trade-offs are right. I think it definitely can and should be used for that, but there's no tooling right now. Yeah, I fully agree.

Hello, thank you. If I understand correctly, profiles are something like traces combined with OS metrics, right? So at a concrete, specific time you can see how much CPU you used, and so on?
[30:26.880 --> 32:02.680] Yeah — I'd say it looks a bit more at the actual line of code. I haven't used tracing that automatically finds the function, maybe that also tells you the line of code, but yes, it definitely adds those measurements without you doing much, other than making sure it can read the symbol tables and the function names.

So I just had a dumb question, or a dumb idea: couldn't you just combine things you already have? You have node exporter, which exposes OS metrics at all times, and you have traces, for example. Couldn't you have some kind of integration in Grafana, or somewhere else, that just combines traces with metrics?

Yeah — I think people who have worked longer on continuous profiling software have tried to reuse Prometheus for this, and where you end up is just very high cardinality: there are too many lines of code, and that's where it stops. In theory, most PromQL constructs and functions are something we'd need to implement on top of profiles in a similar way, because in the end you just get metrics out of it. But basically the problem was too many lines of code, too much change over time — you just get too much series churn through that.

So, thanks a lot. Yeah, thank you for coming.