Hello. Can you hear me all right? Welcome to the next installment in your regularly scheduled entertainment. I think that's the great benefit of being scheduled after such great talks: everyone is in, and with no break you can't leave, so you're kind of stuck with me for the next half an hour. Welcome, I hope you enjoy it.

This is a very ambitious talk, or at least its title suggests that it is, so let me start with what it is really about. This talk is really about me and my experience: hopes, frustrations, aspirations, experience, and some illustrations. I'm Alex. I have been doing web performance and mobile web performance at Google, in Chromium, for the last eight or so years, and this talk is going to be pretty much about that. This is not going to be a practical talk. The first reason is that we don't have time. The second is that there are way too many rough edges, so at this point I wouldn't recommend trying to reproduce this. But hopefully it will be a source of inspiration, and those of you who are desperate enough and frustrated enough, and have seen the problems outlined in this talk too many times, will be brave enough to venture out and try. There is a practical guide that I would recommend in the recent Web Performance Calendar by the one and only, the great Annie Sullivan. I would recommend you go and check it out, preferably after this talk, but you know, I can't prohibit you from doing so.

So this talk really is about problem solving, working with complex systems, and trying to make sense of them. The examples will be from Chromium, and I will talk about Perfetto, but hopefully these examples will be an inspiration for a great variety of projects, both when building your sites and when working with other complex systems.

Let's talk about performance, and about improving performance. If you want to improve performance, and I imagine you are not totally averse to that given that you are in this room, then I want to remind you of certain trivial things that you probably already know. The first is that performance problems are nasty. They are impolite, and they don't have the common courtesy of locating themselves in a nice isolated area of code that you can master and work on without bothering with the rest of the project. Because of that, knowing what to improve, where to improve, and how to improve takes up a substantial share of performance work. And the web is mind-bogglingly complex and getting more so: new APIs, both performance-related and not, browsers getting bigger and more complex every day, sites growing in diversity and complexity, libraries, and so on. All of this leads to all of us who work on performance spending a lot of time, on a regular basis, trying to understand what the hell is going on.

So what are the approaches? The first approach I have to mention is that you can go and read the code. You are a very brave person and I wish you the very best of luck if you decide to do it, but it's not very practical. Modern projects are layers upon layers of abstractions, and then you have a listener, and then you have 30 possible callbacks or entry points, and then good luck. Usually I give up at this point, when I see that, hey, it's probably one of these 30 things. The second approach is printf and its possible variations: console.log and other logging statements. And the third is debuggers.
So: GDB, LLDB, rr, Chrome DevTools. Some of them are better than others, but all of these approaches effectively don't scale to complex systems. Especially once you add non-determinism, when a test sometimes reproduces an error and sometimes doesn't, you are in for a bit of fun. And when you have multiple processes and multiple architectural components, these tools don't work particularly well either. They focus on low-level details: hey, what is the value of this variable? Most often you want to know: hey, what is this component doing, and is it doing a good job?

So, enter tracing. How many of you are familiar with tracing in some form or another? Some of you. Tracing is pretty much structured logging plus visualization, and I will go into this a little bit more further down the line. As far as Chromium is concerned, from the practical perspective it means writing annotations like this one: here we have a RequestResource function that is annotated with a TRACE_EVENT macro. In C++, when tracing is enabled, we emit some information when we enter this function, and we emit some information when we exit it. This allows us to look at this nice timeline. The x-axis is time advancing, and you can see that we entered this function here and exited it there, which other functions were called inside of it, and how long it all took. And zooming out, you can see what else the system has been doing, across different threads and across different processes. That, I think, is a good starting point and the basic infrastructure to talk about.
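To make that concrete, here is a minimal sketch of what such an annotation looks like. The function name and the category are illustrative rather than the exact Chromium code:

```cpp
// Minimal sketch of a Chromium-style trace annotation (names illustrative).
#include "base/trace_event/trace_event.h"

void RequestResource() {
  // Emits a "begin" event here and a matching "end" event when the scoped
  // object goes out of scope at the end of the function. This only does
  // real work while tracing is enabled; otherwise it is nearly free.
  TRACE_EVENT0("loading", "RequestResource");
  // ... actual work. Nested TRACE_EVENTs reached from here show up as
  // child slices under this one on the same thread's timeline.
}
```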
If you want to try it yourself, you can go to ui.perfetto.dev. The examples in this talk are pretty much all from the "Open Chrome example" trace, so if you have a laptop you can follow along; the links to the slides should be on the FOSDEM site for this talk.

But back to talking about how to make this useful. You have this wonderful instrumentation, and you can already use it as a fancy printf with search functionality: just record a lot of information and then look at it. But instrumenting the code you're already working on as a fancy printf, while powerful and flexible, is not necessarily the most convenient, and it doesn't win out of the box against either printf or debuggers. Against printf, the basic debug loop is still faster: you add a single statement, you don't have to bother with opening anything anywhere, you just see the console output and you're done. It only gets less pleasant when you have to do it multiple times. With debuggers, all the information is already present; it can take you a bit of time to find it, but you don't have to bother with adding more annotations, recompiling, and wasting time there. And it's unrealistic to have all of the functions instrumented and captured in the trace: it's too much information to record, which adds overhead and slowdowns, and it's also too much information to go through, so looking at it is neither pleasant nor fast. So instead, I will talk about finding opportunities for scaling this instrumentation: places where a few instrumentation points can give us a lot of information and substantially advance our ability to reason about what the code is doing.

Enter the Chrome task scheduler. Chromium is implemented on an event-loop model. We have a bunch of named threads. We have the browser process with the browser main thread, which is responsible for coordinating everything. We have the renderer process with its main thread, which is responsible for running JavaScript, Blink, the DOM, and so on. We have dedicated workers, which sites can create using the new Worker() API. And there is a thread pool for miscellaneous background work. That is pretty much all there is. And various places in the codebase, a few thousand of them, maybe 10,000 nowadays, post tasks: they get a task runner from somewhere and they post a task. Note the FROM_HERE macro; we'll talk about it in a second. Otherwise a task is just a fancy lambda with some safety thrown in. And here it is: you post this task somewhere, some thread or thread pool picks it up, and it runs the task. Voilà.
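Here is a minimal sketch of that posting side. The PostTask API shape is the real base::ThreadPool one to the best of my knowledge, but the traits and the task body are made up for illustration:

```cpp
// Sketch of posting a task to Chromium's thread pool (task body is made up).
#include "base/functional/bind.h"
#include "base/task/task_traits.h"
#include "base/task/thread_pool.h"

void SaveStateInBackground() {
  // FROM_HERE expands to the current file and function, which is exactly
  // the "posted from" information that shows up on every task in a trace.
  base::ThreadPool::PostTask(
      FROM_HERE, {base::MayBlock()},
      base::BindOnce([] { /* miscellaneous background work */ }));
}
```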
And this is a great point for tracing instrumentation, and a great place to start looking. What does it give us? It gives us that pretty much all of the places where Chromium code runs will be known to us, with some basic information attached. Look specifically at the "posted from" information here. This is the result of the FROM_HERE macro expansion which, using some C macro tricks, automatically and without any further support gives us the file name and the function. So for every task you have at least a basic idea of where it was posted from, and you can go to that part of the codebase and start understanding what the hell is going on and why this task might be running.

Then we can zoom out. We also have instrumentation for PostTask itself, and the post and the run of a task are conveniently linked through a flow event. What this means is that, instead of looking at a single task, which might or might not be useful, you can also explore which task it came from, and which task that task came from, and so on. Because I can't really zoom out and I can't really make this interactive, I can't show all the threads involved, so this is a view of a single thread. But you can see I selected a single task, and I can see an incoming flow coming from the thread pool, and that thread pool task is in turn coming from another task on the main thread. So you can see that all of these smaller tasks running after a larger task were actually posted from it, and we know they are related.

This is a very good starting point, but it doesn't give us everything. There are a few other chokepoints that have been useful to instrument and that improve our ability to reason about what is going on. Task scheduling is inherently intra-process, so it doesn't tell us about inter-process communication. Fortunately, in Chromium we have Mojo, the IPC subsystem, which we can also instrument to get pretty much the same information for cross-process communication: we can know who is sending messages, connect the place that sent a message with the place that received it, and trace that back through the flow. Capturing console logs, DVLOGs, and other debug logs is another good source of information: if someone bothered to log it somewhere in the system, it's probably already useful for us, and being able to correlate this additional data source with the actual tasks Chromium is running has proven useful in many investigations. Instrumenting all of the Blink bindings, pretty much capturing all of the JavaScript calls that end up being implemented in Blink, is another great way to reason about what is going on and what the website is doing, and there are a couple of other similar infrastructure pieces.

So the key takeaway here is: if you have a complex system, you would probably do well to instrument some of the widely used chokepoints. If you are familiar with the codebase, you will then be able to make informed judgments about what is going on, and you will be able to spot outliers: something taking too long, a lock being held, and so on, in the case of a performance regression or a functional regression or a flaky test. And that's already a great step forward. If your test is flaky, you can run it a thousand times, it will fail five times, you can open five traces, look, and if you're lucky you will be able to spot a noticeable difference.

But this is still not good enough for me. The problem is that, despite having visibility into everything we're doing, this is very, very expertise-intensive. In order to make good use of it, you have to kind of know everything. You have to know a lot about Chromium architecture. As some of my colleagues say, you have to have a PhD in tracing and Chromium architecture to truly make this useful. And I have an aspiration: let's get it to the point where anyone, any web developer, can open a trace and, instead of being discouraged and intimidated by all of this mumbo jumbo, can learn something about how Chromium actually works.

An inspiration I have is this slide and diagram from the "Life of a Navigation" talk from Chrome University, which is kind of similar to what we have already seen: a virtual timeline with boxes connected by arrows. If you look at it, even if you are not deeply familiar with browser architecture, you can probably still make some sense of it and make some educated guesses about what is going on. For example, if you see the network stack doing "start URL request" as one of the stages, it's something you can develop a reasonably good intuition for. And that's the contrast with the status quo we currently have, which presents pretty much exactly the same information but slightly less usefully: slightly harder to read and slightly more intimidating. For example, you can see tasks; you can see that some of them are related to URLLoaderClient, so you're getting information from the network; there is a NavigationClient, which you can guess relates to the navigation stack. But the level of intuitiveness is starkly different.

There are existing examples where we already do this in Chromium, where we take the care to reconstruct high-level events and a high-level timeline for specific things. This is an example of EventLatency, which specifically breaks down the sequence of steps involved in presenting a frame. (We're doing great on time.) The downside is that the plumbing is very expensive, and scaling this up is very difficult.
When you have a big project, you need information from different corners of that project, and plumbing it through is very expensive: in terms of serialization costs, in terms of layering concerns, and in terms of the amount of plumbing code you need to maintain. It's difficult to scale, and we haven't implemented this for too many exciting things.

So let me talk about Perfetto a little bit. Perfetto is the new-generation tracing framework, born from the ashes of Chromium tracing, built by a few great folks who had been working on Chromium tracing, got fed up with it, and learned from all of the mistakes that happened there and all of the things that we shouldn't have done in the first place. And voilà: Perfetto, which is nowadays widely used for Chromium and Android tracing. It has a fancy new UI and a more efficient format, but the thing that earns it a special place in my heart is the new SQL data model and query engine. Essentially, everything that you see in the UI is backed by a data model, and the UI is just running queries against this data model and presenting the results. And you can very easily do the same yourself. The trace processor is actually compiled as a Wasm module and runs in your browser on a background thread. The web has come very far. And this allows us to separate recording the trace and emitting the low-level instrumentation from actually analyzing it and building high-level data models.

This is probably the best example of Perfetto's powers that I could fit in a single slide. You can replicate it yourself: if you go to Perfetto, open the Chrome example trace, and type a colon into the search box, you will enter SQL query mode, and then you can copy and paste the query I put there (once again, you should have access to the slides). It will pretty much give you the list of the top 100 longest tasks that ended up running, which is already useful for analysis, and you can build more and more complex data models through different tables within SQL, which is kind of cool.
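If you don't have the slides handy, a query along these lines (not necessarily the exact one from the slide) reproduces the idea, using Perfetto's standard slice table:

```sql
-- Top 100 longest slices in the trace.
-- ts and dur are in nanoseconds in Perfetto's data model.
SELECT name, ts, dur
FROM slice
ORDER BY dur DESC
LIMIT 100;
```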
So what are the next steps here? Right now I am building fancy navigation instrumentation as a proof of concept. The current prototype is kind of there: you can see that we have a timeline, all based on pretty much the same low-level information, but presented in a fancier form. This can then be further integrated with the documentation, so a stage is not just a standalone box with a couple of words scribbled on it: we can also link to the parts of the Chromium documentation that outline what the stage is actually about and which concepts you need to think about, and make it generally more useful. One of the major complexities, and a reason why we haven't done this before, is the number of corner cases. When you talk about navigation, about typing a URL into the omnibox, there is a mind-boggling number of cases you need to think about, from redirects, to navigations turning into downloads, to the server returning a 204 and canceling the navigation. Building this instrumentation without being able to test it is kind of a losing game, and the SQL support is what makes it feasible to write test coverage for these corner cases. I think I'm at 15 out of 50 at this point, so some work to do.

So yeah, I think that's all of the main content that I have. I have a bonus demo, which is kind of about DevTools, but I can also take questions.

So the question is: what's the best way of comparing these traces? I think eyeballing is probably a good place to start. There are some early experiments with opening traces side by side and being able to link the timelines, but this greatly depends on what kind of problem you're looking at. For example, if you are comparing traces from tests, then the workload is more repeatable and you can actually go further in comparing them: writing some SQL queries and constructing some high-level metrics can get you very, very far in spotting whether any high-level metric or derived value changed. If it's user interactions, then probably eyeballing, going from there, and seeing how much variance there is.

Yes, yes, it's a SQL statement there. Yes, great question. The question is: we have SQL, but where is the database? The answer is that it's all done locally. This is SQLite compiled into a Wasm module, with some helpers on top. So when you open a trace, it's loaded into a SQLite instance running in a background thread. Everything is local.

More questions? If not, I can actually go and show you my favorite house party trick, which is an illustration of why it's actually quite important, I think, to think about data presentation. Sorry? More questions? No. Let me try to do this. Can you see what's going on? Let me open a trace in the Performance panel in Chrome DevTools that I recorded earlier this morning. This should be something you are already familiar with, but the thing that some of you might not have realized is that there is nothing inherently magical or special about these DevTools traces, apart from a very good UI and a lot of UX thought that went into them. Fundamentally, they are just JSON Chrome traces with a particular bunch of categories. And you can actually open the very same information in Perfetto and look at it. You can already see that the usefulness of this information is a bit different: we have to zoom around and find the relevant parts, and we are exposed to low-level information with no high-level insights.

But then we have the network tracing, and the best way to illustrate the difference is to look at one of the network requests. Let me... not this one. I want to find a network request from the DevTools trace, with a URL. You can see it has some high-level stats, and you can see where it fits with other stuff; the screenshots also help. Then I can search by this URL in Perfetto, find the request ID, and find all of the low-level events that Chrome tracing has actually emitted. All of the information about this network request is there. So if you can be bothered, you can actually go through all of these specific events, correlate them, and reconstruct the same high-level takeaways; you can even express the correlation as a query, as sketched below. But it's going to be a little bit slower, a little bit less useful, and you probably won't actually be using it that much yourself. So, yeah.
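A sketch of that correlation as a query: Perfetto's EXTRACT_ARG function can pull values out of a slice's arguments, though the argument key for the network request ID and the ID value used here are assumptions; inspect a slice's args in the Perfetto UI to find the real key in your trace:

```sql
-- Sketch: find all slices whose args carry a given network request ID.
-- 'args.net_request_id' is an assumed key name, and 12345 a made-up value;
-- check the args panel in the Perfetto UI for the actual key and format.
SELECT s.ts, s.dur, s.name
FROM slice AS s
WHERE EXTRACT_ARG(s.arg_set_id, 'args.net_request_id') = 12345;
```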