[00:00.000 --> 00:14.120] Hi, how are you doing? Welcome to FOSDEM. Congratulations on managing to get inside a room. This is the [00:14.120 --> 00:18.920] largest one I've ever seen. Usually I'm just looking at the doors of the ones that are full. [00:18.920 --> 00:26.520] So yeah, my name's Daniel Stone. I'm here to give a relatively high-level overview [00:26.520 --> 00:34.200] of the graphics stack. My hope with this, like I said, it's fairly high level, is to give you a [00:34.200 --> 00:40.440] decent understanding of all the different components that go into the modern graphics stack [00:40.440 --> 00:48.360] and how they fit together. So if you're trying to work with it anyway (you won't be trying to [00:48.360 --> 00:55.800] debug it, because it's already perfect), it should give you a good understanding [00:55.800 --> 01:02.280] of how everything does fit together. And now we have graphics output working, so that's a good [01:02.280 --> 01:11.040] start for this talk, because that wasn't looking likely five minutes ago. Right, so the graphics [01:11.040 --> 01:22.760] stack looks like this. Any questions? That's the simplified version as well. More sensibly, [01:22.760 --> 01:31.440] let's try to build it up incrementally, and work through all of the different pieces [01:31.440 --> 01:41.920] and different components in essentially the order of near to far. You know, [01:41.920 --> 01:47.680] in networking you think of upstream and downstream; in graphics, a lot of what we [01:47.680 --> 01:54.960] think about is what's close to your eye and what's far from your eye. So in our case, the display [01:54.960 --> 02:04.040] is closest to your eyes, and this one's incredibly bright. In between, just underneath the display, [02:04.040 --> 02:12.760] controlling the display and determining what should be shown, we have the window system [02:12.760 --> 02:21.920] layer, so that's your Wayland. It can be X11, but we don't talk about that. And then at the very [02:21.920 --> 02:28.080] back end, at the upstream side, you've got the clients, which are actually presenting the thing [02:28.080 --> 02:37.880] that you want to show. But then it turns out that your window system also uses the GPU to render, [02:37.880 --> 02:46.280] so it's not just OpenGL games that use accelerated graphics. It's the window system too, so the nice [02:46.280 --> 02:53.960] diagram already gets a bit muddied because we're breaking the layers. And then maybe the window [02:53.960 --> 03:02.920] system uses some media output because you want to stream stuff onto it or, you know, to stream a [03:02.920 --> 03:14.200] conference talk, hello. And maybe one of your clients is also a window system, because it turns [03:14.200 --> 03:23.160] out that even Chrome is a Wayland server these days. So our lovely little illusion of three classes, [03:23.160 --> 03:33.320] three main components of our graphics stack, has already disappeared. But, you know, [03:33.320 --> 03:45.600] let's pretend that everything is fine and let's just try to build it up. So first, DRM and KMS, [03:45.600 --> 03:54.360] the acronyms you mostly see. The Direct Rendering Manager is anything to do with graphics or display [03:54.360 --> 04:03.520] inside the kernel. It's a weird legacy name. Those are all of the GPU and display drivers. [04:03.520 --> 04:12.480] And KMS, kernel mode setting, is very specifically the part of DRM that actually controls the display. [04:12.480 --> 04:22.120] So when you're talking about HDMI output or something like that, then it's going to be KMS. And KMS is that very [04:22.120 --> 04:28.880] last step in the pipeline, the one that's closest to your eye. Its job is to turn pixels into light.
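To make that concrete, here is a minimal sketch (not from the talk) of what "talking to KMS" means in practice, via libdrm. The /dev/dri/card0 path is an assumption for illustration; real code enumerates devices, for example with drmGetDevices2().

```c
/* Minimal sketch: open a KMS device and enable the client capabilities a
 * modern compositor needs. Assumes libdrm (xf86drm.h); the device path is
 * hard-coded purely for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <xf86drm.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Ask for universal planes and the atomic API; old drivers may refuse. */
    if (drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1) ||
        drmSetClientCap(fd, DRM_CLIENT_CAP_ATOMIC, 1))
        fprintf(stderr, "no atomic modesetting support\n");

    drmVersionPtr v = drmGetVersion(fd);
    if (v) {
        printf("driver: %s\n", v->name);
        drmFreeVersion(v);
    }
    return 0;
}
```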
[04:28.880 --> 04:37.600] Some people will tell you that there's a thing called fbdev as well, but that's not right. fbdev [04:37.600 --> 04:49.240] doesn't exist. And, yeah, in the division of responsibility, as we go one step further back [04:49.240 --> 04:57.000] from your eye, the window system's job is fundamentally to take a bunch of images from [04:57.000 --> 05:02.720] clients, combine them into a single image, or multiple images if you have multiple displays, [05:02.720 --> 05:11.440] get them out to the eye, and bring input events back. So, you know, Wayland is a protocol and [05:11.440 --> 05:20.760] nothing else. There's a very small C layer in Wayland, which is really just IPC. [05:20.760 --> 05:28.480] And apart from that, it's just protocols and conventions. So, you know, Mutter, which GNOME uses, [05:28.480 --> 05:36.040] is a Wayland server. Other popular ones would be KWin, Weston, wlroots. That's where all the [05:36.040 --> 05:44.120] implementation actually lies. And, yeah, like I say, they just combine window images together, [05:44.120 --> 05:49.200] get them out to the output device, and in the reverse direction they bring input back. [05:49.200 --> 06:03.800] X11 doesn't exist either. So we'll move on. Yeah. So, OpenGL and Vulkan, and the way they fit [06:03.800 --> 06:12.600] in. They're APIs, as we know, for accelerated 3D: you provide them a mesh and some textures and [06:12.600 --> 06:20.680] some shaders. Run this thing, make it fast. Great. But they only handle rendering. So [06:20.680 --> 06:30.240] GL and Vulkan themselves have no concept of "I want to be able to display to Wayland". That comes in [06:30.240 --> 06:40.080] with EGL and what we call the Vulkan WSI, the window system integration layer. Their job is to [06:40.080 --> 06:48.000] bridge the two worlds. So, with OpenGL, you have EGL on the side; that's the bridge between GL and, [06:48.000 --> 06:57.720] say, Wayland. With Vulkan, you have core Vulkan, and then the WSI on the side is that bridge [06:57.720 --> 07:08.160] bringing all the content across to the window system. And then there's GBM as well, which is [07:08.160 --> 07:19.000] maybe the most ill-fitting part of what we have. GBM is kind of a side channel to bridge EGL to [07:19.000 --> 07:30.240] KMS. So, right now, I mean, this is all happening through GNOME Shell and Mutter. It's using GL to [07:30.240 --> 07:38.920] render my image, with the next slide as a bonus preview, and this one that you can see. Mutter, [07:38.920 --> 07:49.040] yeah, it uses GL to render, and it uses EGL plus GBM to be able to pull images out to [07:49.040 --> 08:01.000] kernel mode setting. And GBM is a really, really strange and idiosyncratic bridge. Some people [08:01.000 --> 08:07.600] will tell you that GBM stands for the generic buffer manager. That's definitely not true. [08:07.600 --> 08:17.200] Yeah, we had an idea that GBM would be the thing that let people kind of peek under the hood of [08:17.200 --> 08:25.400] what EGL does as an implementation and be able to generically allocate buffers. We got as far as [08:25.400 --> 08:31.480] making it work for kernel mode setting and then realized how terrible the whole problem space was. [08:31.480 --> 08:37.680] So we just pretended that it was never an acronym, that it's not generic, and moved on with our lives.
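As a rough sketch of that bridge (illustrative, with error handling omitted; the size and format here are assumptions), allocating a displayable buffer through GBM looks approximately like this:

```c
/* Sketch: allocate a scanout-capable buffer with GBM from an already-open
 * DRM fd. */
#include <gbm.h>
#include <stdio.h>

void allocate_scanout_buffer(int drm_fd)
{
    struct gbm_device *gbm = gbm_create_device(drm_fd);

    /* SCANOUT means KMS can display it; RENDERING means the GPU can draw
     * into it. The driver picks a layout that satisfies both. */
    struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080,
                                      GBM_FORMAT_XRGB8888,
                                      GBM_BO_USE_SCANOUT |
                                      GBM_BO_USE_RENDERING);

    printf("GEM handle %u, stride %u\n",
           gbm_bo_get_handle(bo).u32, gbm_bo_get_stride(bo));

    gbm_bo_destroy(bo);
    gbm_device_destroy(gbm);
}
```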
[08:37.680 --> 08:49.000] So, at the end of all that, before we get into something more meaty: we've got clients [08:49.000 --> 08:54.960] rendering the content, maybe with the GPU, maybe just on the CPU, maybe just doing memcpy. [08:54.960 --> 09:02.760] A client will pass a handle to that content over to the Wayland compositor with some metadata, [09:02.760 --> 09:11.520] some context. The compositor is going to pull it all together, choose how it's going to display it, [09:11.520 --> 09:19.800] apply any kind of policy or what have you. And then it's going to push that final image [09:19.800 --> 09:29.280] out to KMS, which is going to turn it into electrons. So we've got the diagram back [09:29.280 --> 09:41.680] to making sense. So, if we're looking at how KMS is actually put together: every single discrete [09:41.680 --> 09:47.600] device in your system is its own DRM device. I just have an Intel laptop up here, so I have one DRM device, [09:47.600 --> 09:55.800] which is the entire Intel GPU and display complex. If you're on ARM systems, usually [09:55.800 --> 10:03.000] you're going to have two devices. The display and GPU are separate IP blocks from separate [10:03.000 --> 10:10.920] vendors who aren't really on speaking terms. So you'll have one DRM device for your display [10:10.920 --> 10:18.120] controller and another DRM device for your GPU, and they're completely separate. So, yeah, [10:18.120 --> 10:28.360] for KMS devices, we've got connectors representing real displays. So we've got an embedded [10:28.360 --> 10:35.320] DisplayPort connector here, and various DisplayPort and HDMI connectors for my external outputs. [10:35.320 --> 10:45.360] CRTCs, and that does stand for CRT controller, because that's how long ago it was when we [10:45.360 --> 10:57.480] designed all this. CRTCs are the thing immediately upstream from connectors. They generate a pixel [10:57.480 --> 11:07.200] stream for the displays. So any kind of scaling, cropping, compositing is done in the CRTC space. [11:07.200 --> 11:20.520] And CRTCs are just a combination of planes. So, planes: they take frame buffers. They [11:20.520 --> 11:28.160] can scale. They can be positioned within the CRTC. They can be stacked. And then the CRTC [11:28.160 --> 11:34.760] is the one that combines them. So, in quite a poor diagram (because, for a graphics person, [11:34.760 --> 11:42.800] I can't actually draw very well; I'm more of a text person, to be honest), the [11:42.800 --> 11:50.200] frame buffer is just the client content. The plane is the one that's going to do any format [11:50.200 --> 11:57.800] conversion or scaling or what have you. Then the CRTC combines them all together and pushes [11:57.800 --> 12:04.160] them out to the connector.
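A minimal sketch of walking that object tree with libdrm (drmModeGetConnectorTypeName needs a reasonably recent libdrm; planes would be enumerated the same way via drmModeGetPlaneResources(), not shown here):

```c
/* Sketch: enumerate the KMS object tree. Assumes an already-open fd. */
#include <stdio.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

void dump_kms_objects(int fd)
{
    drmModeRes *res = drmModeGetResources(fd);
    if (!res)
        return;

    for (int i = 0; i < res->count_connectors; i++) {
        drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);
        if (!conn)
            continue;
        const char *type = drmModeGetConnectorTypeName(conn->connector_type);
        printf("connector %u: %s, %s\n", conn->connector_id,
               type ? type : "unknown",
               conn->connection == DRM_MODE_CONNECTED ? "connected"
                                                      : "disconnected");
        drmModeFreeConnector(conn);
    }

    for (int i = 0; i < res->count_crtcs; i++)
        printf("CRTC %u\n", res->crtcs[i]);

    drmModeFreeResources(res);
}
```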
Then I think the important thing to bear in mind, if you're [12:04.160 --> 12:11.160] trying to reason about graphics pipelines, is that timing flows backwards. Timing never [12:11.160 --> 12:18.520] flows forwards. Because when you've got a physical display, it's going to refresh at [12:18.520 --> 12:24.360] a certain point in time. Unless it's VRR; no one asked about VRR, we don't quite know [12:24.360 --> 12:35.120] how that works yet. But timing flows backwards because this HDMI output is ticking at 60 [12:35.120 --> 12:40.880] hertz. That's happening at a very, very fixed point in time. And so that's the beginning [12:40.880 --> 12:48.360] of our reference. When we know that we want to present stuff to HDMI, we know exactly [12:48.360 --> 12:54.720] when the next refresh cycle is going to start, the next one after that, and so on and so forth. [12:54.720 --> 13:01.760] So timing is always flowing backwards. This goes right the whole way from the connector, [13:01.760 --> 13:09.480] back to the CRTC, back to the window system, and then back to the clients. It's always [13:09.480 --> 13:18.480] starting from that fixed hardware source. So yeah, you want to use DRM and KMS. Good [13:18.480 --> 13:27.680] for you. I'd recommend it. It's just a set of objects, like everything in computer science, it turns [13:27.680 --> 13:35.800] out. It's objects with properties, and that's it. So you open your KMS device, [13:35.800 --> 13:42.640] you enumerate a list of objects (your CRTCs, your connectors, your planes), and you look into [13:42.640 --> 13:51.680] their properties: this connector type is DisplayPort, this one's HDMI, whatever. And [13:51.680 --> 13:59.040] then any time you want to actually affect something, so display new content, change [13:59.040 --> 14:05.720] resolution, whatever, that's all done through what we call atomic mode setting, which is [14:05.720 --> 14:13.920] about 10 years old now, and it's a very low-level, property-based interface. I wouldn't really [14:13.920 --> 14:23.360] recommend trying to drive it yourself, but it is possible. So atomic is just a list of [14:23.360 --> 14:30.920] properties. You've got all of your different objects and their different types. You know [14:30.920 --> 14:37.520] how you want to put them together: you know that you want this plane to go to this CRTC, [14:37.520 --> 14:45.640] to this connector. So you take all of those objects, you do a massive property set, [14:45.640 --> 14:52.280] and then you do an atomic check before you commit, just to see if the configuration is [14:52.280 --> 14:59.960] going to be accepted. One of the things about display hardware is that it's weird. It's [14:59.960 --> 15:07.160] really, really weird. There are infinite constraints on what you can actually do with the display [15:07.160 --> 15:12.880] hardware. So you might have three or four planes that you can use to composite content [15:12.880 --> 15:20.520] without using the GPU, but you can only use a couple of them at a time, or only one of [15:20.520 --> 15:28.960] them can have compressed content, or only two of them can be scaled. So because we don't [15:28.960 --> 15:36.120] have a good generic way of expressing these constraints, and of constraint solving within [15:36.120 --> 15:43.920] the kernel, we do the dumbest possible thing: brute force. We just try every possible [15:43.920 --> 15:50.920] configuration that will get us to where we want to be, and see which one's going to stick. [15:50.920 --> 15:57.760] Then, yeah, once you've gone through all that, you've done your atomic commit, you've got [15:57.760 --> 16:06.160] a frame on screen, and it lives there until you change it. Because DRM is a frame-by- [16:06.160 --> 16:16.960] frame API. It's not producer-consumer, where you connect a camera to an output and magic [16:16.960 --> 16:23.560] things occur and you get a video stream. That's the domain of higher-level frameworks: [16:23.560 --> 16:32.680] say, PipeWire and GStreamer have that pipeline concept. DRM is quite dumb. It just [16:32.680 --> 16:38.840] does what you tell it to, and it doesn't do anything else until you tell it to do something else.
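A sketch of that check-then-commit flow with libdrm's atomic API. The property ID here is assumed to have been looked up earlier via drmModeObjectGetProperties(); that lookup is omitted.

```c
/* Sketch of the atomic flow: build a property list, TEST_ONLY it, then
 * commit for real with a completion event. */
#include <xf86drm.h>
#include <xf86drmMode.h>

int flip_frame(int fd, uint32_t plane_id, uint32_t prop_fb_id, uint32_t fb_id)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    /* "This plane's FB_ID property should now be this framebuffer." */
    drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);

    /* Ask the driver whether this configuration is possible at all... */
    ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL);
    if (ret == 0)
        /* ...and only then commit it, asking to be told on completion. */
        ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT, NULL);

    drmModeAtomicFree(req);
    return ret;
}
```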
[16:38.840 --> 16:50.600] So, yeah, essentially summing up: we've enumerated all of our devices [16:50.600 --> 16:56.920] (we've used DRM to do that) and all of the objects. And again, as with timing, we're [16:56.920 --> 17:05.040] working backwards from the starting point of a connector. So we know that HDMI-1 is the [17:05.040 --> 17:10.720] thing that we want to light up, and you always work backwards from that when you're building [17:10.720 --> 17:21.200] up your object tree. And then, you know, you are going to need a way to allocate some memory [17:21.200 --> 17:31.080] to display. It's not just a malloc pointer. So we have GEM, the Graphics Execution Manager. [17:31.080 --> 17:38.120] It doesn't manage execution of any graphics jobs; it's just a memory allocator. This was [17:38.120 --> 17:43.840] about the point where we stopped actually naming acronyms, because we've got almost all [17:43.840 --> 17:53.560] of them wrong. So GEM is something you see a lot, because that's the base of our kernel allocator for [17:53.560 --> 18:01.600] all graphics and display memory. And BO, buffer object, is something you see a lot of as well. [18:01.600 --> 18:10.880] Like I told you: acronyms. So a GEM BO is, like a malloc pointer, untyped; [18:10.880 --> 18:19.360] it's a raw bucket of bytes. It can be pixel buffers, it can be shaders, it can be geometry [18:19.360 --> 18:26.160] meshes, whatever you want it to be. It doesn't have any properties or metadata, just a length [18:26.160 --> 18:35.560] and some content. But you can't allocate them generically, because hardware is really that [18:35.560 --> 18:41.360] weird. We gave up on that a long time ago. So you're going to need some kind of hardware- [18:41.360 --> 18:49.960] specific API to come up with a GEM BO. And you might be quite disappointed about that, [18:49.960 --> 19:01.520] which is reasonable. So we came up with dumb buffers: a specific class of GEM BOs designed [19:01.520 --> 19:07.760] specifically for CPU rendering when you're displaying through KMS. So if you have something like [19:07.760 --> 19:15.080] Plymouth for your early-boot splash screen, that's not going to be using the GPU. It's [19:15.080 --> 19:22.280] just going to be doing CPU rendering, no device-dependent code. And dumb buffers are the path [19:22.280 --> 19:28.640] for that: I just want to get something up on the screen, I don't care if it's amazingly [19:28.640 --> 19:35.600] fast or efficient, I just need it to work, and work everywhere. So dumb buffers actually are a [19:35.600 --> 19:41.320] generic API inside KMS. They give you a GEM BO; you can map it, you can fill [19:41.320 --> 19:48.080] it up with some nice pixels. And then wrapping that in a KMS frame buffer is what annotates [19:48.080 --> 19:54.360] the BO with stuff like format and width and height, stuff that people think might be [19:54.360 --> 20:00.720] important. So yeah, like I said, you can use it for splash screens. Please don't try to [20:00.720 --> 20:07.360] use it for other stuff. It's not a generic memory allocation API either. It's just the [20:07.360 --> 20:15.680] thing that works.
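A sketch of that whole dumb-buffer path, create, map, fill, then wrap in a framebuffer, with error handling trimmed for brevity:

```c
/* Sketch: the generic CPU-rendering path the talk describes
 * (Plymouth-style). Returns a KMS framebuffer ID. */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <drm_fourcc.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

uint32_t create_dumb_fb(int fd, uint32_t width, uint32_t height)
{
    struct drm_mode_create_dumb create = {
        .width = width, .height = height, .bpp = 32,
    };
    drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);

    /* Map it and fill it with something visible. */
    struct drm_mode_map_dumb map = { .handle = create.handle };
    drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &map);
    void *pixels = mmap(NULL, create.size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, map.offset);
    memset(pixels, 0x80, create.size); /* mid-grey */

    /* The framebuffer is what annotates the BO with format and layout. */
    uint32_t fb_id = 0;
    uint32_t handles[4] = { create.handle };
    uint32_t strides[4] = { create.pitch };
    uint32_t offsets[4] = { 0 };
    drmModeAddFB2(fd, width, height, DRM_FORMAT_XRGB8888,
                  handles, strides, offsets, &fb_id, 0);
    return fb_id;
}
```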
So yeah, with all that being said, that's a reasonable end-to-end picture [20:15.680 --> 20:22.960] of how to use KMS. You've allocated all the buffers you need, or imported them from other [20:22.960 --> 20:29.040] clients. You've attached those frame buffers to planes. You've stuck them on a CRTC to [20:29.040 --> 20:35.880] get them into a kind of logical space, stacked against each other. You've set your CRTC [20:35.880 --> 20:43.480] and connector up for the output path. Commit everything. Hopefully that works. Then the [20:43.480 --> 20:48.480] kernel tells you that it's complete, you know when the next frame is going to be, and you [20:48.480 --> 20:55.800] just keep on going. You can't click these links if you're sitting in this room, but [20:55.800 --> 21:02.840] they are clickable in the PDF. There's a bunch of pretty decent documentation, examples and [21:02.840 --> 21:07.720] formats, because I'm not trying to show you the entire thing, just give you a good idea [21:07.720 --> 21:19.240] and some pointers. If you're bored of KMS, or you just don't find display that exciting, [21:19.240 --> 21:26.520] you might want to move on to the window system world. Here's a super quick run through Wayland. [21:26.520 --> 21:35.560] Again, it's the same thing: it's clients giving you images, and you're giving clients pointer [21:35.560 --> 21:43.200] and keyboard and touchscreen events in return. I think the main thing about Wayland that [21:43.200 --> 21:50.600] people take a while to grasp is that it's descriptive rather than prescriptive. What [21:50.600 --> 21:59.000] I mean by that is: in X11, when you have a pop-up, you tell X as a client, "put this window [21:59.000 --> 22:04.840] exactly here on the screen, give me all of the input events until I tell you otherwise". [22:04.840 --> 22:11.080] You're dictating specific outcomes. Wayland is exactly the other direction from [22:11.080 --> 22:17.560] that. The client tells the compositor, "this is a pop-up". The compositor does the right [22:17.560 --> 22:24.680] thing for pop-ups, including capturing input and making it always be on top, but still [22:24.680 --> 22:33.640] letting your screensaver work, which is nice. It's just about the client annotating everything [22:33.640 --> 22:41.480] it has with a bunch of descriptive information and properties, and then relying on the server [22:41.480 --> 22:48.480] to actually implement the right semantics. There's a fair bit of trust, but it gives [22:48.480 --> 22:55.280] us much, much more flexibility, because by the end, after however many years of X11, we were [22:55.280 --> 23:03.600] kind of painted into a corner, really, because clients were just dictating so much. [23:03.600 --> 23:10.000] We tried to make sure that there were no paths in Wayland that required the compositor to [23:10.000 --> 23:14.680] do a huge amount of work, because it's such a critical part of the stack that you can't [23:14.680 --> 23:24.040] have it burning loads and loads of time. Like I said at the start, your compositor could [23:24.040 --> 23:32.760] be GNOME, KWin, could be Weston, Sway or something like that. They're all designed [23:32.760 --> 23:39.360] for different things and different use cases, like window managers in X11 were. I think [23:39.360 --> 23:45.520] Weston is the best one, because I work on it. It's basically designed for everything [23:45.520 --> 23:52.440] that isn't a desktop: literally planes, trains and automobiles, digital signage, that kind [23:52.440 --> 23:59.560] of thing. It's really, really efficient and predictable and reliable, but I do use a desktop, [23:59.560 --> 24:07.000] so I have GNOME on this one. There are absolutely a pile of them to choose from, but they all [24:07.000 --> 24:15.680] use the same protocol, so they all look alike to the client.
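To give a feel for what "they all speak the same protocol" means, here is a minimal sketch (not from the talk) of the first thing nearly every Wayland client does: connect, then bind globals from the registry. Only wl_compositor is bound here, purely for illustration.

```c
/* Sketch: connect to the Wayland compositor and bind a global.
 * Build against wayland-client. */
#include <stdio.h>
#include <string.h>
#include <wayland-client.h>

static struct wl_compositor *compositor;

static void handle_global(void *data, struct wl_registry *registry,
                          uint32_t name, const char *interface,
                          uint32_t version)
{
    /* The server advertises its globals; we bind the ones we want. */
    if (strcmp(interface, wl_compositor_interface.name) == 0)
        compositor = wl_registry_bind(registry, name,
                                      &wl_compositor_interface, 4);
}

static void handle_global_remove(void *data, struct wl_registry *registry,
                                 uint32_t name) { }

static const struct wl_registry_listener registry_listener = {
    .global = handle_global,
    .global_remove = handle_global_remove,
};

int main(void)
{
    struct wl_display *display = wl_display_connect(NULL);
    if (!display)
        return 1;
    struct wl_registry *registry = wl_display_get_registry(display);
    wl_registry_add_listener(registry, &registry_listener, NULL);
    wl_display_roundtrip(display); /* wait for the initial global list */
    printf("wl_compositor %s\n", compositor ? "bound" : "missing");
    return 0;
}
```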
[24:15.680 --> 24:23.040] The protocol itself is just a large collection of, essentially, extension interfaces. A wl_buffer is much [24:23.040 --> 24:29.600] like a KMS frame buffer: a handle to some pixels somewhere, no other information, just width [24:29.600 --> 24:36.640] and height. A wl_surface is a window: it can be a pop-up, can be an application window, [24:36.640 --> 24:45.360] can be a subsurface. It takes the buffer, it can crop it, and optionally it takes input [24:45.360 --> 24:54.960] back. xdg_surface is the main one you'd interact with, really, because that's what adds all the [24:54.960 --> 25:04.160] desktop-like things of being able to resize and move windows and all that kind of thing. [25:04.160 --> 25:11.680] wl_seat is where the input comes from, because we're still bad at naming, it turns out. That [25:11.680 --> 25:18.960] one was my fault, actually. We did design Wayland fundamentally to be really, really easy [25:18.960 --> 25:27.400] to extend, so there are quite a pile of extensions that you need to sort through and deal with. [25:27.400 --> 25:36.280] The nice thing is, with it having been designed with KMS in mind, it's pretty similar. You've [25:36.280 --> 25:42.920] got your compositor doing the final output at the end, and that's composed of a bunch [25:42.920 --> 25:51.120] of windows and surfaces which have buffers attached to them. The compositor is the ultimate [25:51.120 --> 26:03.160] source of the timing, and it flows that timing back to the clients as feedback. If you take [26:03.160 --> 26:10.880] that, it looks exactly the same as the KMS diagram we had earlier, which is not really [26:10.880 --> 26:19.160] any coincidence, and using it is exactly the same flow as KMS. This slide was almost [26:19.160 --> 26:28.360] copy and paste. Again, I'm not trying to give you a complete guide to how to write every [26:28.360 --> 26:35.080] Wayland client in the world. Please do use a toolkit. They will make your lives much [26:35.080 --> 26:46.200] easier: GTK, Qt, SDL, ImGui, whatever. Use a compositor toolkit as well if you like. [26:46.200 --> 26:51.640] libweston in particular and wlroots are toolkits you can use to build compositors [26:51.640 --> 26:59.960] on top of good code bases. There are some links in here as well: wayland-info is a good [26:59.960 --> 27:12.080] tool to inspect things, wlhax is a debugging tool, weston-debug is another debugging tool. There are [27:12.080 --> 27:19.200] some sample clients as well. simple-shm and simple-egl are our kind of references [27:19.200 --> 27:30.360] for "how do I actually start using this and start approaching it?"
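A sketch of how those interfaces combine into a desktop-style window, assuming an xdg-shell-client-protocol.h generated by wayland-scanner, and a wm_base already bound from the registry as in the earlier sketch:

```c
/* Sketch: turn a bare wl_surface into a toplevel window via xdg-shell. */
#include <wayland-client.h>
#include "xdg-shell-client-protocol.h"

static void handle_configure(void *data, struct xdg_surface *surface,
                             uint32_t serial)
{
    /* The compositor describes the state; the client acknowledges it,
     * then attaches and commits a buffer of the suggested size. */
    xdg_surface_ack_configure(surface, serial);
}

static const struct xdg_surface_listener surface_listener = {
    .configure = handle_configure,
};

struct xdg_toplevel *make_window(struct wl_compositor *compositor,
                                 struct xdg_wm_base *wm_base)
{
    struct wl_surface *surface = wl_compositor_create_surface(compositor);
    struct xdg_surface *xdg = xdg_wm_base_get_xdg_surface(wm_base, surface);
    xdg_surface_add_listener(xdg, &surface_listener, NULL);

    struct xdg_toplevel *toplevel = xdg_surface_get_toplevel(xdg);
    xdg_toplevel_set_title(toplevel, "hello");
    wl_surface_commit(surface); /* ask the compositor to configure us */
    return toplevel;
}
```

Note how descriptive this is: the client never says where the window goes, only what it is.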
Now we've got all that [27:30.360 --> 27:40.240] out of the way, I'm not going to try and explain GL to you, because we'd be here forever. Like [27:40.240 --> 27:50.520] I said, GL as a model for accelerated 3D is clients providing the vertex data (your [27:50.520 --> 27:57.400] kind of wireframe geometry), your input textures, material images, and your shader [27:57.400 --> 28:05.080] programs as well, to run to generate the final output. Now shaders can deform the geometry, [28:05.080 --> 28:14.160] so you can do cool stuff. You can also do things like lighting per pixel, and do that [28:14.160 --> 28:24.520] in a nice reflective way that's all computational. I guess the main thing to recognise about GPUs [28:24.520 --> 28:36.160] is that they're enormously parallel: thousands of threads, really. There's not much in the [28:36.160 --> 28:45.200] way of synchronisation or shared memory. GPUs really can't do branching like CPUs. [28:45.200 --> 28:51.640] They want to have everything set up for them a long time in advance and just do straight- [28:51.640 --> 29:00.320] line things from there. It's a long, deep pipeline, essentially, and you want to make [29:00.320 --> 29:08.600] it roughly as static as you can. The cost of being enormously fast and really, really [29:08.600 --> 29:15.680] powerful, it turns out, is that they're really power hungry. That's why we have composition [29:15.680 --> 29:23.040] in the display hardware as well, because it turns out that just spinning up your GPU once [29:23.040 --> 29:33.040] per frame to produce the final display output costs. I worked on a device where the video runtime [29:33.040 --> 29:41.880] went from five hours if we didn't use the GPU to four hours if we did. It's a really [29:41.880 --> 29:48.000] measurable cost to get a GPU involved. You only want to do it if you've got the right [29:48.000 --> 30:02.480] reasons for it, or if you actually need it. Like I said, it's just a pure 3D-only API, [30:02.480 --> 30:10.960] when you talk about GL and GLES, because it came out of SGI, where you told it to draw [30:10.960 --> 30:17.000] and it drew, because there was only one screen and obviously it's going to come out [30:17.000 --> 30:26.440] in the right place on the screen. A simpler time. Then SGI realised that they needed some [30:26.440 --> 30:35.440] more nuance. They brought in GLX, which was the first go at integrating OpenGL with the [30:35.440 --> 30:44.680] window system. Originally it had the X server processing all the commands. That was terrible. [30:44.680 --> 30:51.440] So we came up with DRI, the Direct Rendering Infrastructure, to let the clients directly [30:51.440 --> 31:02.560] access the GPU. It relied on central memory allocation. Then we came up with DRI2, where the [31:02.560 --> 31:10.920] main innovation was that clients would manage their own memory, in cooperation with the kernel, [31:10.920 --> 31:19.240] and also execute all of their own commands. That was so good that any time you see DRI, [31:19.240 --> 31:26.920] it just means accelerated rendering, roughly describing the last 20 years. And any time you [31:26.920 --> 31:36.720] see DRI2, it doesn't mean actual DRI2 in X11. It just means "this kind of looks like a modern [31:36.720 --> 31:45.600] window system", by which I mean about the last 15 years. That can be confusing, because those [31:45.600 --> 31:52.760] two terms are massively ambiguous, but if you ever see DRI2, it probably means that you're [31:52.760 --> 32:05.600] somewhere good. Then, yeah, EGL is an abstraction of GLX. Rather than just plugging GL into [32:05.600 --> 32:14.960] X11, it lets you do Wayland, Android, whatever. All it really does is give you windows that [32:14.960 --> 32:24.480] you can share with the window system. It gives you some vague notion of timing, but it doesn't [32:24.480 --> 32:31.040] have any kind of events, so the only way you can get consistent frame timing is if you [32:31.040 --> 32:40.120] block a lot in EGL.
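A rough sketch of that bridging on the Wayland side, using EGL_KHR_platform_wayland; config selection and error handling are heavily trimmed, and the display and surface objects are assumed to come from the Wayland sketches above:

```c
/* Sketch: create an EGL surface backed by a Wayland window. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <wayland-egl.h>

EGLSurface bridge_to_wayland(struct wl_display *display,
                             struct wl_surface *surface,
                             int width, int height)
{
    EGLDisplay dpy = eglGetPlatformDisplay(EGL_PLATFORM_WAYLAND_KHR,
                                           display, NULL);
    eglInitialize(dpy, NULL, NULL);

    EGLint attribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_NONE,
    };
    EGLConfig config;
    EGLint n;
    eglChooseConfig(dpy, attribs, &config, 1, &n);

    /* wl_egl_window is the glue object EGL renders into; on Wayland
     * builds of EGL, it is the native window type. eglSwapBuffers() on
     * the resulting surface is what hands frames to the compositor,
     * blocking to provide the implicit timing mentioned above. */
    struct wl_egl_window *native =
        wl_egl_window_create(surface, width, height);
    return eglCreateWindowSurface(dpy, config,
                                  (EGLNativeWindowType)native, NULL);
}
```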
EGL just tries to hide everything and make it implicit, which again [32:40.120 --> 32:50.560] is where GBM comes in, because that's what lets us steal buffers away from EGL, push [32:50.560 --> 33:03.400] them into KMS for display, handle our own timing, and do it properly this time. EGL has [33:03.400 --> 33:11.840] that shape, and then, not coincidentally, Vulkan has a fairly similar shape. Vulkan is the [33:11.840 --> 33:21.160] rendering API, and that's it. Vulkan WSI is the EGL equivalent, which provides the window [33:21.160 --> 33:29.480] system integration of creating windows, posting content to them, and so on. The main difference [33:29.480 --> 33:36.480] with Vulkan is that it's really, really explicit and clear about what it's doing. The downside [33:36.480 --> 33:42.240] is that, because it's so explicit and clear, you end up typing a hell of a lot of code. [33:42.240 --> 33:48.320] So it's more effort to use, but there's no magic hidden under Vulkan. You know exactly [33:48.320 --> 33:56.320] what's going on, for better or worse. It's really good on the desktop; on mobile [33:56.320 --> 34:04.200] SoCs the hardware isn't necessarily entirely there yet. If you're doing high-performance [34:04.200 --> 34:12.480] things, or you just like seeing what's going on under the hood, I'd recommend Vulkan. [34:12.480 --> 34:18.280] And yeah, I think this is about the last bit that we'll have time for. I keep going [34:18.280 --> 34:29.640] on about how EGL will get things from GL to Wayland. The [34:29.640 --> 34:38.520] way we do that is dma-buf. It's a kernel concept for sharing memory regions between [34:38.520 --> 34:45.160] different subsystems, different processes, different contexts, whatever. So, you know, [34:45.160 --> 34:51.320] on the graphics side of things, we've already got the GEM buffer objects, [34:51.320 --> 34:59.160] but they're local to one particular device and to one particular user context. So [34:59.160 --> 35:05.880] when you want to export a buffer to your Wayland server, or share it between, [35:05.880 --> 35:16.480] say, V4L for your video capture and [35:16.480 --> 35:24.320] your GPU to do some analysis on it, that's dma-buf. It just gives you a file descriptor [35:24.320 --> 35:32.400] you can use as a handle to that memory area, and you can import it into different contexts or subsystems [35:32.400 --> 35:39.640] or places. And that's completely consistent throughout the stack: Wayland, [35:39.640 --> 35:47.520] EGL, KMS, Vulkan, everything I've discussed has dma-buf integration, because that's our [35:47.520 --> 35:55.040] lowest common denominator.
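Concretely, on the DRM side that export looks roughly like this. PRIME is DRM's dma-buf integration; gbm_bo_get_fd() gives you the same thing one level up the stack.

```c
/* Sketch: turn a device-local GEM handle into a shareable dma-buf fd. */
#include <xf86drm.h>

int export_bo(int drm_fd, uint32_t gem_handle)
{
    int dmabuf_fd = -1;

    /* The resulting fd can be sent over a Unix socket to the compositor,
     * or imported into V4L2, EGL, Vulkan, another DRM device, etc. */
    if (drmPrimeHandleToFD(drm_fd, gem_handle, DRM_CLOEXEC | DRM_RDWR,
                           &dmabuf_fd))
        return -1;
    return dmabuf_fd;
}
```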
So, yeah, we put it all together. Because they're all [35:55.040 --> 36:02.040] built on the same building blocks, it's largely how you'd think it is. Well, hopefully, if I've [36:02.040 --> 36:11.520] done a decent job of this talk. The client's connecting to the compositor. [36:11.520 --> 36:18.440] It's creating a window, declaring some very simple annotations about it. It wants to [36:18.440 --> 36:24.680] use the GPU, so it creates an EGL context pointing at the Wayland server: "I'd like to [36:24.680 --> 36:32.960] render over here". The Wayland server has some dma-buf protocols, which tell the client what it [36:32.960 --> 36:43.480] can and can't accept. The client uses GLES to render into that. That's wrapped in a [36:43.480 --> 36:52.520] dma-buf and passed over to the compositor. The compositor is deciding how to place and [36:52.520 --> 37:00.320] configure everything. It's importing that dma-buf it's got from the client to [37:00.320 --> 37:08.800] generate one final image. It's then waiting until the next deadline, you know, that sort [37:08.800 --> 37:17.320] of 60 hertz cadence that we have, to present that [37:17.320 --> 37:24.960] out through KMS. And that might be KMS doing its own composition directly in the display hardware, [37:24.960 --> 37:42.800] or it might go through the GPU itself. It's a tough call, because the display hardware can do that final image [37:42.800 --> 37:47.680] composition, taking your sort of four or five images, mashing them all together and [37:47.680 --> 37:53.920] coming up with one, and that is, like I said, a really measurable win on things like power and memory [37:53.920 --> 38:02.360] bandwidth, and memory usage as well. But it's kind of complicated in that it's hard [38:02.360 --> 38:12.280] to be predictable about when you can and can't use it. It's a bit fiddly. That's [38:12.280 --> 38:18.760] one of the reasons I recommend using compositor frameworks like libweston, which do all [38:18.760 --> 38:25.440] of this heavy lifting for you. You know, I've spent 10 years of my life trying to solve [38:25.440 --> 38:38.680] this problem and wouldn't recommend anyone else does it. It's not even really that interesting. [38:38.680 --> 38:47.840] Internally, Weston has, like I said, that kind of brute-force loop of just trying every [38:47.840 --> 38:56.240] possible configuration that could work, seeing what happens, and throwing it at KMS to check [38:56.240 --> 39:02.960] if it will work. Currently that's the most advanced one, but, yeah, others are catching [39:02.960 --> 39:13.240] up. I think, really, to sum up what I was trying to say about GPUs and efficiency: one of [39:13.240 --> 39:20.200] the things that gets overlooked a lot, that no one quite realises, is that every problem on [39:20.200 --> 39:28.960] mobile comes down to memory bandwidth. People act as if you can solve every problem by just copying [39:28.960 --> 39:36.600] buffers around more, but when you've got 4K buffers and you've got a low-end device, [39:36.600 --> 39:42.440] it turns out that this is always where your performance problem is. It's down in things [39:42.440 --> 39:51.000] like copies and naive memory usage. So yeah, one thing to really be aware of [39:51.000 --> 40:00.280] is: try and go for a zero-copy pipeline, because when you have 4K at 144Hz, you really don't [40:00.280 --> 40:09.760] have much time, and you don't want to spend it all just waiting for slow memory.
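For a rough sense of scale (illustrative numbers, not from the talk):

```latex
\begin{align*}
\text{one 4K XRGB8888 frame} &\approx 3840 \times 2160 \times 4\ \text{bytes} \approx 33\ \text{MB}\\
\text{frame budget at 144 Hz} &= 1/144\ \text{s} \approx 6.9\ \text{ms}\\
\text{one extra full-frame copy} &\approx 33\ \text{MB} \times 144\ \text{Hz} \approx 4.8\ \text{GB/s}
\end{align*}
```

So a single avoidable copy per frame eats several GB/s of bandwidth before the GPU or display controller has done any useful work.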
Yeah, [40:09.760 --> 40:17.240] with that, I think we're pretty much coming up on time. So yeah, there's the quick whirlwind [40:17.240 --> 40:25.880] tour of how all that fits together. Anyone have any questions, or want to talk about how [40:25.880 --> 40:34.880] Wayland's amazing? Please feel free. If you have any questions, please raise your hand. [40:34.880 --> 40:40.600] When we launch a game in full screen, for example, does it go straight from the GPU to the screen, [40:40.600 --> 40:46.160] or does it go all the way through the window system? It will go through the window system. So [40:46.160 --> 40:49.640] yeah, the question being: if you have a full-screen game, will it go straight from the [40:49.640 --> 40:55.560] GPU to the display, or will the window system still be involved? It will still be there, [40:55.560 --> 41:01.840] but ideally doing nothing. So it will just take the client buffer, give it directly to [41:01.840 --> 41:08.400] KMS, and ask KMS to display it, in the happy case. But it's always involved as the mediator, [41:08.400 --> 41:17.920] so when a notification pops up, it already has control, so it can show it. [41:17.920 --> 41:37.040] Hello. I can't... Is it working? Yeah. Okay. So forgive the super newbie question. When [41:37.040 --> 41:42.480] you say the frame buffer is tied to a plane, a plane is not a desktop, a plane is just [41:42.480 --> 41:50.720] a window? When you tie a frame buffer to a plane, the plane goes to the compositor? [41:50.720 --> 41:56.680] So the plane is a window, it's not the entire desktop? Yeah, exactly. So the CRTC is your [41:56.680 --> 42:10.280] final output as one flat image, and planes are windows within that CRTC. Thank you. [42:10.280 --> 42:30.200] More questions? All right. Hello. Is it working? Hello. You mentioned that kernel mode setting [42:30.200 --> 42:34.320] is used for turning the pixels into... Sorry, could you please... Sorry. Yeah. You mentioned [42:34.320 --> 42:40.760] that KMS, kernel mode setting, is used to turn the data into pixels on the screen. Is this [42:40.760 --> 42:45.840] where graphics card drivers are involved, and other vendor-specific software, or is that [42:45.840 --> 42:51.720] earlier or later in the pipeline? Sorry, which parameters? So, basically, where do graphics [42:51.720 --> 42:55.600] card drivers come in? Because I know there's vendor-specific hardware that requires [42:55.600 --> 43:00.560] its own drivers somewhere in kernel space, I believe, so where does this fit in the pipeline? [43:00.560 --> 43:07.080] So all of the properties and parameters are defined in kernel space, and we try to standardise [43:07.080 --> 43:13.440] them as much as possible. So in the generic world, we do stick pretty religiously to a [43:13.440 --> 43:18.560] standard set of parameters that have common behaviour across everyone. If you go to things [43:18.560 --> 43:25.120] like Android, where you have hardware composer and vendor HALs, it's completely different, [43:25.120 --> 43:30.040] and it's more of a negotiation between kernel and user space, which are [43:30.040 --> 43:41.840] both vendor-specific. I hope that answers your question. [43:41.840 --> 43:48.320] Do you know if there are any toolkit libraries for writing compositors that are not desktop-specific? [43:48.320 --> 43:55.760] Any compositor libraries that are... Libraries for writing compositors that are not desktop-specific. [43:55.760 --> 44:02.120] So, like, libweston is good for writing desktop-type things, but for highly embedded [44:02.120 --> 44:08.520] use cases, have you found anything that makes it easy to write a compositor like that? [44:08.520 --> 44:17.280] Yeah, so libweston's the one for those kinds of embedded or single-purpose use cases. Mutter, [44:17.280 --> 44:24.000] which is the basis of GNOME Shell, can be used by anyone else, but it's really GPU-reliant. [44:24.000 --> 44:31.000] And wlroots is, I guess, kind of in the middle: not as friendly and desktop-y as GNOME, [44:31.000 --> 44:37.320] but not as sort of insanely efficient as Weston. It's the halfway house, [44:37.320 --> 44:42.760] I guess.
[44:42.760 --> 44:45.800] Is there any tool you would recommend for profiling? [44:45.800 --> 44:47.120] Sorry, could you speak up? [44:47.120 --> 44:54.120] Is there any tool that you would recommend for profiling the graphics stack? Is there [44:54.120 --> 44:58.840] a tool for profiling the graphics stack? [44:58.840 --> 45:10.000] Profiling: are there any tools for profiling the graphics stack? Kind of. So Mesa has integration [45:10.000 --> 45:15.880] with a tool called Perfetto, which is the basis of Android GPU Inspector. There's some [45:15.880 --> 45:24.280] support in there for Weston, specifically, to interpose its timeline on top of Perfetto, [45:24.280 --> 45:32.160] but it's pretty patchy, to be honest. We've been working on that, basically to try and [45:32.160 --> 45:38.160] make it easier, so we can stop getting paid for debugging and profiling stuff, to be honest. [45:38.160 --> 45:45.160] But yeah, it's a slow process. Perfetto is the best one there. [45:45.160 --> 46:01.160] I have a question. So why can't we do screen recording or screen sharing in Wayland? [46:01.160 --> 46:10.360] You can. Screen sharing in Wayland is done through the XDG screencast portal, and we did [46:10.360 --> 46:17.200] it that way because, if you try to put it in Wayland itself, as like a core protocol for [46:17.200 --> 46:23.760] clients to use, it really goes against the grain, because everything was designed [46:23.760 --> 46:29.880] with this idea of the timing coming from the display and flowing back to the clients. [46:29.880 --> 46:36.240] And once you put it in the other direction, with the client receiving content, it really [46:36.240 --> 46:44.680] just is a terrible fit with pretty much every interface we had. So it was easier for us, [46:44.680 --> 46:54.040] and it also works for things like sandboxing and containers, to go with the XDG portal solution. And yeah, [46:54.040 --> 47:01.040] it works everywhere, basically. Okay. I think, yeah. [47:01.040 --> 47:24.040] Okay. Thank you, Daniel. Thanks very much. Thank you.