[00:00.000 --> 00:39.880] So, our next talk is from Michael, on using Genode as an enabler for research on modern operating systems. So, continuing on the Genode environment.
[00:39.880 --> 00:46.880] Where are the leaflets, where are the leaflets, Stefan?
[01:09.880 --> 01:53.880] Maybe we can switch now, I can give you my laptop. Do you have the slides uploaded on the Pentabarf? Yes, I uploaded some, with some mistakes.
[01:53.880 --> 02:39.880] Let me give you my laptop, I think that's easier. Let's do some sort of demo or something; something bad happened. So, you have a stick? Do you have the slides?
[02:57.880 --> 03:32.880] Yes, yes, it's better because... Okay, do you want to use the keyboard? Back, forward and left, right? You don't see the slides, you want to see the slides, right? Let me duplicate the screen then.
[03:52.880 --> 04:17.880] Let me do it, it's okay. Let's see if we can... A, B, yeah. Yeah, help? Okay, please start.
[04:17.880 --> 04:54.880] I'm Michael, and first a quick introduction for those who don't know me yet. I studied computer science at TU Dortmund, and since 2018 I have been a PhD student at Osnabrück University and a full-time research assistant in the MX Kernel project, which is a joint project between TU Dortmund and Osnabrück University. The focus of my research is on heterogeneous many-core systems for the data center.
[04:59.880 --> 05:33.880] Now I will present the experiences I made with using Genode for research, and I will show how well this worked out. First of all, I am not working for Genode Labs, so even if it might sound a bit like an advertisement, it's not that I was paid for this or anything.
[05:34.880 --> 06:22.880] So let's start with the talk. The operating systems we know very well today, like Linux or Windows, basically stem from more than 30 years ago; for Linux this is even true of its basic architecture. And systems back then looked quite different: there was just a single CPU, and only the CPU did computation work, context switches were cheap, memory was scarce, and we had the old dogma that I/O was always slower than the CPU, and that by orders of magnitude. But today things look different.
[06:22.880 --> 06:38.880] Now we have many CPUs. I think octa-cores are by now the default for laptops, and quad-cores are the de facto default for small mobile devices.
[06:38.880 --> 07:28.880] And as most of you will know, it is not just CPUs that compute: now we have GPUs, in data centers also FPGAs and AI accelerators, and processing in memory. With this new amount of cores and deep memory hierarchies, context switches aren't cheap anymore. Now we have to pay synchronization costs when scheduling processes via load balancing, and we pay for the distributed memory architecture that we actually have in our systems, with distributed caches, with higher latencies for context switches.
[07:28.880 --> 07:58.880] Main memory, on the other hand, has become abundant, at least in the data center, and now we have heterogeneous memories, at least non-uniform memory access, and there is also a trend towards distributed memory where we do not even have shared memory guaranteed anymore. And I/O has now become almost as fast as the CPU.
[07:58.880 --> 08:45.880] So one might question whether the operating system abstractions and interfaces we are accustomed to, like POSIX, are still viable for these modern systems, and there's a lot of research which argues that they are not. For example, the blocking I/O of POSIX doesn't fit well: when I/O is as fast as the CPU, it doesn't make sense to block threads, because the cost of unblocking a thread or a process is higher than simply polling. So we need further research on operating systems to deal with that.
[08:45.880 --> 09:10.880] There's also a lot of research investigating how to deal with things like an FPGA, which works completely differently than a CPU does. So we need more research, but there are some hurdles on the way that put OS research at risk.
[09:10.880 --> 09:47.880] One major hurdle is non-free licensing, which prevents us from fully understanding the system, especially drivers for hardware like accelerators or GPUs. It makes modifying the system very difficult, and even if one is able to modify it, the result might not be publishable, which is bad for research, where what you want is that your results are reproducible by other researchers.
[09:47.880 --> 10:05.880] Furthermore, we have hardware black boxes, which make it even harder to implement drivers and also make it difficult to evaluate the hardware, because you can't quite figure out what is going on inside the hardware.
[10:05.880 --> 10:36.880] Then there are also NDAs, non-disclosure agreements, which may suppress unfavorable results: you might have nice results for a paper, but you aren't allowed to publish them because some company doesn't like the results, because they may damage its business, because your results state that the hardware is not as good as they claim.
[10:36.880 --> 11:06.880] One other big problem is missing documentation, especially when this leads to reverse engineering, as was necessary for a long time for NVIDIA GPUs, where the Nouveau open source driver had to be written completely from scratch via reverse engineering because NVIDIA didn't publish any useful documentation.
[11:06.880 --> 11:42.880] A major problem we then face in research is the lack of manpower, which puts hard limits on what we can do and can also endanger the success of the project itself. In research, the success of a project is measured in the amount of publications we produce, which depends on the amount of experiments we can do, and this means we don't have much time to implement drivers and such things.
[11:42.880 --> 11:56.880] And the complexity of modern hardware, as we have seen in the previous talk about the MNT Reform laptop, can be quite intimidating, making it even harder to get an operating system working.
[11:56.880 --> 12:21.880] So what do OS researchers do in this scenario? They mostly write workarounds and tweaks for Linux. Here is a short list of publications, mostly from OSDI 2020, and this is just the tip of the iceberg.
[12:21.880 --> 12:55.880] What's going on? In fact, looking at most papers from OSDI 2020 and 2021 (OSDI is one of the major scientific conferences for operating system research), one can see that most of the papers, here in grey, that were OS research papers were actually just tweaks to the Linux kernel, and only the red part were really new operating systems with new concepts or abstractions.
[12:55.880 --> 13:35.880] So now we know why they use Linux, but I think Linux isn't a good choice, because you still have a huge and complex code base to deal with. As the previous talk might have already teased, it is still a lot of work to work in the Linux kernel and to get acquainted with it. Furthermore, the POSIX compliance of Linux, and also the strict requirement that you may never ever break user space, put hard limits on what we can do in research.
[13:35.880 --> 13:57.880] So completely changing abstractions and interfaces in a way that breaks user space will never have a chance to get into the kernel. At least it will be very difficult, because we would need to persuade Linus Torvalds to integrate them.
[13:57.880 --> 14:38.880] Furthermore, Linux is a moving target. The kernel APIs are changing rapidly, and this requires a lot of maintenance work. As we have seen before, a small research team might not be able to do this maintenance, and so extensions will break sooner or later. That's something we have experienced in our own research, where we tried to compare against other Linux extensions, and they didn't compile with newer kernels or only worked with some ancient kernels which we couldn't run on our hardware.
[14:40.880 --> 14:58.880] So one might ask: isn't there something better to do OS research with, which is also able to lower the burden of writing an OS from scratch? So something like a framework.
[15:00.880 --> 15:38.880] Such an OS framework should ideally be minimal, which eases understanding and makes it easier to change kernel primitives and add new interfaces, and it also assists debugging, because you don't need to analyze a huge code base. It should also be investigable; that's necessary to understand what's going on in the system. Ideally it has an open source code base and provides some profiling tools.
[15:39.880 --> 16:07.880] It should also be maintainable, with regular updates, so that it still works on newer hardware, and not that five years later you can't use the framework anymore because it only supports very ancient hardware. Extensible is also quite obvious.
[16:08.880 --> 16:25.880] It should make it easy to implement your own operating system services and abstractions, and therefore it should have separation of concerns and well-defined components, and it should also be well documented.
[16:27.880 --> 17:02.880] Ideally there is a book and documented code. It should also be portable, to make it future-proof and to enable porting to other, also experimental, hardware like the Enzian computer from ETH Zurich. This would then also enable support for hardware/OS co-design. Basically, what is meant here is that it should not assume a specific CPU architecture.
[17:03.880 --> 17:28.880] And a nice thing to have would be composability at runtime, something like the module system Linux has, which would allow using, for example, different OS interfaces simultaneously, to evaluate them against each other and find out which interface provides the best performance for a specific task.
[17:32.880 --> 17:56.880] So now we might ask: what is such a framework, does something like that exist? And as the title has already spoiled, there is; I propose the Genode OS framework as a good candidate here.
[17:57.880 --> 18:24.880] As we've seen in the previous talk, Genode is an OS framework that provides different kernels and drivers, now also ported from Linux, which makes it easier to get hardware up and running. Furthermore, it also includes libraries, which makes it easier to port existing benchmarks and, later, applications to Genode or to the special fork of Genode you use for research.
[18:26.880 --> 19:11.880] So, getting back to the requirements, how well does Genode fit the bill here? First, it is minimal compared to Linux. We only have about 53,000 lines of code for Genode with the NOVA kernel, that is, for the actual operating system kernel and the basic operating system abstractions and services, while, taking the same parts of the system (both counted for x86 only), we have 911,000 lines for Linux 4.14, and this number might be even higher by now.
[19:15.880 --> 19:42.880] It is also investigable, because it's under the GPL, but the tracing and profiling is, as I experienced it, quite basic at the moment; it's not yet comparable to Linux perf, but that might change in the future. It's also maintainable.
[19:42.880 --> 19:58.880] I've seen that there are almost quarterly updates; I have been through three updates and didn't have to change much regarding the kernel API here. So I think that is also green here.
[20:02.880 --> 20:25.880] Of course, Genode is a component-based system, so everything is clearly separated into single components, which have an RPC interface that is well defined. That means that the basic foundations of how to work with this RPC interface are the same for each of those components.
[20:27.880 --> 21:01.880] And the requirements for adding new components are quite minimal, because if you don't need, for example, an NVMe driver, then you don't have to deal with such things and their interfaces. Basically you just need to know the core services and libraries of Genode, which are very well documented in the book Genode Foundations, which helped me a lot to understand how Genode works and what the concepts are.
[21:03.880 --> 21:15.880] They also have an extensive changelog for each release, and there is the Genodians blog and the FOSDEM talks.
[21:15.880 --> 21:46.880] It's also portable, as we have already seen in the previous talk, and it has its component-based architecture. We saw the Leitzentrale in the previous talk, which allows adding components at runtime or exchanging them and changing their configurations, and it's also possible to have multiple instances of a service at runtime.
[21:49.880 --> 22:24.880] That makes Genode quite a good fit, I think. How much does it facilitate OS research, one might ask. Before I get to that, I want to briefly present my own research operating system, called ElanOS, which is an experimental implementation of the MX Kernel architecture we devised in our research project and is based on the Genode OS framework.
[22:25.880 --> 22:44.880] In this MX Kernel architecture we have three basic concepts. For clarification, the squares in this picture each represent a hardware resource, like a CPU core or a part of memory.
[22:46.880 --> 23:31.880] Then we have the first concept, organisms. These are basically resource containers for applications that follow a specific common goal, for example a web application like a web store, which usually is comprised of a database, a web server and some implementation logic for the store itself. So we would usually have three programs running, and they have the same goal, to provide this web store experience, and they are then, in ElanOS, put into one organism.
[23:31.880 --> 24:05.880] The resource management of an organism is controlled by a component we call Aivot, which is Finnish for brain. It can be application or user specific and can also provide a specific operating system interface; for example, it might provide a POSIX interface or another custom OS interface, whatever the applications need.
[24:05.880 --> 24:25.880] These organisms can also grow and shrink in the amount of resources they use. For example, if this yellow one didn't need these resources here, then the red one could also extend there.
[24:26.880 --> 24:50.880] For this we have Hoitaja, the global resource manager, which has the task of providing fair resource utilization between organisms and can also implement things like service level agreements.
[24:51.880 --> 25:42.880] And within an organism we have cells; these are basically your processes, and they also have an elastic resource container. In our system we have a strict rule that space partitioning comes before time sharing, and that makes it necessary, if we have diverging loads, that these containers might have to shrink to free resources and, especially, to grow if the already assigned resources don't suffice.
[25:46.880 --> 26:09.880] Then one new abstraction we added is that we changed the default control-flow abstraction from threads to tasks, which are closed units of work. You can think of them as a remote procedure call or a somewhat bigger method call.
[26:09.880 --> 26:35.880] Their execution time is quite short, though, in the microseconds to milliseconds range, compared to the lifetime of a thread. Therefore we can allow them to be non-preemptible with respect to each other, which then allows us to annotate them, for example for synchronization, or to provide automatic prefetching and other nice things.
[26:37.880 --> 27:10.880] The architecture then looks like this: we have our applications running in user space, and in kernel space we have Tukija, which is basically a fork of the NOVA microhypervisor, specifically the Genode version of it, and which fulfills the role of a resource provider. On the command of Hoitaja, it will either withdraw a resource from an application or grant one.
[27:16.880 --> 27:51.880] So how did we implement this with Genode? First, for organisms we use the feature of service interception. Genode allows you to have several instances of its core services, so you can, for example, implement your own scheduler, memory allocator and such things, and then we route the cell so that it uses the specialized OS service rather than the generic one.
[27:51.880 --> 28:09.880] Cells are implemented as Genode components, and one feature Genode already has is resource trading, but only for RAM; we will extend that so that it also works with CPUs, to implement the growing and shrinking of cells.
[28:09.880 --> 28:33.880] For tasks, Genode didn't have anything when we started, but we had already developed a task-based runtime library and framework, mainly by a colleague from Dortmund, which is called MxTasking. What I did was porting this to Genode.
[28:33.880 --> 29:30.880] For this I needed a standard C++ library, because MxTasking uses it for its internal data structures and to be portable; a file system, for the benchmarks and for writing out the profiling results from those benchmarks; timer support, which was also needed for the profiling; and of course multi-core support, which was necessary to provide task parallelism. The last thing was NUMA support. NUMA stands for non-uniform memory access, and this is needed by MxTasking because it does NUMA-aware task scheduling and data object allocation and placement.
[29:30.880 --> 29:43.880] And here comes the tricky part: if you had to do this from scratch in your own operating system, you would have to implement quite a huge amount of code.
[29:43.880 --> 29:55.880] But Genode comes to the rescue here, because it already provides a standard C++ library, a file system, timer support and also multi-core support.
[29:58.880 --> 30:33.880] What we needed to add was NUMA support. For this we extended the NOVA microhypervisor so that it now parses the ACPI SRAT tables to find out which CPU cores belong to which NUMA region, and also the memory address ranges of the NUMA regions, which are later used for a NUMA-aware allocator.
[30:34.880 --> 30:52.880] Furthermore, we implemented this in only 365 lines of code, where the largest part was just the definition of these table structures in C++.
[30:52.880 --> 31:08.880] Michael, are you close to finishing? Time is almost up. I can't hurry up. Two more minutes, please.
[31:08.880 --> 31:21.880] Then we implemented a topology service, with 531 lines of code, and also NUMA-aware...
[31:21.880 --> 31:40.880] Sorry, you have one hour, sorry for that. My bad, sorry for that. I don't have my laptop, I didn't... Sorry, go ahead, my bad. No problem. I'm so good at whipping people.
[31:46.880 --> 32:14.880] Based on this NOVA extension we then developed a topology service, which now makes it possible to query the NUMA topology not just for the core components; user space applications can now also ask, for example, where the thread they are currently running on is located in the NUMA topology.
[32:14.880 --> 32:37.880] This can then be used, for example, for actually allocating memory locally or from a specific NUMA region, which is a common use case when implementing database applications and also in high performance computing.
[32:37.880 --> 33:04.880] The last part was providing the glue code between the Genode interfaces and the MxTasking runtime, and as one can see, this was about 1,500 lines of code, which is quite manageable for a single developer.
[33:06.880 --> 33:52.880] We had also started to implement something from scratch at the beginning of the project, which mostly comprised the hardware abstraction layer, here in grey, which was needed to get the system running at all, and the other part was this task-based interface. This alone already needed about 24,000 lines of code, while ElanOS, this Genode-based system I started, has only about 5,500 lines of code.
[33:52.880 --> 34:33.880] And I have to add that this from-scratch version did not have anything like components, there was no support for memory protection, for example, and it could only run a single application, while now, with Genode, we can have several applications which are memory protected and isolated from each other, and we also have well-defined inter-process communication mechanisms like the remote procedure call interface and also semaphores and such things, which were still lacking in the from-scratch version.
[34:33.880 --> 35:04.880] Regarding time, this is just an estimation here, especially of the effort: I assume that a single developer can write about 10 lines of code, which is an approximation that is commonly used to calculate man-months, and this culminated in about 18 man-months for the implementation we did from scratch.
[35:04.880 --> 35:37.880] I have to admit that I didn't do all the coding by myself; I had help from the people at TU Dortmund and could use a small operating system, not a research one but an operating system for teaching, which already did the very basic stuff to get a system up and running, but didn't include all the NUMA stuff and not many drivers.
[35:37.880 --> 36:03.880] ElanOS, on the other hand, was something I did completely by myself, in about six months, which amounts to a time saving of almost 90%. This number should be taken with a grain of salt, because there's a lot of approximation here, but I think you get the picture.
[36:03.880 --> 36:32.880] Using Genode I was able to really accelerate this implementation and engineering effort, which usually does not yield any scientific publications, because you implement something that everyone else has already done; it's nothing new. And this helped me a lot in making progress.
[36:32.880 --> 36:53.880] Now I want to show you how I used Genode's scenario concept and its component concept to do automated experiments.
[36:53.880 --> 37:18.880] But first a quick recap: Genode consists of components, those are these red boxes here, and they are arranged in a tree. Usually you have an init component that then starts all the other components.
[37:18.880 --> 37:54.880] And then you can specify within a scenario, something like an XML config, how these components are related to each other: for example, that this init component shall start a GUI component and a launcher component, that the launcher component then starts an application component, and that this application component uses the GUI session and also has the rights to use it, and such things.
[37:54.880 --> 38:34.880] But now I want to show you how these XML configurations work in a real experimental setting. I've brought you an example from the database community, a B-link tree benchmark. A B-link tree is a widespread data structure that is used for indexing database tables, and it's also used very often to implement key-value stores such as memcached.
[38:34.880 --> 39:16.880] Now we would like to investigate how the throughput of this benchmark is affected when we run multiple instances on the same set of CPU cores and do time sharing, so that we have to pay these context switch costs, versus when we do the spatial partitioning I explained earlier. Let us take as our research question: which scenario will yield the higher throughput at the respective maximum number of cores?
[39:16.880 --> 39:44.880] So let's take a look at what we have to build up as the component tree. First we have our init, and we want to have, for example, three instances of this B-link tree benchmark; they are named blinktree1, 2 and 3, and they all need the timer service of Genode.
[39:44.880 --> 40:18.880] To define just this structure, we would write the code on the right, which is just this config tag, and then for each component you write a start tag with the name the component shall have, and then close that start tag.
[40:18.880 --> 40:56.880] For the B-link tree we have one exception here: since blinktree is a binary shared by all three components, we specify a specific binary name here, which is blinktree, and name the components differently. That's just because Genode requires that each component has a unique name, which is needed for the service routing and for checking access rights.
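To make the structure just described concrete, here is a minimal sketch of such an init config with only the start nodes discussed so far. The component and binary names follow the example from the talk, while mandatory details such as capability and RAM quotas are left out here and added in the later steps.

```xml
<config>
  <!-- one timer component; that it provides the Timer service is declared in the next step -->
  <start name="timer"/>

  <!-- three benchmark instances sharing one binary, each with a unique component name -->
  <start name="blinktree1"> <binary name="blinktree"/> </start>
  <start name="blinktree2"> <binary name="blinktree"/> </start>
  <start name="blinktree3"> <binary name="blinktree"/> </start>
</config>
```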
[40:56.880 --> 41:18.880] Now that we have the basic structure, we need to define that this timer component actually provides an operating system service. This is done with a provides tag, adding here that the service shall be named Timer.
[41:18.880 --> 41:39.880] Then we also have to specify where it can find the other operating system services it needs. That is just the default route, stating that if it wants to make a connection to another service, it should either ask its parent or one of its siblings.
[41:39.880 --> 42:14.880] Then we do this for the blinktree1 component, for example, and here we have to add something else, because we also want to use this timer service. This is done by specifying that the name of the service we need is Timer and that one of the siblings, with this child tag here, shall be used for it, which here is the component named timer.
[42:14.880 --> 42:29.880] We could also have another timer component with a different name, write that name instead, and this instance would then use that other timer; and we could do the same for the other tree components.
[42:30.880 --> 42:47.880] So this is basically what allows us to do the service interception, because here we can specify which actual implementation, that is, which component providing the service, shall be used.
[42:50.880 --> 43:27.880] After that, we need to specify where these components shall run to realize the experiment. But first I want to mention that Genode manages CPU cores not just as a set of IDs, but in a two-dimensional space, which is called an affinity space. It looks like this: each point in this matrix is a CPU core, and one can map components to subsets of this space.
[43:27.880 --> 43:39.880] We will now use this mechanism to place our B-link tree benchmark components onto the cores as stated in the experiment.
[43:39.880 --> 44:02.880] First we have to specify the affinity space; that's the huge grey square. We assume here that we have a machine with 64 cores, and to make things easier we give the space a width of 64 and a height of one, so that we do not have to calculate coordinates here.
[44:04.880 --> 44:30.880] After we have done that, we can pick out subsets of it and say, for example, that the blinktree1 component shall be mapped at position x = 1, which corresponds to core 1, and shall use 63 cores, which is stated by the width here, and a height of one.
[44:33.880 --> 44:56.880] Furthermore, we have to specify a RAM limit. Because it's a database benchmark, we need quite a large amount of memory, 80 gigabytes in this example. And this is then done for each instance of the B-link tree.
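Putting all of these pieces together, the resulting scenario config might look roughly like the sketch below. It is a reconstruction from the steps described above rather than the exact config used for the experiment; the capability counts, the omitted parent-provides declarations, and the placement of the second and third instance are illustrative assumptions.

```xml
<config>
  <!-- 64 cores modeled as a 64x1 affinity space, so the x position maps directly to a core ID -->
  <affinity-space width="64" height="1"/>

  <start name="timer" caps="100">
    <resource name="RAM" quantum="1M"/>
    <provides> <service name="Timer"/> </provides>
    <route> <any-service> <parent/> </any-service> </route>
  </start>

  <start name="blinktree1" caps="500">
    <binary name="blinktree"/>
    <!-- occupy cores 1..63 of the 64-core affinity space -->
    <affinity xpos="1" width="63" height="1"/>
    <!-- database benchmark, hence the large RAM quota -->
    <resource name="RAM" quantum="80G"/>
    <route>
      <!-- service interception: the Timer session is routed to the sibling named "timer" -->
      <service name="Timer"> <child name="timer"/> </service>
      <any-service> <parent/> </any-service>
    </route>
  </start>

  <!-- blinktree2 and blinktree3 are declared analogously; for the time-sharing scenario
       they share the same affinity as blinktree1, for spatial partitioning they would be
       given disjoint core ranges instead -->
</config>
```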
[44:56.880 --> 45:21.880] Unfortunately, this laptop didn't want to work with the beamer, so I couldn't show you the final config that comes out of it, but I already ran the benchmark beforehand, and these would be the results if we ran the experiment.
[45:24.880 --> 46:12.880] This also answers the question: if we only consider inserts into the B-link tree, then it is better to use the spatial partitioning, since we reach about 16 million operations per second, that is, 16 million insert operations of key-value pairs into the B-link tree, while on the other side, if we have a read-only workload, meaning we just look up keys in the B-link tree, then, as we can see, time sharing outperforms the strict spatial partitioning.
[46:14.880 --> 46:26.880] I didn't get around to analyzing this more deeply, and I don't think that's within the scope of this talk now.
[46:27.880 --> 47:08.880] So, to conclude: hardware has changed tremendously in the last two decades, and we need more OS research, but there is a high entrance hurdle to overcome, which can be lowered by an OS framework. My claim here is that Genode can significantly help, as this specific example from my experience shows: it saved me about 90% of the development time compared to implementing everything by myself.
[47:08.880 --> 47:51.880] Furthermore, my research operating system also provides some contributions to Genode, which I might file as pull requests: this NUMA support, and also some support for many-core systems. I already contributed to that by filing a bug report, after finding a bug where NOVA crashed in a boot loop if you wanted to use more than 30 cores. This has now been fixed, and I've tested it; it definitely works with 128 CPU cores on a real hardware machine.
[47:54.880 --> 48:10.880] The NUMA support I implemented is also working, and last but not least, we now also have a task-parallel programming library which can be used with Genode.
[48:11.880 --> 48:52.880] Now, my focus will clearly be on research for the data center, and my personal road into the future with ElanOS is: next I want to implement more profiling tools in Genode, especially hardware performance counters, to actually find out why the plots look the way they do; then I want to implement the elasticity of cells, especially the resource trading for CPU cores, and then the management strategies for the Aivot and Hoitaja
[48:52.880 --> 49:17.880] components, these resource managers, and do an evaluation with a realistic scenario. We have already implemented a database based on MxTasking, which is just waiting to be ported to Genode, and then I will hopefully have a first full-featured prototype of ElanOS which can be used by the community.
[49:18.880 --> 49:32.880] Thank you for your attention, and I hope you will get in touch with us. Thank you, Michael.
[49:35.880 --> 49:38.880] Thanks. Questions from the audience?
[49:39.880 --> 50:06.880] One question I may have: you talked quite a lot about research, and it makes sense given your current work, but what do you think about productization? I mean, getting into what the talks before showed, they had some actual use cases, there were business ventures. What do you think about productization? Would this approach make sense, or are people still going to go back to Linux, because that's what everyone uses and it's going to be the default?
[50:07.880 --> 50:13.880] That's still... I'm asking for an opinion, I don't have a crystal ball, I'm just asking what you're thinking.
[50:14.880 --> 50:45.880] It's our assumption, that's why we're doing the research, that it will be better, or at least have benefits compared to Linux, also for production use. Especially, we think that we can provide better performance and also ease the development of database systems and other highly parallel applications, like those from the high-performance computing community.
[50:45.880 --> 50:50.880] Okay, thanks. Any other questions? Yeah.
[50:50.880 --> 51:28.880] First, thank you for the talk. I'm amazed by it, because 15 years ago, when we started with Genode, we dreamed about such things, like research picking it up, because we, unfortunately, came from academia. My question is: have you encountered any pain points on this journey? In the last six months you have had a very intensive time with Genode; was there anything that frustrated you about it, or made you think this was probably not the right choice, or something that we could pick up for improvement?
[51:34.880 --> 52:10.880] Let me think for a moment. One thing that comes to my mind was the documentation of the tracing and profiling services. I figured out how the trace service works, but it doesn't seem to do that many things yet; it's not comparable to what you get with perf under Linux, where you can see exactly the clock cycles and cache misses down to the function level. That would be nice to have.
[52:24.880 --> 52:53.880] Thanks, it was a very nice talk. What specifically do you see regarding, for example, the Enzian architecture from ETH Zurich: how much would, for example, the Genode framework have to be tweaked to efficiently use such novel hardware architectures, if you can make a guess or prediction? Thank you.
[52:54.880 --> 53:15.880] I had thought about this. For this Enzian computer in particular, we would of course need support for ARM, which I think is there, but it would have to be adjusted to the SoC they use.
[53:15.880 --> 53:43.880] I'm not that deep into this research computer; I only attended a talk by Timothy Roscoe where he presented this thing and how cool it is. So I'm not quite aware of what would have to be done; I think basically the usual stuff would have to be implemented, drivers for it.
[53:56.880 --> 53:58.880] Yeah.
[54:01.880 --> 54:30.880] Thank you for the nice talk. I wanted to ask you about the slide with future developments, where you mentioned what you want to do. Do you have some time frames, if you're committed to doing this, and how much would Genode help you in shortening these time frames, how much easier would it make things? Could you make some sort of prediction about it? Thank you.
[54:30.880 --> 54:46.880] Using Genode, I would estimate, the plan is that this all should be running by fall this year, at least up to this point here with the evaluation.
[54:47.880 --> 55:01.880] I also hired a student assistant who will help me from April onwards to develop a nicer interface, so that we do not always need to write this XML stuff by hand.
[55:03.880 --> 55:16.880] With the profiling I have already begun; the basic stuff is already there, it's just that the interfaces are a bit ugly, there are no capabilities, and it's not implemented as a service yet,
[55:16.880 --> 55:37.880] but I can basically use it for my benchmark as it is. I think these two parts will be realized by summer, I would say.
[55:37.880 --> 55:53.880] And if I had to do this all by hand, I would assume that I wouldn't have been finished within one year, not with the manpower I have available.
[55:53.880 --> 55:58.880] Have you contributed any of your work back to Genode?
[55:58.880 --> 56:27.880] Not yet, but I plan to contribute this NUMA support. I think this could be a nice addition to Genode, because it would enable Genode to also be usable for data center applications on big servers, where NUMA support is crucial. And maybe later the performance counters, when they are finished and polished.
[56:29.880 --> 56:32.880] Thank you Michael, thank you so much.