Okay, the next talk. Please be quiet.

Okay, this will be interesting. Normally I move my arms a lot while I'm talking, so I'll try to keep the microphone close to my body. I will give you some information about the ELISA project. ELISA stands for Enabling Linux In Safety Applications. Maybe a quick question up front: who is aware of safety-critical software? Just briefly raise your hand. Good, maybe 25 to 30 percent. I hope the rest of you will learn something new as well.

Before we fully start, a short view of the project context I'm working in. My project mainly focuses on embedded IoT Linux at Bosch. What we try to do is utilize a lot of open source projects, see how they fit into a landscape, and how they can be of value for very different device classes, because — you may not believe it — in all of these kinds of products you will find Linux, or an embedded real-time OS, and so on.

Shortly about myself: who am I? I'm a technical business development manager focusing on embedded open source, mainly at Bosch. In parallel, and that's also why I'm speaking here, I'm the technical steering committee chair and a working group lead in the ELISA project at the Linux Foundation. I bring a history of 15-plus years: I started with Ubuntu 6.10, more or less, setting it up on old PCs and sharing them with exchange students, like a distributed hub of PCs. And for about ten years now I've been in the automotive space with Linux; we shipped our first product on a 2.6 kernel.

Now we can start on the real things. If we talk about Linux in safety-critical systems, we first need an understanding of what the system really means. Assessing whether a system is safe requires understanding it sufficiently, and you'll notice there is nothing about Linux in that statement, because the system always goes beyond the scope of a pure operating system, beyond a single component. You have a system context in which Linux plays a role, and you need to understand that context and how Linux is used. If you don't understand how Linux operates, you cannot see which components you're interested in, which features you may need and which not, and then evaluate which of these features are really relevant for safety. And while you're doing so, you will most likely identify gaps, and you will definitely need more and more work to get this done.
So if you look at the Linux ecosystem we already have, there is a good reason to take Linux: there is a large variety of devices, the ecosystem is strong, you have good tools around it, an incredible amount of hardware support — it runs on many, many devices. And, very importantly, you have a broad set of experts. If you look at what is sometimes claimed as the benefit of a certified safety-critical OS, it often comes with hard real-time requirements and capabilities. We know the PREEMPT_RT patches are in good shape in the kernel, but hard real-time maybe goes even further down the road. And then there is the development process. If you want to address very complex products, as in the automotive field — or maybe you can even call your robot vacuum cleaner a complex product — you can come from two directions. On one side you go with a traditional small component-driven RTOS and have to handle all the complexity yourself: you need more hardware involved, more multi-core support, and suddenly not everything works out. Or you go the other way around and come with Linux, where you already have all these things but need to improve: what do we do about the development process, about the real-time capabilities, and so on? Either way, when you build a more complex product you need a way to tackle these challenges and bring the two sides closer together.
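To make the real-time point a bit more tangible, here is a minimal sketch of observing wake-up jitter for a periodic task under the SCHED_FIFO policy. This is only an illustration of the idea — dedicated tools such as cyclictest are what you would actually use on a PREEMPT_RT kernel, and Python itself adds latency a real measurement must avoid; the period and priority are arbitrary example values.

```python
import os
import time

PERIOD_NS = 1_000_000  # 1 ms period (illustrative)
LOOPS = 10_000

def main() -> None:
    # Requires root/CAP_SYS_NICE; priority 80 is an arbitrary example.
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(80))

    worst = 0
    next_wake = time.monotonic_ns() + PERIOD_NS
    for _ in range(LOOPS):
        delay = next_wake - time.monotonic_ns()
        if delay > 0:
            time.sleep(delay / 1e9)  # sleep until the intended wake-up
        jitter = time.monotonic_ns() - next_wake  # how late we woke up
        worst = max(worst, jitter)
        next_wake += PERIOD_NS

    print(f"worst-case wake-up latency: {worst / 1000:.1f} us")

if __name__ == "__main__":
    main()
```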
While we were looking at Linux, let me take this part at the beginning; it's a little bit like a disclaimer, with a little more text. In the ELISA collaboration we said: we cannot engineer your system to be safe. We're talking about functional safety, not about cybersecurity. But take security as an analogy: there is always a real risk of security breaches in your system, and it's similar with safety. If you build a system, it remains your responsibility. Just because we provide certain guidelines, engineering principles, and so on, it is still your responsibility, as the one producing a product, to make it safe, and to make sure you really have the described processes in use and apply the methodologies. One of the core questions that typically comes up is: "So you're from ELISA, you make a safe Linux — will you certify a kernel version?" That's not going to work, because we all know you have to keep moving forward: there is continuous improvement, there are security vulnerabilities being fixed, so you need to go on. And that adds the additional challenge of continuous certification. So we will definitely not have a certified version, and we will also not certify Linux in this project; we provide the tools and other elements. And the last part: the responsibility, the legal implications, liability and so on remain entirely yours. Nevertheless, we have found a good set of partners already who are willing to support this mission; they subscribe to it and want to bring the whole thing forward.

With that, here is the mission statement we have drawn up. It's lengthy; you can read it. There is a set of elements, processes and tools; they should be amenable to safety certification; we look into software and documentation development; and in the end we want to aid the development, deployment and operation, or the adoption of our work into other projects. If you look at this mission, you see basically four key parts, which we will talk about later. You have elements and software, the concrete implementation of what we're doing. You have processes: a development process always matters for safety-critical and security systems, wherever you look. If you start to automate things, if you want to analyze, there is always strong involvement of tools. And lastly, when you do all this work, you need to document it; there is a lot of documentation work needed everywhere.

How will we do all these things? We organize it in our ELISA working groups. We split them by topic and context; they grow depending on demand, and once a certain size is reached, we extend them. Taking a first look, we have the safety architecture working group. This group actively looks inside the kernel and takes, for example, the watchdog subsystem, because a watchdog is one of the crucial elements we have in use. It looks at what the potentially safety-related functionality is, whether there is something in the kernel that is non-safety-related, and how these things could interfere. In doing so, the safety architecture working group does a lot of analysis, tries to improve documentation in the kernel, and provides new tools. That is a strong part, basically driven by use cases and the demands of products.
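Since the watchdog subsystem was just mentioned, here is a minimal sketch of how a userspace process talks to the kernel watchdog device: opening /dev/watchdog arms the timer, any write "pets" it, and the magic character 'V' asks drivers that support it to disarm on close. The health_ok() function is a hypothetical placeholder for whatever checks a real design would run.

```python
import time

def health_ok() -> bool:
    return True  # placeholder: check your safety-critical processes here

def main() -> None:
    # Opening the device starts the watchdog timer.
    with open("/dev/watchdog", "wb", buffering=0) as wd:
        try:
            while health_ok():
                wd.write(b".")   # any write pets the watchdog
                time.sleep(1.0)  # must stay well below the configured timeout
        finally:
            # 'V' is the magic-close character: drivers that support it
            # disarm the timer on close instead of rebooting the system.
            wd.write(b"V")

if __name__ == "__main__":
    main()
```

If the petting process hangs or dies without the magic close, the timer expires and the hardware resets the system — which is exactly the behavior the safety analysis has to reason about.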
A somewhat broader approach is brought in by the Linux features working group — the full name is Linux Features for Safety-Critical Systems. So it's not about generic features; it's about the safety-criticality part. You can imagine it a little bit like security measures, if you're familiar with namespaces and similar mechanisms: we look for elements that could improve safety. Which means: take this special kernel configuration, turn a feature on or off, and this comes out as a blueprint — this is how you had better work with memory, how you should not work with memory, and so on. All these things are tackled in the Linux features group. And it's a nice group because, with the results that come out of it, if you're already in the process of enhancing Linux and don't want to wait for all the results of the use case working groups, you can take incremental steps: pick some part of it and make your system more robust, more dependable. You can also judge how it compares to the security measures you're already taking. That's the big value of this group: it's directly usable and serves the long-term safety argumentation, rather than being something that develops for years — it basically assesses what's already there.
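A minimal sketch of that "blueprint" idea: compare a kernel .config against expectations coming out of a safety analysis. The option list here is invented for illustration and is not an ELISA recommendation.

```python
# Expected values an (invented) analysis produced for illustration only.
EXPECTATIONS = {
    "CONFIG_STRICT_KERNEL_RWX": "y",  # example: W^X kernel mappings
    "CONFIG_PANIC_ON_OOPS": "y",      # example: fail loudly, not silently
    "CONFIG_KEXEC": "n",              # example: feature assumed disabled
}

def load_config(path: str) -> dict[str, str]:
    """Parse a kernel .config, including '# CONFIG_FOO is not set' lines."""
    opts: dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("# CONFIG_") and line.endswith(" is not set"):
                opts[line[2:].split(" ", 1)[0]] = "n"
            elif line.startswith("CONFIG_") and "=" in line:
                key, value = line.split("=", 1)
                opts[key] = value
    return opts

def main() -> None:
    opts = load_config(".config")
    for key, want in EXPECTATIONS.items():
        have = opts.get(key, "n")
        status = "OK  " if have == want else "FAIL"
        print(f"{status} {key}: want {want}, have {have}")

if __name__ == "__main__":
    main()
```

Run against a kernel tree's .config, this is the kind of incremental, automatable check the blueprint approach enables without waiting for a full use case analysis.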
As the improvement of code quality is also very important, we have the tool investigation and code improvement working group. Code improvements can, for example, come from fuzz testing the kernel with tools like syzkaller, or from static analysis with CodeChecker, and from bringing these into a CI setup that runs on linux-next or whatever kernel configuration, to identify issues, make the kernel more robust, dependable and reliable, and to serve the argumentation about the quality of the kernel.

On the right side, some of the challenges were about the engineering process. As you know, there are rigorous methods within kernel development: there are a lot of reviews, patches get rejected. At the same time there is strong demand from traditional project management when it comes to safety products, and not every open source process complies with it directly. So we need to build an argument: where is the open source development process equivalent to what, for example, ISO 26262 requires for automotive products? What is also very interesting to understand here: in open source, you basically cannot easily buy a maintainer or developer, so you cannot buy features directly. You get a more unbiased view — maybe a personal view, but from a maintainer who is really committed to the component, to this particular subsystem of the kernel. With this strong commitment you already fulfill a little of the independent-view requirement, because in safety systems the developer later needs to commit to what has been done. Of course it's not written down — the maintainer simply stands behind whatever they do. So this is one part, for example, where you can start building the argument.

As the different elements need to go somewhere and be visible — we figured this out because we were running quite in parallel with different streams but never brought them together — we came up with the systems working group. The systems working group takes all these different elements, brings them together, works cross-functionally, maybe even cross-project, and combines the elements.

In order to tailor the system properly, we have vertical use cases. One is newly created, so there is not much information in this presentation about the aerospace working group yet. The overall idea is that it should address everything that flies, and you know that in aerospace there are many safety standards, safety integrity standards, with various levels. What you may not know — at least this is what we have heard so far — is that Linux is already in use there, even in certified products, but only at a very low safety level, not at the upper levels of safety certification.

An obvious thing, if you look at the membership — 50 to 60 percent come from the automotive field — is that we have an automotive use case. If you drive a car, or a scooter, or whatever, you may sometimes see an oil pressure sign, an oil temperature sign, check engine and so on; basically, when you switch on the ignition, you see all these little LEDs. This is the use case we are using in the automotive working group. The instrument cluster — the speedometer, everything — becomes digital; everyone has a display in their car. That gives us a good opportunity, because there is a more complex system in there, with a lot of graphics rendering involved, and it is actually a safety-critical function: whether you are in drive or in reverse gear has to be properly displayed, and it has a safety criticality assigned. Showing the check engine telltale is also safety-critical.

The third group is medical devices, and this comes from a completely different perspective. While automotive has the commercial element in mind — cost savings, driving topics forward — OpenAPS, the open Artificial Pancreas System, is driven by open source: there were open standards and a chance to interact with your insulin pump, and you can imagine how uncomfortable life is without that. There is a nice TED talk by Dana M. Lewis; I recommend it, and I put the link in the slide deck, so you can download the slides and check it.
You basically need to track your glucose level and dose your insulin depending on it, with warnings and so on; it's essentially event-triggered. You see the blood sugar level go up, so you set the dose, and there is a certain delay until it takes effect. What came in here was to add a Raspberry Pi in the middle, write some scripting around it, get it stabilized, and create a product out of it. Why I want to stress this: there was no IEC or ISO certification done. An open source engineer started this project, and if you download it and use it, you use it at your own risk. So the work of ELISA was to take this as the first use case, right at the beginning, at the first workshop: let's take a deeper look, let's analyze what's in there. It's running for thousands of people, it has never been certified; they are very happy and see it increasing their quality of life, but it is a safety-critical product that is not certified. We are not targeting a direct certification of it either; rather, we are looking into the different levels of analysis: what is involved, which workloads are in there, is there something that could make it fail, is there a risk, what could potentially go wrong?
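To make the loop just described concrete, here is a toy sketch of an event-triggered dosing decision. This is not OpenAPS code and certainly not medical advice; every constant is invented, and the only point is the structure: validate the reading, decide, and clamp against a hard limit.

```python
TARGET_MGDL = 110.0            # illustrative target glucose level
SENSITIVITY_MGDL_PER_U = 50.0  # illustrative correction factor
MAX_BOLUS_U = 2.0              # illustrative hard safety limit

def recommend_dose(glucose_mgdl: float) -> float:
    """Return an insulin correction (in units) for one loop iteration."""
    # Reject implausible sensor readings instead of acting on them.
    if not 40.0 <= glucose_mgdl <= 400.0:
        raise ValueError("implausible sensor reading - do not dose")
    excess = glucose_mgdl - TARGET_MGDL
    if excess <= 0:
        return 0.0
    # Clamp to a hard limit: the safety net matters more than the formula.
    return min(excess / SENSITIVITY_MGDL_PER_U, MAX_BOLUS_U)

if __name__ == "__main__":
    for reading in (95.0, 180.0, 320.0):
        print(f"{reading} mg/dL -> {recommend_dose(reading):.2f} U")
```

Even in this toy form you can see where the safety analysis bites: the input validation, the clamp, and the question of what happens when the loop itself stops running.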
This basically completes the use cases. I've drawn it together: there is an inner part that is common to almost all the different projects, and it gets fed by the use cases, which say "this is how you need to configure, how you need to specialize." You cannot create a fully safety-critical item out of context; there is no generic safety argumentation — you always need to judge it against an assumed context — and this then turns into the deliverables.

Another view on this: an exemplary system architecture, mainly as we approached it in the systems working group. It is not only Linux that is involved in these latest products. In the medical OpenAPS system it's pure Raspbian on the device; there is no RTOS directly involved, unless you treat the sensor or the insulin pump as the RTOS next to it. But in more complex products you always have to face that there are RTOSes involved, microcontrollers, microprocessors; container technology comes into the picture — everybody talks about containers and embedded these days — and also virtualization technologies, be it Xen or KVM. So this is something that easily ends up in the system. On the working group side, the Linux features, safety architecture, and code improvement groups feed directly into the Linux work, so the main outcome there is for the Linux ecosystem, the Linux kernel, and much of that work is not directly related to the hypervisor or the RTOS. But some things go a bit further, like the tools and the engineering process: things coming out there may also have good value for other projects you build on. If you have Yocto involved, you can build Xen and Zephyr with a meta layer too, and then it may be good to have this tooling, or the code improvements can come into the picture there — certain tools we use in CI for testing, openQA or others. And lastly, for completeness, the use cases basically tailor the system down to whatever you need. In the automotive working group, for example, we tailored the system down for now to get a better understanding of the Linux kernel, and we left out the RTOS, the containers, the virtualization and the rest; but we know that once we have solved some parts of our work, we need the full system context, and the system context involves all of these things.

Having said this, we also do a certain outreach to other projects. I put in the Zephyr community; we have Automotive Grade Linux, which is already in there; there could be other Linux distributions; and there is strong involvement of the Yocto project. I didn't quite know where to put SPDX, so it's probably on this picture — we'll see it later on.
How do we interact so far? We are already in discussions with Zephyr and Xen; we have weekly meetings where Xen members show up and Zephyr is present with representatives. We saw that these are safety-critical open source projects, so they basically share the same burden: they need to show how the development process is done, how certain quality levels are guaranteed, where the testing is done, where the requirements management is and the traceability to everything. So this fits in quite well.

If we take this architecture — and I'm coming from the automotive side — there are different projects sharing these architectural thoughts. There is a large group in the Eclipse SDV project; there is the SOAFEE initiative from Arm, with largely similar members to SDV; and then there is Automotive Grade Linux, which is also kind enough to provide us with the reference implementation for the automotive use case. They all share very similar architectures.

Lastly, not directly related to safety but having safety considerations and being part of the system, there is the Yocto project for the build tooling, to make this reproducible in CI. Here, for example, SBOM generation suddenly comes into play, which you can do with the Yocto project. And while we were discussing, we figured out that additional data is needed for a system-level SBOM, and for this we reached out to SPDX. There is actually an SPDX special interest group on FuSa (functional safety) meeting weekly to extend this scope; I guess there is also a talk later on where parts of it get presented.
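For a feel of what the SPDX side looks like, here is a minimal sketch that emits an SPDX 2.x tag-value document with a single package entry. The fields shown are a small subset of the format; the names, version and namespace are placeholders, and the functional-safety extensions the SIG discusses would go beyond this.

```python
from datetime import datetime, timezone

def write_minimal_spdx(path: str, pkg_name: str, pkg_version: str) -> None:
    """Write a minimal single-package SPDX 2.3 tag-value document."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.3",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {pkg_name}-sbom",
        f"DocumentNamespace: https://example.com/spdx/{pkg_name}",  # placeholder
        f"Created: {created}",
        "Creator: Tool: minimal-sbom-sketch",
        "",
        f"PackageName: {pkg_name}",
        "SPDXID: SPDXRef-Package",
        f"PackageVersion: {pkg_version}",
        "PackageDownloadLocation: NOASSERTION",
        "PackageLicenseConcluded: NOASSERTION",
        "PackageLicenseDeclared: NOASSERTION",
        "PackageCopyrightText: NOASSERTION",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_minimal_spdx("image.spdx", "meta-elisa-image", "1.0")
```

In practice Yocto generates such documents per package at build time; the system-level question is how to tie them together with safety-relevant data, which is what the FuSa SIG is working on.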
Why do we do all this? I like this statement from George Bernard Shaw: if I have an apple and you have an apple and we exchange the apples, we still each have one apple; but if I have an idea and you have an idea and we exchange these ideas, then each of us has two ideas. That is basically what this is about: we need to build a good understanding and bring things together.

Now we come to what the different working groups do. If we check against elements, processes, tools, documentation, not every working group is active to the same extent, so I just put some bubbles in here to show where most of our work goes. We have a lot on the software part, of course; people are interested in the Linux kernel. The process part is maybe not as broad, because it needs to be centralized, and the usage of the processes goes into the other working groups: OSEP, the medical group, and a bit of the architecture group work on these processes and bring them into the others. Tools pop up in multiple working groups because tools are handy: a tool pops up, we bring it into a repo, you tell people about it, it gets used — and if we want to get to continuous certification at some point, there will be a need for a lot of tool support. And basically every working group does documentation.

I want to give you some examples. From the process perspective there is the system-theoretic process analysis; that's the first topic I will say a bit more about. It's the dry stuff about systems architecture, not at the code level, but we figured out that when you do this kind of STPA analysis, at some point you reach a level where you need to understand more about the kernel. So I'll also tell you something about the workload tracing we have done. And, supported by another working group, we have a call tree tool — partly utilizing and improving existing tools, partly written from scratch. All of this later feeds into meta-elisa, which is basically the Yocto layer for the automotive use case, enhancing the Automotive Grade Linux demo. We also did things like the CodeChecker and syzkaller integrations; I won't say much about those, they're just further examples of the work. All our information is public, though it is quite spread out: there is a GitHub part, some parts on Google Drive, we do regular blog posts and have published some white papers — it always depends on whom you want as the audience or readers. So we share all this.
There is also a YouTube channel, but I don't count that as documentation. Okay.

First, let's look at STPA. STPA stands for System-Theoretic Process Analysis. What's interesting: if you're coming from safety criticality, maybe automotive, you know hazard analysis, risk assessment, FMEAs; you may have grown up with spreadsheets, drawing out failure cases, checking your API interfaces and all these things. The nice thing about STPA is that you take a more graphical approach, like on the left part of the picture. Some basics: it is still relatively new. I say this because the old analysis techniques come from the microcontroller world, going back to the 60s and 70s — the 70s, more or less. So for a long time these analysis techniques were not improved much, while the systems being analyzed kept increasing in complexity, and this needs to be considered. System-Theoretic Process Analysis is able to handle very complex systems. The reason is that you can start from quite a broad view: maybe you don't know all the elements yet, so for one thing you just have a name and don't know what it really looks like inside, while for another blob you have more details. You can connect all these different blocks, and the analysis will survive even if you don't yet know the whole content of some specific part. Then you go in a very iterative way, step by step: you figure something out, you go one level down, deeper into the system, you find that an assumption didn't hold true, and you refine. What's also good: other analyses basically work at an API level or on definitions, but this one explicitly covers the system context, and it includes human interaction, the human operator — something the other techniques don't give you. In parallel, while you do the analysis, you already improve your documentation and gain a good understanding of the system, and even in a QA department you can integrate it properly with existing model-based approaches.

The principle, at a very high level, is quite simple. There are four key elements: there is a controller at the top; it sends a control action to a controlled process; and the controlled process typically provides feedback. And that's not quite all: the controlled process may itself control something else, which is how the structure grows. The question in the end is: what could go wrong — what are the unsafe control actions? You could use this methodology to understand how the water pipes flow in a building, or how people walk through a space; you can attach it to whatever use case you like, it's always the same approach. But in our case the main idea is safety, criticality, risk assessment, and that's why we look at unsafe control actions.
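Here is a minimal sketch of this vocabulary in code. The four unsafe-control-action types are the standard STPA taxonomy; the tiny control structure at the bottom is our own illustrative modelling of the OpenAPS case, not an ELISA tool, and all names are placeholders.

```python
from dataclasses import dataclass, field

# The four ways a control action can be unsafe (standard STPA taxonomy).
UCA_TYPES = (
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)

@dataclass
class Link:
    action: str    # control action, e.g. "set insulin dose"
    feedback: str  # feedback signal, e.g. "delivery confirmation"

@dataclass
class Component:
    name: str
    controls: dict[str, Link] = field(default_factory=dict)  # target -> link

def uca_checklist(system: dict[str, Component]) -> list[str]:
    """Enumerate the questions an analyst walks through per control action."""
    out = []
    for comp in system.values():
        for target, link in comp.controls.items():
            for uca in UCA_TYPES:
                out.append(f"{comp.name} -> {target}: '{link.action}' {uca}?")
    return out

if __name__ == "__main__":
    # Tiny OpenAPS-flavoured control structure; names are illustrative only.
    system = {
        "algorithm": Component("algorithm",
            {"pump": Link("set insulin dose", "delivery confirmation")}),
        "pump": Component("pump",
            {"body": Link("deliver insulin", "glucose level via sensor")}),
    }
    for question in uca_checklist(system):
        print(question)
```

Note how a controlled process (the pump) is itself a controller of the next process down — exactly the nesting just described.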
A little warning: the next slide is one you will not be able to read. It is the level 1 analysis of the OpenAPS use case, and, well, that's how it looks. In the middle there is the OpenAPS system, viewed from the top — a developer view, not the full user view. You have the infrastructure, you have the algorithm developers who release the software; then comes the human operator who uses the software and installs it further; and this goes into the system. We don't know yet what "the system" is — that's what I meant by the very first level: you don't care yet whether it's a Linux system or whatever is underneath, it's just "my OpenAPS system". When you have understood what your critical part is and how the system context looks, you may go to the next level. Now we zoom into this OpenAPS system, and at the next level you see there is actually a Raspberry Pi involved — we know this from the hardware — with Raspbian as the OS; you have the OpenAPS toolkit involved, the actual algorithm, which may control the insulin pump; Nightscout is an external component; you see all these kinds of things. The working group was on this level for some time, then tried to write down the next level, going deeper, and actually needed support. That's where workload tracing came into the picture: we used a mentorship project here and got support, someone fully concentrating on the workload tracing activity. That gives another little table, which you can at least read. The main things to know: we use strace and cscope as the main tools for the analysis; there are stressors in there, like stress-ng, paxtest and others — this may depend on the workload you use, on what you want to challenge the system with. The information that comes out: which system calls occur, how often they occur — the frequency — and which subsystem they belong to, so that you know where your critical parts are and where the system call entry points sit. With this you can dive deeper into the different subsystems, and it causes a lot of refinement in the upper layers, because now you have iteration and maybe you see that an assumption was wrong — not that everything before was incorrect; you just improve it.
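A minimal sketch of the first step of such workload tracing: run a workload under strace and count how often each system call occurs. Mapping the calls to kernel subsystems, as the working group did, would be an additional step on top of this.

```python
import re
import subprocess
import sys
from collections import Counter

# Matches the syscall name at the start of an strace log line, with or
# without the "[pid NNN]" prefix that strace -f adds for child processes.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def trace(cmd: list[str]) -> Counter:
    # -f follows forked children; -o writes the raw log to a file.
    subprocess.run(["strace", "-f", "-o", "trace.log", *cmd], check=False)
    counts: Counter = Counter()
    with open("trace.log") as f:
        for line in f:
            m = SYSCALL_RE.match(line)
            if m:
                counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Usage: python trace_workload.py <command> [args...]
    for name, n in trace(sys.argv[1:]).most_common(10):
        print(f"{n:8d}  {name}")
```

The frequency table this produces is the raw material for the question above: which subsystems does my safety-critical workload actually exercise?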
Related to this is the call tree tool. That is something basically written as our own part: the idea was to see, here is a system call — what else is there, what are the ways to interact, and how do we visualize it? Because if you just go through the code, you cannot really grasp the complexity. This was just the first shot, so again it's not readable on the slide, but you can see there is a file system part. The very interesting aspect: this is a static view, so you see all the potential paths. In the previous view, with workload tracing, you basically see where the path has gone, but you don't directly uncover the untraced paths. Here you see all the paths, with the chance that you hit something completely irrelevant because your workload never touches it. So it is a complementing element: you get good insight into the kernel's construction, and it can help you analyze further workloads.
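A minimal sketch of that static call-tree idea: given a caller-to-callee edge list (which you might extract with cscope), print every potential path from a syscall entry point down to a depth limit, cutting off cycles. The tiny edge list below is partly invented for illustration and is not real kernel structure.

```python
# Static caller -> callee edges; in practice extracted with cscope or
# similar. "file_op_write" is an invented placeholder node.
EDGES: dict[str, list[str]] = {
    "__x64_sys_write": ["ksys_write"],
    "ksys_write": ["vfs_write"],
    "vfs_write": ["rw_verify_area", "file_op_write"],
}

def print_tree(func: str, depth: int = 0, max_depth: int = 5,
               seen: frozenset = frozenset()) -> None:
    """Print all statically reachable callees as an indented tree."""
    print("  " * depth + func)
    if depth >= max_depth or func in seen:
        return  # cut off recursion depth and call cycles
    for callee in EDGES.get(func, []):
        print_tree(callee, depth + 1, max_depth, seen | {func})

if __name__ == "__main__":
    print_tree("__x64_sys_write")
```

This shows the complementary trade-off from above in miniature: the static tree lists paths a trace never took, at the price of including paths no real workload may ever reach.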
We bring all these things together in the meta-elisa instrument cluster. It looks like the AGL instrument cluster — we saw this picture before. I highlighted the change we made: we write "danger" in there, and that made the whole thing safe. Which is of course not the full story. The full story is that we needed a use case we could analyze, one with safety relevance, and this was a good Qt-based demo, so we could make use of it. It runs on QEMU; QEMU has some little drawbacks here, and I'll come to that very soon, but with this you can start the analysis, trace workloads, and also add a watchdog mechanism.

The watchdog is the next part. What we use in a lot of concepts is an external watchdog. Even if you don't see it directly in the OpenAPS system, for example, there is still external monitoring involved that raises an alarm: if the Raspberry Pi did something wrong in one direction or the other — not that it happens — there is a monitor that will beep or so and inform the user. You do it similarly in the automotive case with the telltale environment, where you want the workload traced. This is a challenge-response watchdog: it does not simply look for a sign of life; it gives a small challenge to the workload, the workload processes it alongside its other work, and the monitor gets a response back, so you know it is really alive and not just replying. The demand comes from the fact that for a lot of use cases we cannot fully guarantee that the workload responds in the proper time, that the process doesn't hang, and checking this with an external watchdog takes a lot of responsibility off you. It mainly watches the safety-critical workload. I know there are ideas that say, well, let's put this watchdog in and watch everything — that typically does not work out. So you really concentrate on the safety-critical things; all the other parts relate to user experience. If your rendering engine is unlucky and you see a lot of delay on the touch screen or whatever, that's nothing you want to experience as a user, but as long as the warning signs come in time and correctly from a safety perspective, it's all fine. So it is good to split things up here: what is the intended functionality, what is its safety criticality, what do I need to monitor and what not — the watchdog is just the safety net. As I said, this is used widely in automotive, and other industries basically always have a safety net somewhere that monitors things. What we are trying to do is give more responsibility to Linux, and with that you can start with a lot of elements in this safety-critical part.

The last message is very important to me: it's not that you design your system around the watchdog being there. You start by designing the system so that you never need to trigger the watchdog, because you don't want that; it is just your system functionality and it has to work, and in the best case the watchdog never fires into a safe state. For the telltale use case, a safe state could mean the screen is turned off or you do a restart — basically you might blank the screen so the driver directly recognizes "something is not right here"; it could also be a warning message or something else. Depending on your safety process, you need to make sure this is really triggered too, so the safety criticality comes into the picture again.
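Here is a minimal sketch of the challenge-response pattern, complementing the plain /dev/watchdog sketch earlier: the monitor sends a random challenge and the workload must return a correct, fresh answer within a deadline, otherwise the system falls into a safe state. In a real design the monitor would be an external component, not a thread next to the workload, and the timing values are invented.

```python
import hashlib
import os
import queue
import threading
import time

TIMEOUT_S = 0.5  # illustrative response deadline

def answer(challenge: bytes) -> bytes:
    # The workload proves liveness by computing something the monitor can
    # verify; real systems often fold workload state into the answer.
    return hashlib.sha256(challenge).digest()

def workload(inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        challenge = inbox.get()
        # ... do the actual safety-critical work here ...
        outbox.put(answer(challenge))

def enter_safe_state(reason: str) -> None:
    print(f"entering safe state: {reason}")  # e.g. blank the display

def monitor(rounds: int = 100) -> None:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    threading.Thread(target=workload, args=(inbox, outbox),
                     daemon=True).start()
    for _ in range(rounds):
        challenge = os.urandom(16)  # fresh nonce, so replies can't be replayed
        inbox.put(challenge)
        try:
            reply = outbox.get(timeout=TIMEOUT_S)
        except queue.Empty:
            enter_safe_state("no response within deadline")
            return
        if reply != answer(challenge):
            enter_safe_state("wrong response")
            return
        time.sleep(0.1)  # challenge period
    print("workload stayed alive for all rounds")

if __name__ == "__main__":
    monitor()
```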
I prepared a one-minute video, but I never know how these things work out when you do a live demonstration, so I just put the YouTube link in the material. And if you are brave — or even if not, I guess it's straightforward — we have good documentation on how to experience this demo. When we started with the ELISA work, we built our topics from scratch and documented everything to our best understanding, and then someone came and said, well, but I'm not using Ubuntu, I'm using openSUSE Tumbleweed. We figured we needed a bit more, more environments set up so that people can reproduce things, so we came up with a Docker container that installs the packages you need, in the right versions, to make it easier. The next thing we observed: people do a Yocto build, it consumes a lot of space and a lot of compilation time, so cached binaries would be a good option. So we also enabled the sstate cache, so that you can build the parts that still need to be built in roughly 40 minutes on a modest laptop — it also depends on your download speed, since a Yocto build typically involves quite a large download. In the long run we'll see if we can extend this to other systems, maybe a Debian version, but for now it's the Yocto build. The last thing we figured out: there are also use cases where you just want to dive deep into the system, and this is the complement to the demo. If you don't want to watch the video and want to try it out directly: if you have QEMU installed on your system, just download the binaries directly. They get built nightly — really nightly, every night you get a new one — always against the latest version of AGL (with a little trouble last week, but it's up and running again). It does a boot check, so you can really experience it, and it basically uses the instructions written down in the README markdown file on GitHub.
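A minimal sketch of what such a nightly boot check can look like: boot the image in QEMU and watch the serial console for a marker within a timeout. The image name, machine options and marker are placeholders; the real check follows the instructions in the project's README.

```python
import subprocess
import threading

IMAGE = "agl-instrument-cluster.qcow2"  # placeholder artifact name
MARKER = b"login:"                      # naive "boot succeeded" heuristic

def boot_check(timeout_s: float = 300.0) -> bool:
    cmd = [
        "qemu-system-x86_64", "-m", "2048",
        "-drive", f"file={IMAGE},format=qcow2",
        "-nographic",  # route the serial console to stdio so we can read it
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    killer = threading.Timer(timeout_s, proc.kill)  # hard deadline
    killer.start()
    booted = False
    try:
        for line in proc.stdout:  # scan console output line by line
            if MARKER in line:
                booted = True
                break
    finally:
        killer.cancel()
        proc.kill()
        proc.wait()
    return booted

if __name__ == "__main__":
    print("boot check:", "PASS" if boot_check() else "FAIL")
```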
Right, that's the demo. Some next steps: the STPA work continues, so we are getting into deeper levels of it, and we need to get the workload tracing properly reflected in the different diagrams — this was heavily driven by the medical devices group; automotive has not used workload tracing that much yet, but we are bringing it in there. The call tree also got extended with another tool called ks-nav, a kernel static navigation tool, to get better analysis and a better view. For meta-elisa — I was talking about QEMU, but everybody wants to see real hardware — we are on the path of bringing this up on Arm-based hardware; for now we have x86 and QEMU emulation, and the Arm work underneath is mainly driven by the systems working group. What is very important so far is the display checking: normally you would check the actual rendering of a telltale, but there are so many different kinds of implementations that we mock a lot of things there, and we want to improve this so that we have proper display checks and also a lot of monitoring. That is basically it on the four topics we have seen. Additionally, we are working on a system-level SBOM — we enabled the SBOM part for generating material in the demo — and we want to improve the kernel configuration, trim down the size of the image, get the RT documentation updated, and have a more complex cluster involved.

Summarizing what you have seen: we talked about the challenges in the beginning, basically the difference between a traditional safety-critical RTOS and the new approach, and what this collaboration can and cannot achieve. You heard about the goals and the strategy, which tools and which elements we analyzed and looked into. You saw how the different working groups interact, how they feed into a system, and how we reach out to the wider community. I talked about the contributions of the different working groups, shared with the community also in the form of a usable, downloadable use case. You saw our methodologies, STPA and workload tracing. And lastly, we got a little preview of what is coming next. I guess we're good on time for the question part.

Does anyone have a question? There's one up above, coming down. You have a question, okay.

Thanks for the interesting talk. You mentioned certification as one big problem. Where can we improve things so that certification processes become more open-source-friendly and open source software becomes more certification-friendly? What has to be done, or can be done, there?

Yeah, I guess the question is how open source and certification can come closer to each other, from both sides, right? One thing could be done in documentation: improving the tracing, having tools that support showing how certain features get from the mailing list into the system, and whether there is a test around it — this gives a lot of confidence and trust in what it's doing. From another perspective, there is not much in the safety integrity standards that allows the use of pre-existing software elements; there is an ISO PAS currently in progress that allows more of this. It depends on the safety standard you're under: some of the more relaxed medical standards have fewer requirements on this.
But for automotive it is very strong and prohibitive. So I would say: doing careful work, explaining design decisions, making things visible and more structured, maybe having centralized bug tracking and so on — this can help a lot from this perspective, and it will be good for the certification authorities, with whom we also do a lot of clearance work.

Yes, and if I heard you correctly, on supporting the assessments and the authorities: we also have company support where people are really in the working groups, and we get input from certification authorities on the continuous work we are doing, so they are directly working within the working groups as well.

Thank you very much for your talk. Just a quick question; I want to get a feel for what your opinion on this is. Do you think there's space, as certification for something like Linux improves — (Can you move the mic a little closer? I hear the people leaving louder than you.) — Sorry, yes: as the processes for certification and validation of Linux improve and change over time, do you think there is ever going to be space for Linux to be used as a critical component in vehicles, or do you think that space is completely reserved for something that actually uses real time?

The main part I caught was the real-time part at the end — whether that space will ever be there. It's already there.

Fair enough, thank you.

Okay. Does anyone else have a question? You have a question, yeah.

So, what is the place for Linux itself — let's say, what is the safety integrity level of Linux itself in this model? Because if we take, say, ISO 26262, there are V-model requirements for development, but Linux already has its source code; there is no coverage testing with all the MC/DC coverage, etc. So what is the place of Linux, and how do you keep it and maintain it without forking?

Yeah, so you are asking where the place of Linux is if you look at the V-model of, for example, ISO 26262 — where do things fit, with a lot of demands like code coverage, tracing and so on. What I can say: first of all, speaking about a level, you would not directly go to an ASIL D level, which puts much higher requirements on the tools, that's for sure. You should start at the lower ASIL A or B levels. That's also what we did; we relaxed some parts for the automotive use case too — let's not start with overly complex parts, maybe take the hard real-time criticality out, because then you would have to review many more parts. The space I see is that you argue equivalence for certain things, and that you are in close collaboration with assessors.
You explain how things are done, because when ISO 26262 was originally prepared, it was not considering a complex system like Linux being in use, or such a large amount of pre-existing software. So in an assessment, if you can show credibility through requirements work and good concepts, you may at first end up with a system that is arguably safe but not directly certifiable to ISO 26262. And that already opens up the perfect discussion room, because then you can say: you cannot tell me this is not working, yet you still say it is not certifiable — and then you also see the gap in the standard. If you reach this point, you have a lot of good support: go to the certification authorities early, have internal assessments, and judge it. In the end it is also your responsibility to say, I argue for an equivalence, because the spec does not only say "you have to"; it says "recommended", "highly recommended", leaving you room to show equivalence — to this method in the standard I am using this, and on top I am adding that. That is how you can build the argument, and of course you get feedback from your developers on the work you're doing in the kernel mainline and so on.

So maybe it is also possible to somehow affect how ISO 26262 is developed, because it is a bit outdated in some ways?

Some of the members in ELISA have people in these ISO committees who are basically taking this back in that direction for future revisions of the standards. We don't have visibility — at least I don't, because I'm not in those committees — but we do know that some of those member companies you saw up there are in them, and they are advocating for things to work a little bit better in future revisions.

Is there anyone else who has a question? Okay, thank you for your talk.

Thank you very much.