Okay, the next talk. Please be quiet.

Okay, this will be interesting. Normally I move my arms a lot while I'm talking, so I'll try to keep the microphone close to my body. I will give you some information about the ELISA project. ELISA stands for Enabling Linux In Safety Applications. Maybe a quick question up front: who is aware of safety-critical software? Just briefly raise your hand. Good, maybe 25 to 30 percent. I hope the rest of you will learn something new as well.

Before we fully start, a short view of the project context I'm working in. My project mainly focuses on embedded IoT Linux at Bosch. What we try to do is utilize a lot of open source projects, see how they fit into a landscape, and how they can be of value for very different device classes, because — you may not believe it — in all of these kinds of products you will find Linux, or an embedded real-time OS, and so on.

Shortly about myself: who am I? I'm a technical business development manager focusing on embedded open source, mainly at Bosch. In parallel, and that's also why I'm speaking here, I'm the technical steering committee chair and a working group lead in the ELISA project at the Linux Foundation. I bring a history of 15-plus years: I started with Ubuntu 6.10, more or less, setting it up on old PCs and sharing them with exchange students, like a distributed hub of PCs. And for about ten years now I've been in the automotive space with Linux; we shipped our first product on a 2.6 kernel.

Now we can start on the real things. If we talk about Linux in safety-critical systems, we first need an understanding of what the system really means. Assessing whether a system is safe requires understanding it sufficiently, and you'll notice there is nothing about Linux in that statement, because the system always goes beyond the scope of a pure operating system, beyond a single component. You have a system context in which Linux plays a role, and you need to understand that context and how Linux is used. If you don't understand how Linux operates, you cannot see which components you're interested in, which features you may need and which not, and then evaluate which of these features are really relevant for safety. And while you're doing so, you will most likely identify gaps, and you will definitely need more and more work to get this done.
So if you look at the Linux ecosystem we already have, there is a good reason to take Linux: there is a large variety of devices, the ecosystem is strong, you have good tools around it, an incredible amount of hardware support — it runs on many, many devices. And, very importantly, you have a broad set of experts. If you look at what is sometimes claimed as the benefit of a certified safety-critical OS, it often comes with hard real-time requirements and capabilities. We know the PREEMPT_RT patches are in good shape in the kernel, but hard real-time maybe goes even further down the road. And then there is the development process. If you want to address very complex products, as in the automotive field — or maybe you can even call your robot vacuum cleaner a complex product — you can come from two directions. On one side you go with a traditional small component-driven RTOS and have to handle all the complexity yourself: you need more hardware involved, more multi-core support, and suddenly not everything works out. Or you go the other way around and come with Linux, where you already have all these things but need to improve: what do we do about the development process, about the real-time capabilities, and so on? Either way, when you build a more complex product you need a way to tackle these challenges and bring the two sides closer together.
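To make the real-time point a bit more tangible, here is a minimal sketch of observing wake-up jitter for a periodic task under the SCHED_FIFO policy. This is only an illustration of the idea — dedicated tools such as cyclictest are what you would actually use on a PREEMPT_RT kernel, and Python itself adds latency a real measurement must avoid; the period and priority are arbitrary example values.

```python
import os
import time

PERIOD_NS = 1_000_000  # 1 ms period (illustrative)
LOOPS = 10_000

def main() -> None:
    # Requires root/CAP_SYS_NICE; priority 80 is an arbitrary example.
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(80))

    worst = 0
    next_wake = time.monotonic_ns() + PERIOD_NS
    for _ in range(LOOPS):
        delay = next_wake - time.monotonic_ns()
        if delay > 0:
            time.sleep(delay / 1e9)  # sleep until the intended wake-up
        jitter = time.monotonic_ns() - next_wake  # how late we woke up
        worst = max(worst, jitter)
        next_wake += PERIOD_NS

    print(f"worst-case wake-up latency: {worst / 1000:.1f} us")

if __name__ == "__main__":
    main()
```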
While we were looking at Linux, let me take this part at the beginning; it's a little bit like a disclaimer, with a little more text. In the ELISA collaboration we said: we cannot engineer your system to be safe. We're talking about functional safety, not about cybersecurity. But take security as an analogy: there is always a real risk of security breaches in your system, and it's similar with safety. If you build a system, it remains your responsibility. Just because we provide certain guidelines, engineering principles, and so on, it is still your responsibility, as the one producing a product, to make it safe, and to make sure you really have the described processes in use and apply the methodologies. One of the core questions that typically comes up is: "So you're from ELISA, you make a safe Linux — will you certify a kernel version?" That's not going to work, because we all know you have to keep moving forward: there is continuous improvement, there are security vulnerabilities being fixed, so you need to go on. And that adds the additional challenge of continuous certification. So we will definitely not have a certified version, and we will also not certify Linux in this project; we provide the tools and other elements. And the last part: the responsibility, the legal implications, liability and so on remain entirely yours. Nevertheless, we have found a good set of partners already who are willing to support this mission; they subscribe to it and want to bring the whole thing forward.

With that, here is the mission statement we have drawn up. It's lengthy; you can read it. There is a set of elements, processes and tools; they should be amenable to safety certification; we look into software and documentation development; and in the end we want to aid the development, deployment and operation, or the adoption of our work into other projects. If you look at this mission, you see basically four key parts, which we will talk about later. You have elements and software, the concrete implementation of what we're doing. You have processes: a development process always matters for safety-critical and security systems, wherever you look. If you start to automate things, if you want to analyze, there is always strong involvement of tools. And lastly, when you do all this work, you need to document it; there is a lot of documentation work needed everywhere.

How will we do all these things? We organize it in our ELISA working groups. We split them by topic and context; they grow depending on demand, and once a certain size is reached, we extend them. Taking a first look, we have the safety architecture working group. This group actively looks inside the kernel and takes, for example, the watchdog subsystem, because a watchdog is one of the crucial elements we have in use. It looks at what the potentially safety-related functionality is, whether there is something in the kernel that is non-safety-related, and how these things could interfere. In doing so, the safety architecture working group does a lot of analysis, tries to improve documentation in the kernel, and provides new tools. That is a strong part, basically driven by use cases and the demands of products.
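Since the watchdog subsystem was just mentioned, here is a minimal sketch of how a userspace process talks to the kernel watchdog device: opening /dev/watchdog arms the timer, any write "pets" it, and the magic character 'V' asks drivers that support it to disarm on close. The health_ok() function is a hypothetical placeholder for whatever checks a real design would run.

```python
import time

def health_ok() -> bool:
    return True  # placeholder: check your safety-critical processes here

def main() -> None:
    # Opening the device starts the watchdog timer.
    with open("/dev/watchdog", "wb", buffering=0) as wd:
        try:
            while health_ok():
                wd.write(b".")   # any write pets the watchdog
                time.sleep(1.0)  # must stay well below the configured timeout
        finally:
            # 'V' is the magic-close character: drivers that support it
            # disarm the timer on close instead of rebooting the system.
            wd.write(b"V")

if __name__ == "__main__":
    main()
```

If the petting process hangs or dies without the magic close, the timer expires and the hardware resets the system — which is exactly the behavior the safety analysis has to reason about.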
A somewhat broader approach is brought in by the Linux features working group — the full name is Linux Features for Safety-Critical Systems. So it's not about generic features; it's about the safety-criticality part. You can imagine it a little bit like security measures, if you're familiar with namespaces and similar mechanisms: we look for elements that could improve safety. Which means: take this special kernel configuration, turn a feature on or off, and this comes out as a blueprint — this is how you had better work with memory, how you should not work with memory, and so on. All these things are tackled in the Linux features group. And it's a nice group because, with the results that come out of it, if you're already in the process of enhancing Linux and don't want to wait for all the results of the use case working groups, you can take incremental steps: pick some part of it and make your system more robust, more dependable. You can also judge how it compares to the security measures you're already taking. That's the big value of this group: it's directly usable and serves the long-term safety argumentation, rather than being something that develops for years — it basically assesses what's already there.
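A minimal sketch of that "blueprint" idea: compare a kernel .config against expectations coming out of a safety analysis. The option list here is invented for illustration and is not an ELISA recommendation.

```python
# Expected values an (invented) analysis produced for illustration only.
EXPECTATIONS = {
    "CONFIG_STRICT_KERNEL_RWX": "y",  # example: W^X kernel mappings
    "CONFIG_PANIC_ON_OOPS": "y",      # example: fail loudly, not silently
    "CONFIG_KEXEC": "n",              # example: feature assumed disabled
}

def load_config(path: str) -> dict[str, str]:
    """Parse a kernel .config, including '# CONFIG_FOO is not set' lines."""
    opts: dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("# CONFIG_") and line.endswith(" is not set"):
                opts[line[2:].split(" ", 1)[0]] = "n"
            elif line.startswith("CONFIG_") and "=" in line:
                key, value = line.split("=", 1)
                opts[key] = value
    return opts

def main() -> None:
    opts = load_config(".config")
    for key, want in EXPECTATIONS.items():
        have = opts.get(key, "n")
        status = "OK  " if have == want else "FAIL"
        print(f"{status} {key}: want {want}, have {have}")

if __name__ == "__main__":
    main()
```

Run against a kernel tree's .config, this is the kind of incremental, automatable check the blueprint approach enables without waiting for a full use case analysis.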
As the improvement of code quality is also very important, we have the tool investigation and code improvement working group. Code improvements can, for example, come from fuzz testing the kernel with tools like syzkaller, or from static analysis with CodeChecker, and from bringing these into a CI setup that runs on linux-next or whatever kernel configuration, to identify issues, make the kernel more robust, dependable and reliable, and to serve the argumentation about the quality of the kernel.

On the right side, some of the challenges were about the engineering process. As you know, there are rigorous methods within kernel development: there are a lot of reviews, patches get rejected. At the same time there is strong demand from traditional project management when it comes to safety products, and not every open source process complies with it directly. So we need to build an argument: where is the open source development process equivalent to what, for example, ISO 26262 requires for automotive products? What is also very interesting to understand here: in open source, you basically cannot easily buy a maintainer or developer, so you cannot buy features directly. You get a more unbiased view — maybe a personal view, but from a maintainer who is really committed to the component, to this particular subsystem of the kernel. With this strong commitment you already fulfill a little of the independent-view requirement, because in safety systems the developer later needs to commit to what has been done. Of course it's not written down — the maintainer simply stands behind whatever they do. So this is one part, for example, where you can start building the argument.

As the different elements need to go somewhere and be visible — we figured this out because we were running quite in parallel with different streams but never brought them together — we came up with the systems working group. The systems working group takes all these different elements, brings them together, works cross-functionally, maybe even cross-project, and combines the elements.

In order to tailor the system properly, we have vertical use cases. One is newly created, so there is not much information in this presentation about the aerospace working group yet. The overall idea is that it should address everything that flies, and you know that in aerospace there are many safety standards, safety integrity standards, with various levels. What you may not know — at least this is what we have heard so far — is that Linux is already in use there, even in certified products, but only at a very low safety level, not at the upper levels of safety certification.

An obvious thing, if you look at the membership — 50 to 60 percent come from the automotive field — is that we have an automotive use case. If you drive a car, or a scooter, or whatever, you may sometimes see an oil pressure sign, an oil temperature sign, check engine and so on; basically, when you switch on the ignition, you see all these little LEDs. This is the use case we are using in the automotive working group. The instrument cluster — the speedometer, everything — becomes digital; everyone has a display in their car. That gives us a good opportunity, because there is a more complex system in there, with a lot of graphics rendering involved, and it is actually a safety-critical function: whether you are in drive or in reverse gear has to be properly displayed, and it has a safety criticality assigned. Showing the check engine telltale is also safety-critical.

The third group is medical devices, and this comes from a completely different perspective. While automotive has the commercial element in mind — cost savings, driving topics forward — OpenAPS, the open Artificial Pancreas System, is driven by open source: there were open standards and a chance to interact with your insulin pump, and you can imagine how uncomfortable life is without that. There is a nice TED talk by Dana M. Lewis; I recommend it, and I put the link in the slide deck, so you can download the slides and check it.
You basically need to track your glucose level and dose your insulin depending on it, with warnings and so on; it's essentially event-triggered. You see the blood sugar level go up, so you set the dose, and there is a certain delay until it takes effect. What came in here was to add a Raspberry Pi in the middle, write some scripting around it, get it stabilized, and create a product out of it. Why I want to stress this: there was no IEC or ISO certification done. An open source engineer started this project, and if you download it and use it, you use it at your own risk. So the work of ELISA was to take this as the first use case, right at the beginning, at the first workshop: let's take a deeper look, let's analyze what's in there. It's running for thousands of people, it has never been certified; they are very happy and see it increasing their quality of life, but it is a safety-critical product that is not certified. We are not targeting a direct certification of it either; rather, we are looking into the different levels of analysis: what is involved, which workloads are in there, is there something that could make it fail, is there a risk, what could potentially go wrong?
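To make the loop just described concrete, here is a toy sketch of an event-triggered dosing decision. This is not OpenAPS code and certainly not medical advice; every constant is invented, and the only point is the structure: validate the reading, decide, and clamp against a hard limit.

```python
TARGET_MGDL = 110.0            # illustrative target glucose level
SENSITIVITY_MGDL_PER_U = 50.0  # illustrative correction factor
MAX_BOLUS_U = 2.0              # illustrative hard safety limit

def recommend_dose(glucose_mgdl: float) -> float:
    """Return an insulin correction (in units) for one loop iteration."""
    # Reject implausible sensor readings instead of acting on them.
    if not 40.0 <= glucose_mgdl <= 400.0:
        raise ValueError("implausible sensor reading - do not dose")
    excess = glucose_mgdl - TARGET_MGDL
    if excess <= 0:
        return 0.0
    # Clamp to a hard limit: the safety net matters more than the formula.
    return min(excess / SENSITIVITY_MGDL_PER_U, MAX_BOLUS_U)

if __name__ == "__main__":
    for reading in (95.0, 180.0, 320.0):
        print(f"{reading} mg/dL -> {recommend_dose(reading):.2f} U")
```

Even in this toy form you can see where the safety analysis bites: the input validation, the clamp, and the question of what happens when the loop itself stops running.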
This basically completes the use cases. I've drawn it together: there is an inner part that is common to almost all the different projects, and it gets fed by the use cases, which say "this is how you need to configure, how you need to specialize." You cannot create a fully safety-critical item out of context; there is no generic safety argumentation — you always need to judge it against an assumed context — and this then turns into the deliverables.

Another view on this: an exemplary system architecture, mainly as we approached it in the systems working group. It is not only Linux that is involved in these latest products. In the medical OpenAPS system it's pure Raspbian on the device; there is no RTOS directly involved, unless you treat the sensor or the insulin pump as the RTOS next to it. But in more complex products you always have to face that there are RTOSes involved, microcontrollers, microprocessors; container technology comes into the picture — everybody talks about containers and embedded these days — and also virtualization technologies, be it Xen or KVM. So this is something that easily ends up in the system. On the working group side, the Linux features, safety architecture, and code improvement groups feed directly into the Linux work, so the main outcome there is for the Linux ecosystem, the Linux kernel, and much of that work is not directly related to the hypervisor or the RTOS. But some things go a bit further, like the tools and the engineering process: things coming out there may also have good value for other projects you build on. If you have Yocto involved, you can build Xen and Zephyr with a meta layer too, and then it may be good to have this tooling, or the code improvements can come into the picture there — certain tools we use in CI for testing, openQA or others. And lastly, for completeness, the use cases basically tailor the system down to whatever you need. In the automotive working group, for example, we tailored the system down for now to get a better understanding of the Linux kernel, and we left out the RTOS, the containers, the virtualization and the rest; but we know that once we have solved some parts of our work, we need the full system context, and the system context involves all of these things.

Having said this, we also do a certain outreach to other projects. I put in the Zephyr community; we have Automotive Grade Linux, which is already in there; there could be other Linux distributions; and there is strong involvement of the Yocto project. I didn't quite know where to put SPDX, so it's probably on this picture — we'll see it later on.
How do we interact so far? We are already in discussions with Zephyr and Xen; we have weekly meetings where Xen members show up and Zephyr is present with representatives. We saw that these are safety-critical open source projects, so they basically share the same burden: they need to show how the development process is done, how certain quality levels are guaranteed, where the testing is done, where the requirements management is and the traceability to everything. So this fits in quite well.

If we take this architecture — and I'm coming from the automotive side — there are different projects sharing these architectural thoughts. There is a large group in the Eclipse SDV project; there is the SOAFEE initiative from Arm, with largely similar members to SDV; and then there is Automotive Grade Linux, which is also kind enough to provide us with the reference implementation for the automotive use case. They all share very similar architectures.

Lastly, not directly related to safety but having safety considerations and being part of the system, there is the Yocto project for the build tooling, to make this reproducible in CI. Here, for example, SBOM generation suddenly comes into play, which you can do with the Yocto project. And while we were discussing, we figured out that additional data is needed for a system-level SBOM, and for this we reached out to SPDX. There is actually an SPDX special interest group on FuSa (functional safety) meeting weekly to extend this scope; I guess there is also a talk later on where parts of it get presented.
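For a feel of what the SPDX side looks like, here is a minimal sketch that emits an SPDX 2.x tag-value document with a single package entry. The fields shown are a small subset of the format; the names, version and namespace are placeholders, and the functional-safety extensions the SIG discusses would go beyond this.

```python
from datetime import datetime, timezone

def write_minimal_spdx(path: str, pkg_name: str, pkg_version: str) -> None:
    """Write a minimal single-package SPDX 2.3 tag-value document."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.3",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {pkg_name}-sbom",
        f"DocumentNamespace: https://example.com/spdx/{pkg_name}",  # placeholder
        f"Created: {created}",
        "Creator: Tool: minimal-sbom-sketch",
        "",
        f"PackageName: {pkg_name}",
        "SPDXID: SPDXRef-Package",
        f"PackageVersion: {pkg_version}",
        "PackageDownloadLocation: NOASSERTION",
        "PackageLicenseConcluded: NOASSERTION",
        "PackageLicenseDeclared: NOASSERTION",
        "PackageCopyrightText: NOASSERTION",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_minimal_spdx("image.spdx", "meta-elisa-image", "1.0")
```

In practice Yocto generates such documents per package at build time; the system-level question is how to tie them together with safety-relevant data, which is what the FuSa SIG is working on.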
Why do we do all this? I like this statement from George Bernard Shaw: if I have an apple and you have an apple and we exchange the apples, we still each have one apple; but if I have an idea and you have an idea and we exchange these ideas, then each of us has two ideas. That is basically what this is about: we need to build a good understanding and bring things together.

Now we come to what the different working groups do. If we check against elements, processes, tools, documentation, not every working group is active to the same extent, so I just put some bubbles in here to show where most of our work goes. We have a lot on the software part, of course; people are interested in the Linux kernel. The process part is maybe not as broad, because it needs to be centralized, and the usage of the processes goes into the other working groups: OSEP, the medical group, and a bit of the architecture group work on these processes and bring them into the others. Tools pop up in multiple working groups because tools are handy: a tool pops up, we bring it into a repo, you tell people about it, it gets used — and if we want to get to continuous certification at some point, there will be a need for a lot of tool support. And basically every working group does documentation.

I want to give you some examples. From the process perspective there is the system-theoretic process analysis; that's the first topic I will say a bit more about. It's the dry stuff about systems architecture, not at the code level, but we figured out that when you do this kind of STPA analysis, at some point you reach a level where you need to understand more about the kernel. So I'll also tell you something about the workload tracing we have done. And, supported by another working group, we have a call tree tool — partly utilizing and improving existing tools, partly written from scratch. All of this later feeds into meta-elisa, which is basically the Yocto layer for the automotive use case, enhancing the Automotive Grade Linux demo. We also did things like the CodeChecker and syzkaller integrations; I won't say much about those, they're just further examples of the work. All our information is public, though it is quite spread out: there is a GitHub part, some parts on Google Drive, we do regular blog posts and have published some white papers — it always depends on whom you want as the audience or readers. So we share all this.
There is also a YouTube channel, but I don't count that as documentation. Okay.

First, let's look at STPA. STPA stands for System-Theoretic Process Analysis. What's interesting: if you're coming from safety criticality, maybe automotive, you know hazard analysis, risk assessment, FMEAs; you may have grown up with spreadsheets, drawing out failure cases, checking your API interfaces and all these things. The nice thing about STPA is that you take a more graphical approach, like on the left part of the picture. Some basics: it is still relatively new. I say this because the old analysis techniques come from the microcontroller world, going back to the 60s and 70s — the 70s, more or less. So for a long time these analysis techniques were not improved much, while the systems being analyzed kept increasing in complexity, and this needs to be considered. System-Theoretic Process Analysis is able to handle very complex systems. The reason is that you can start from quite a broad view: maybe you don't know all the elements yet, so for one thing you just have a name and don't know what it really looks like inside, while for another blob you have more details. You can connect all these different blocks, and the analysis will survive even if you don't yet know the whole content of some specific part. Then you go in a very iterative way, step by step: you figure something out, you go one level down, deeper into the system, you find that an assumption didn't hold true, and you refine. What's also good: other analyses basically work at an API level or on definitions, but this one explicitly covers the system context, and it includes human interaction, the human operator — something the other techniques don't give you. In parallel, while you do the analysis, you already improve your documentation and gain a good understanding of the system, and even in a QA department you can integrate it properly with existing model-based approaches.

The principle, at a very high level, is quite simple. There are four key elements: there is a controller at the top; it sends a control action to a controlled process; and the controlled process typically provides feedback. And that's not quite all: the controlled process may itself control something else, which is how the structure grows. The question in the end is: what could go wrong — what are the unsafe control actions? You could use this methodology to understand how the water pipes flow in a building, or how people walk through a space; you can attach it to whatever use case you like, it's always the same approach. But in our case the main idea is safety, criticality, risk assessment, and that's why we look at unsafe control actions.
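Here is a minimal sketch of this vocabulary in code. The four unsafe-control-action types are the standard STPA taxonomy; the tiny control structure at the bottom is our own illustrative modelling of the OpenAPS case, not an ELISA tool, and all names are placeholders.

```python
from dataclasses import dataclass, field

# The four ways a control action can be unsafe (standard STPA taxonomy).
UCA_TYPES = (
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)

@dataclass
class Link:
    action: str    # control action, e.g. "set insulin dose"
    feedback: str  # feedback signal, e.g. "delivery confirmation"

@dataclass
class Component:
    name: str
    controls: dict[str, Link] = field(default_factory=dict)  # target -> link

def uca_checklist(system: dict[str, Component]) -> list[str]:
    """Enumerate the questions an analyst walks through per control action."""
    out = []
    for comp in system.values():
        for target, link in comp.controls.items():
            for uca in UCA_TYPES:
                out.append(f"{comp.name} -> {target}: '{link.action}' {uca}?")
    return out

if __name__ == "__main__":
    # Tiny OpenAPS-flavoured control structure; names are illustrative only.
    system = {
        "algorithm": Component("algorithm",
            {"pump": Link("set insulin dose", "delivery confirmation")}),
        "pump": Component("pump",
            {"body": Link("deliver insulin", "glucose level via sensor")}),
    }
    for question in uca_checklist(system):
        print(question)
```

Note how a controlled process (the pump) is itself a controller of the next process down — exactly the nesting just described.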
A little warning: the next slide is one you will not be able to read. It is the level 1 analysis of the OpenAPS use case, and, well, that's how it looks. In the middle there is the OpenAPS system, viewed from the top — a developer view, not the full user view. You have the infrastructure, you have the algorithm developers who release the software; then comes the human operator who uses the software and installs it further; and this goes into the system. We don't know yet what "the system" is — that's what I meant by the very first level: you don't care yet whether it's a Linux system or whatever is underneath, it's just "my OpenAPS system". When you have understood what your critical part is and how the system context looks, you may go to the next level. Now we zoom into this OpenAPS system, and at the next level you see there is actually a Raspberry Pi involved — we know this from the hardware — with Raspbian as the OS; you have the OpenAPS toolkit involved, the actual algorithm, which may control the insulin pump; Nightscout is an external component; you see all these kinds of things. The working group was on this level for some time, then tried to write down the next level, going deeper, and actually needed support. That's where workload tracing came into the picture: we used a mentorship project here and got support, someone fully concentrating on the workload tracing activity. That gives another little table, which you can at least read. The main things to know: we use strace and cscope as the main tools for the analysis; there are stressors in there, like stress-ng, paxtest and others — this may depend on the workload you use, on what you want to challenge the system with. The information that comes out: which system calls occur, how often they occur — the frequency — and which subsystem they belong to, so that you know where your critical parts are and where the system call entry points sit. With this you can dive deeper into the different subsystems, and it causes a lot of refinement in the upper layers, because now you have iteration and maybe you see that an assumption was wrong — not that everything before was incorrect; you just improve it.
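A minimal sketch of the first step of such workload tracing: run a workload under strace and count how often each system call occurs. Mapping the calls to kernel subsystems, as the working group did, would be an additional step on top of this.

```python
import re
import subprocess
import sys
from collections import Counter

# Matches the syscall name at the start of an strace log line, with or
# without the "[pid NNN]" prefix that strace -f adds for child processes.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def trace(cmd: list[str]) -> Counter:
    # -f follows forked children; -o writes the raw log to a file.
    subprocess.run(["strace", "-f", "-o", "trace.log", *cmd], check=False)
    counts: Counter = Counter()
    with open("trace.log") as f:
        for line in f:
            m = SYSCALL_RE.match(line)
            if m:
                counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Usage: python trace_workload.py <command> [args...]
    for name, n in trace(sys.argv[1:]).most_common(10):
        print(f"{n:8d}  {name}")
```

The frequency table this produces is the raw material for the question above: which subsystems does my safety-critical workload actually exercise?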
Related to this is the call tree tool. That is something basically written as our own part: the idea was to see, here is a system call — what else is there, what are the ways to interact, and how do we visualize it? Because if you just go through the code, you cannot really grasp the complexity. This was just the first shot, so again it's not readable on the slide, but you can see there is a file system part. The very interesting aspect: this is a static view, so you see all the potential paths. In the previous view, with workload tracing, you basically see where the path has gone, but you don't directly uncover the untraced paths. Here you see all the paths, with the chance that you hit something completely irrelevant because your workload never touches it. So it is a complementing element: you get good insight into the kernel's construction, and it can help you analyze further workloads.
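A minimal sketch of that static call-tree idea: given a caller-to-callee edge list (which you might extract with cscope), print every potential path from a syscall entry point down to a depth limit, cutting off cycles. The tiny edge list below is partly invented for illustration and is not real kernel structure.

```python
# Static caller -> callee edges; in practice extracted with cscope or
# similar. "file_op_write" is an invented placeholder node.
EDGES: dict[str, list[str]] = {
    "__x64_sys_write": ["ksys_write"],
    "ksys_write": ["vfs_write"],
    "vfs_write": ["rw_verify_area", "file_op_write"],
}

def print_tree(func: str, depth: int = 0, max_depth: int = 5,
               seen: frozenset = frozenset()) -> None:
    """Print all statically reachable callees as an indented tree."""
    print("  " * depth + func)
    if depth >= max_depth or func in seen:
        return  # cut off recursion depth and call cycles
    for callee in EDGES.get(func, []):
        print_tree(callee, depth + 1, max_depth, seen | {func})

if __name__ == "__main__":
    print_tree("__x64_sys_write")
```

This shows the complementary trade-off from above in miniature: the static tree lists paths a trace never took, at the price of including paths no real workload may ever reach.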
We bring all these things together in the meta-elisa instrument cluster. It looks like the AGL instrument cluster — we saw this picture before. I highlighted the change we made: we write "danger" in there, and that made the whole thing safe. Which is of course not the full story. The full story is that we needed a use case we could analyze, one with safety relevance, and this was a good Qt-based demo, so we could make use of it. It runs on QEMU; QEMU has some little drawbacks here, and I'll come to that very soon, but with this you can start the analysis, trace workloads, and also add a watchdog mechanism.

The watchdog is the next part. What we use in a lot of concepts is an external watchdog. Even if you don't see it directly in the OpenAPS system, for example, there is still external monitoring involved that raises an alarm: if the Raspberry Pi did something wrong in one direction or the other — not that it happens — there is a monitor that will beep or so and inform the user. You do it similarly in the automotive case with the telltale environment, where you want the workload traced. This is a challenge-response watchdog: it does not simply look for a sign of life; it gives a small challenge to the workload, the workload processes it alongside its other work, and the monitor gets a response back, so you know it is really alive and not just replying. The demand comes from the fact that for a lot of use cases we cannot fully guarantee that the workload responds in the proper time, that the process doesn't hang, and checking this with an external watchdog takes a lot of responsibility off you. It mainly watches the safety-critical workload. I know there are ideas that say, well, let's put this watchdog in and watch everything — that typically does not work out. So you really concentrate on the safety-critical things; all the other parts relate to user experience. If your rendering engine is unlucky and you see a lot of delay on the touch screen or whatever, that's nothing you want to experience as a user, but as long as the warning signs come in time and correctly from a safety perspective, it's all fine. So it is good to split things up here: what is the intended functionality, what is its safety criticality, what do I need to monitor and what not — the watchdog is just the safety net. As I said, this is used widely in automotive, and other industries basically always have a safety net somewhere that monitors things. What we are trying to do is give more responsibility to Linux, and with that you can start with a lot of elements in this safety-critical part.

The last message is very important to me: it's not that you design your system around the watchdog being there. You start by designing the system so that you never need to trigger the watchdog, because you don't want that; it is just your system functionality and it has to work, and in the best case the watchdog never fires into a safe state. For the telltale use case, a safe state could mean the screen is turned off or you do a restart — basically you might blank the screen so the driver directly recognizes "something is not right here"; it could also be a warning message or something else. Depending on your safety process, you need to make sure this is really triggered too, so the safety criticality comes into the picture again.
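Here is a minimal sketch of the challenge-response pattern, complementing the plain /dev/watchdog sketch earlier: the monitor sends a random challenge and the workload must return a correct, fresh answer within a deadline, otherwise the system falls into a safe state. In a real design the monitor would be an external component, not a thread next to the workload, and the timing values are invented.

```python
import hashlib
import os
import queue
import threading
import time

TIMEOUT_S = 0.5  # illustrative response deadline

def answer(challenge: bytes) -> bytes:
    # The workload proves liveness by computing something the monitor can
    # verify; real systems often fold workload state into the answer.
    return hashlib.sha256(challenge).digest()

def workload(inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        challenge = inbox.get()
        # ... do the actual safety-critical work here ...
        outbox.put(answer(challenge))

def enter_safe_state(reason: str) -> None:
    print(f"entering safe state: {reason}")  # e.g. blank the display

def monitor(rounds: int = 100) -> None:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    threading.Thread(target=workload, args=(inbox, outbox),
                     daemon=True).start()
    for _ in range(rounds):
        challenge = os.urandom(16)  # fresh nonce, so replies can't be replayed
        inbox.put(challenge)
        try:
            reply = outbox.get(timeout=TIMEOUT_S)
        except queue.Empty:
            enter_safe_state("no response within deadline")
            return
        if reply != answer(challenge):
            enter_safe_state("wrong response")
            return
        time.sleep(0.1)  # challenge period
    print("workload stayed alive for all rounds")

if __name__ == "__main__":
    monitor()
```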
I prepared a one-minute video, but I never know how these things work out when you do a live demonstration, so I just put the YouTube link in the material. And if you are brave — or even if not, I guess it's straightforward — we have good documentation on how to experience this demo. When we started with the ELISA work, we built our topics from scratch and documented everything to our best understanding, and then someone came and said, well, but I'm not using Ubuntu, I'm using openSUSE Tumbleweed. We figured we needed a bit more, more environments set up so that people can reproduce things, so we came up with a Docker container that installs the packages you need, in the right versions, to make it easier. The next thing we observed: people do a Yocto build, it consumes a lot of space and a lot of compilation time, so cached binaries would be a good option. So we also enabled the sstate cache, so that you can build the parts that still need to be built in roughly 40 minutes on a modest laptop — it also depends on your download speed, since a Yocto build typically involves quite a large download. In the long run we'll see if we can extend this to other systems, maybe a Debian version, but for now it's the Yocto build. The last thing we figured out: there are also use cases where you just want to dive deep into the system, and this is the complement to the demo. If you don't want to watch the video and want to try it out directly: if you have QEMU installed on your system, just download the binaries directly. They get built nightly — really nightly, every night you get a new one — always against the latest version of AGL (with a little trouble last week, but it's up and running again). It does a boot check, so you can really experience it, and it basically uses the instructions written down in the README markdown file on GitHub.
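A minimal sketch of what such a nightly boot check can look like: boot the image in QEMU and watch the serial console for a marker within a timeout. The image name, machine options and marker are placeholders; the real check follows the instructions in the project's README.

```python
import subprocess
import threading

IMAGE = "agl-instrument-cluster.qcow2"  # placeholder artifact name
MARKER = b"login:"                      # naive "boot succeeded" heuristic

def boot_check(timeout_s: float = 300.0) -> bool:
    cmd = [
        "qemu-system-x86_64", "-m", "2048",
        "-drive", f"file={IMAGE},format=qcow2",
        "-nographic",  # route the serial console to stdio so we can read it
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    killer = threading.Timer(timeout_s, proc.kill)  # hard deadline
    killer.start()
    booted = False
    try:
        for line in proc.stdout:  # scan console output line by line
            if MARKER in line:
                booted = True
                break
    finally:
        killer.cancel()
        proc.kill()
        proc.wait()
    return booted

if __name__ == "__main__":
    print("boot check:", "PASS" if boot_check() else "FAIL")
```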
Right, that's the demo. Some next steps: the STPA work continues, so we are getting into deeper levels of it, and we need to get the workload tracing properly reflected in the different diagrams — this was heavily driven by the medical devices group; automotive has not used workload tracing that much yet, but we are bringing it in there. The call tree also got extended with another tool called ks-nav, a kernel static navigation tool, to get better analysis and a better view. For meta-elisa — I was talking about QEMU, but everybody wants to see real hardware — we are on the path of bringing this up on Arm-based hardware; for now we have x86 and QEMU emulation, and the Arm work underneath is mainly driven by the systems working group. What is very important so far is the display checking: normally you would check the actual rendering of a telltale, but there are so many different kinds of implementations that we mock a lot of things there, and we want to improve this so that we have proper display checks and also a lot of monitoring. That is basically it on the four topics we have seen. Additionally, we are working on a system-level SBOM — we enabled the SBOM part for generating material in the demo — and we want to improve the kernel configuration, trim down the size of the image, get the RT documentation updated, and have a more complex cluster involved.

Summarizing what you have seen: we talked about the challenges in the beginning, basically the difference between a traditional safety-critical RTOS and the new approach, and what this collaboration can and cannot achieve. You heard about the goals and the strategy, which tools and which elements we analyzed and looked into. You saw how the different working groups interact, how they feed into a system, and how we reach out to the wider community. I talked about the contributions of the different working groups, shared with the community also in the form of a usable, downloadable use case. You saw our methodologies, STPA and workload tracing. And lastly, we got a little preview of what is coming next. I guess we're good on time for the question part.

Does anyone have a question? There's one up above, coming down. You have a question, okay.

Thanks for the interesting talk. You mentioned certification as one big problem. Where can we improve things so that certification processes become more open-source-friendly and open source software becomes more certification-friendly? What has to be done, or can be done, there?

Yeah, I guess the question is how open source and certification can come closer to each other, from both sides, right? One thing could be done in documentation: improving the tracing, having tools that support showing how certain features get from the mailing list into the system, and whether there is a test around it — this gives a lot of confidence and trust in what it's doing. From another perspective, there is not much in the safety integrity standards that allows the use of pre-existing software elements; there is an ISO PAS currently in progress that allows more of this. It depends on the safety standard you're under: some of the more relaxed medical standards have fewer requirements on this.
But for automotive it is very strong and prohibitive. So I would say: doing careful work, explaining design decisions, making things visible and more structured, maybe having centralized bug tracking and so on — this can help a lot from this perspective, and it will be good for the certification authorities, with whom we also do a lot of clearance work.

Yes, and if I heard you correctly, on supporting the assessments and the authorities: we also have company support where people are really in the working groups, and we get input from certification authorities on the continuous work we are doing, so they are directly working within the working groups as well.

Thank you very much for your talk. Just a quick question; I want to get a feel for what your opinion on this is. Do you think there's space, as certification for something like Linux improves — (Can you move the mic a little closer? I hear the people leaving louder than you.) — Sorry, yes: as the processes for certification and validation of Linux improve and change over time, do you think there is ever going to be space for Linux to be used as a critical component in vehicles, or do you think that space is completely reserved for something that actually uses real time?

The main part I caught was the real-time part at the end — whether that space will ever be there. It's already there.

Fair enough, thank you.

Okay. Does anyone else have a question? You have a question, yeah.

So, what is the place for Linux itself — let's say, what is the safety integrity level of Linux itself in this model? Because if we take, say, ISO 26262, there are V-model requirements for development, but Linux already has its source code; there is no coverage testing with all the MC/DC coverage, etc. So what is the place of Linux, and how do you keep it and maintain it without forking?

Yeah, so you are asking where the place of Linux is if you look at the V-model of, for example, ISO 26262 — where do things fit, with a lot of demands like code coverage, tracing and so on. What I can say: first of all, speaking about a level, you would not directly go to an ASIL D level, which puts much higher requirements on the tools, that's for sure. You should start at the lower ASIL A or B levels. That's also what we did; we relaxed some parts for the automotive use case too — let's not start with overly complex parts, maybe take the hard real-time criticality out, because then you would have to review many more parts. The space I see is that you argue equivalence for certain things, and that you are in close collaboration with assessors.
You explain how things are done, because when ISO 26262 was originally prepared, it was not considering a complex system like Linux being in use, or such a large amount of pre-existing software. So in an assessment, if you can show credibility through requirements work and good concepts, you may at first end up with a system that is arguably safe but not directly certifiable to ISO 26262. And that already opens up the perfect discussion room, because then you can say: you cannot tell me this is not working, yet you still say it is not certifiable — and then you also see the gap in the standard. If you reach this point, you have a lot of good support: go to the certification authorities early, have internal assessments, and judge it. In the end it is also your responsibility to say, I argue for an equivalence, because the spec does not only say "you have to"; it says "recommended", "highly recommended", leaving you room to show equivalence — to this method in the standard I am using this, and on top I am adding that. That is how you can build the argument, and of course you get feedback from your developers on the work you're doing in the kernel mainline and so on.

So maybe it is also possible to somehow affect how ISO 26262 is developed, because it is a bit outdated in some ways?

Some of the members in ELISA have people in these ISO committees who are basically taking this back in that direction for future revisions of the standards. We don't have visibility — at least I don't, because I'm not in those committees — but we do know that some of those member companies you saw up there are in them, and they are advocating for things to work a little bit better in future revisions.

Is there anyone else who has a question? Okay, thank you for your talk.

Thank you very much.