[00:00.000 --> 00:13.520] I want to go very quickly because I'm rushing through because I don't want to keep you from [00:13.520 --> 00:14.760] the very juiced part. [00:14.760 --> 00:23.760] So we are presenting a complete, as complete can be, compliance toolchain for yacht projects. [00:23.760 --> 00:29.960] So this in the context has been produced for a project we are part of called Eclipse Oniro. [00:29.960 --> 00:39.440] It's an Eclipse Foundation project, it's yacht-based, and it provides an all-scenario platform project. [00:39.440 --> 00:44.640] It's very complex, the numbers are quite impressive, cannot run through, but they are quite learning, [00:44.640 --> 00:50.040] and everything is based on CI, so everything is CI. [00:50.040 --> 00:56.760] We also needed to have a completely CI-ed also compliance process, so we needed to build [00:56.760 --> 01:04.120] with the existing parts and building parts that did not exist, by providing a dedicated [01:04.120 --> 01:05.620] toolchain. [01:05.620 --> 01:11.680] This toolchain for compliance has been made not just by us, but by the Neuropath Tech Park. [01:11.680 --> 01:19.000] People are in the room as well, and they have done much of the heavy lifting of programming, [01:19.000 --> 01:22.400] and Albert of course gets the credit for the software. [01:22.400 --> 01:27.960] It's also an Eclipse Foundation project, and it's alright, there is a REPL, and you can [01:27.960 --> 01:29.760] look at everything. [01:29.760 --> 01:38.600] It's based on an already existing, the no-subjects of compliance for soldiers can code, but we [01:38.600 --> 01:47.520] have to create a dedicated new custom set of tools also being distributed as open source. [01:47.520 --> 01:54.040] Of course we use SPDX all across the toolchain for many things, and this is important, this [01:54.040 --> 02:02.560] is not just for this project, this is meant to be a toolchain for all the Octobase project, [02:02.560 --> 02:05.000] not just for this. [02:05.000 --> 02:11.400] Through this combination of tools, we have been able to complete a very lengthy process [02:11.400 --> 02:18.440] of compliance, reaching 100% of coverage of all the components. [02:18.440 --> 02:23.920] I don't want to skip, thanks to the Damian providing a lot of the information that we [02:23.920 --> 02:25.800] have really used. [02:25.800 --> 02:33.600] Here you can see very quickly the dashboard, so you can track the evolution and the status [02:33.600 --> 02:35.680] of compliance. [02:35.680 --> 02:41.280] You can also go and have many more details of the single packages. [02:41.280 --> 02:47.960] That is just for knowing what components and what license is there, but we cannot stop [02:47.960 --> 02:48.960] there. [02:48.960 --> 02:57.280] We want to go farther, and we have what we call the second phase of the compliance toolchain, [02:57.280 --> 03:01.200] and we decided to go for a graph database for many technical reasons, because there [03:01.200 --> 03:07.040] are a lot of interactions we have to traverse the database very easily. [03:07.040 --> 03:13.520] We need to do the missing part, which is software composition analysis, dependencies analysis, [03:13.520 --> 03:18.600] and incompatibility checks, and everything must be done automatically as much as possible, [03:18.600 --> 03:24.440] so this must be entirely into the CI pipeline, which it does. [03:24.440 --> 03:28.120] In order to do this, we need several things. [03:28.120 --> 03:34.040] This is really important, map all licenses data from the source to the binary, and at [03:34.040 --> 03:40.120] file-level mapping, by file-level mapping, we mean every single file, including patches, [03:40.120 --> 03:44.920] everything that goes through needs to be checked, checked some, and tracked through all, in [03:44.920 --> 03:50.200] order to make sure that everything that goes inside is tracked. [03:50.200 --> 03:52.480] This information comes from many sources. [03:52.480 --> 04:01.080] We had an audit team who has done a lot of work, and finally, we need to find a way to [04:01.080 --> 04:08.600] automatically, as much as possible, see if the inbound and outbound licenses are compatible [04:08.600 --> 04:11.040] with each other through automated tools. [04:11.040 --> 04:18.040] This is in the very short time, the general description, here an example of the relationship [04:18.040 --> 04:26.280] between components in the database, and there is the next bit, which is perhaps more interesting, [04:26.280 --> 04:32.760] and I give the floor to Alberto. [04:32.760 --> 04:41.080] So thank you, Carlo, and so the question is, why do we need this? [04:41.080 --> 04:48.360] So Yacht of Warflow, for those who are already familiar with that, is quite simplified here. [04:48.360 --> 04:55.000] We have recipes, recipes that can map to a single upstream source component, but possibly [04:55.000 --> 05:00.640] also two different upstream components, maybe the main application, then some plug-ins. [05:00.640 --> 05:07.560] So you can have here many different upstream sources combined together, fetched, and together [05:07.560 --> 05:16.000] combined into a single work here, in the unpacker stage task of the build process. [05:16.000 --> 05:21.360] Then there are other tasks, configure, patch, whatever, basically build. [05:21.360 --> 05:27.480] We got binaries, and binaries are combined together to form the final image. [05:27.480 --> 05:35.880] But the problem is that when it comes to upstream sources, we have upstream components that [05:35.880 --> 05:39.960] have multiple licenses inside it. [05:39.960 --> 05:46.960] Maybe we have different components with different sets of licenses, but the thing is, we have [05:46.960 --> 05:52.800] only a small subset of the binaries that we could generate for all this stuff, and the [05:52.800 --> 06:00.520] binaries, you don't know what the actual license is, especially when here you have kind of [06:00.520 --> 06:01.920] a mess. [06:01.920 --> 06:08.320] And the thing is that you risk that some dirty stuff ends up in your image. [06:08.320 --> 06:18.480] I mean, I don't know, if you have a package with a blacklisted license or compatible combination [06:18.480 --> 06:24.760] of licenses, you may have in your final image something that is not compliant, and this [06:24.760 --> 06:27.320] is something we need to avoid. [06:27.320 --> 06:32.600] So the thing that we have to do is to follow a process. [06:32.600 --> 06:39.680] So to find out the relationship with third-party upstream code, the inbound code, then we need [06:39.680 --> 06:46.280] to find under which license the inbound upstream software is, therefore the inbound license. [06:46.280 --> 06:52.400] And if there is a possible combination of that, because not all combinations are allowed, [06:52.400 --> 06:58.560] and depending on the context, they may be allowed or not, like other talks pointed out [06:58.560 --> 06:59.960] this morning. [06:59.960 --> 07:06.600] And we need to match this combination with the abound license, is the wanted abound license [07:06.600 --> 07:09.580] compatible with the inbound licenses. [07:09.580 --> 07:14.680] And this for each artifact, we cannot do that, especially in the embedded field at package [07:14.680 --> 07:18.680] level because the package may contain a lot of stuff. [07:18.680 --> 07:23.680] And we need to know about the files, not the generic about the package. [07:23.680 --> 07:27.440] Here we have an example, I hope it's readable. [07:27.440 --> 07:34.240] This is GPGME component that GPG made easy. [07:34.240 --> 07:37.200] It's a very small component. [07:37.200 --> 07:42.200] In our project, we found it out that we generate only three binaries out of it. [07:42.200 --> 07:50.840] If you look at the license and the recipe, you find the GPL, I guess, three or LGPL two [07:50.840 --> 07:52.240] or something like that. [07:52.240 --> 07:56.920] But this is nothing, sorry, not or, but and. [07:56.920 --> 08:04.560] So the thing is that this binary is that GPL, is that LGPL, is that something else? [08:04.560 --> 08:09.920] We don't know from what we have from source license information. [08:09.920 --> 08:15.280] So we collect in this graph database multiple sources of information. [08:15.280 --> 08:22.280] So on the yellow dots are the information coming from our auditing, working on physiology. [08:22.280 --> 08:34.960] So file level, source, license information, the purple dots are, comes from Yachta metadata [08:34.960 --> 08:42.240] and also the information about which files are being generated by, to generate, they [08:42.240 --> 08:48.840] have been used to generate this binary, comes from Yachta from the create SPDX class presented [08:48.840 --> 08:49.840] before. [08:49.840 --> 08:57.920] Basically, we now have MIT, MIT, GPL two, GPL three, sorry, GPL two or later, or LGPL [08:57.920 --> 09:01.880] three or later, and then we have GPL three or later. [09:01.880 --> 09:03.600] What's the license of this file? [09:03.600 --> 09:12.240] So usually the audit team comes to us and we discuss to find out which is the license [09:12.240 --> 09:15.560] of the binary file. [09:15.560 --> 09:17.200] But this is not scalable. [09:17.200 --> 09:18.360] This is error prone. [09:18.360 --> 09:23.080] We cannot do that for every single binary file, for every release, for every snapshot [09:23.080 --> 09:25.680] of the project. [09:25.680 --> 09:27.280] This is another example. [09:27.280 --> 09:30.520] This is another binary you can generate from the same component. [09:30.520 --> 09:34.960] Here you have MIT, MIT, MIT, LGPL two, point one or later. [09:34.960 --> 09:36.600] This is another story. [09:36.600 --> 09:39.640] Maybe it's easier here. [09:39.640 --> 09:44.840] And we can have also more complicated, I won't go into the details here. [09:44.840 --> 09:48.240] So how can we handle that in an automated way? [09:48.240 --> 09:52.640] So the idea is to have kind of a battle between license cards. [09:52.640 --> 10:00.480] So you put together every time two, two cards, and you need a set of rules to decide which [10:00.480 --> 10:01.880] card wins. [10:01.880 --> 10:09.080] And then you iterate and you look over all the possible combinations of license until [10:09.080 --> 10:12.560] only one will survive, hopefully. [10:12.560 --> 10:20.720] So to do that, we are trying to define a language to define those rules. [10:20.720 --> 10:26.120] We need to define the two license cards, the two license cards that are fighting against [10:26.120 --> 10:27.120] each other. [10:27.120 --> 10:31.400] We need to fight the called battlefield kind of context. [10:31.400 --> 10:35.840] Is that static linking, dynamic linking, whatever? [10:35.840 --> 10:42.520] The authority who said that, because the answer by the lawyers always, it depends. [10:42.520 --> 10:46.080] So we need to say who said that. [10:46.080 --> 10:54.800] In this case, FSF, we kind of trust it, especially when it comes to GPL compatibility matrix. [10:54.800 --> 10:58.240] And here we are the result of the battle. [10:58.240 --> 11:03.080] We have a winner or we have kind of invalid. [11:03.080 --> 11:10.400] So this kind of combination is not possible, because GPL two only is not inbound compatible [11:10.400 --> 11:18.800] with Apache 2.0, while GPL two or later may be inbound compatible with Apache 2.0. [11:18.800 --> 11:26.800] If you make it become GPL three, because GPL three is allowed to be inbound compatible [11:26.800 --> 11:28.240] with Apache 2.0. [11:28.240 --> 11:31.320] So this is kind of an example. [11:31.320 --> 11:34.120] I don't have time to go into details. [11:34.120 --> 11:42.440] The rules in action are like here, when you have disjunction, a disjunctive license expression, [11:42.440 --> 11:45.320] you need to calculate the Cartesian product. [11:45.320 --> 11:50.800] So you need to have, in this case, MIT, GPL two and GPL three fighting against each other. [11:50.800 --> 11:57.480] Then MIT, LGPL three and GPL three fight against each other. [11:57.480 --> 12:03.360] You find a list of the decision and we know that GPL three or later is the license that [12:03.360 --> 12:07.680] prevails here at the end. [12:07.680 --> 12:12.000] So the time is up, sorry. [12:12.000 --> 12:18.440] So the thing is that, how I already said, we consume data from different sources, from [12:18.440 --> 12:23.960] physiology, from Yachto, and for now we have proof of concept. [12:23.960 --> 12:30.960] The aim is to upstream everything into the Creator Speedyx class in the Yachto project. [12:30.960 --> 12:35.840] I don't think we have time for questions, but here and in the slide you find all the [12:35.840 --> 12:38.280] information to contact us. [12:38.280 --> 13:03.600] So thank you.