[00:00.000 --> 00:15.720] Hi, everyone. Yeah, my name is Miguel Sena, so I work for Microsoft and I'm mostly the [00:15.720 --> 00:23.880] main of Landlack, which is a new Schedule Linux feature. And yeah, it's about sunbathing. [00:23.880 --> 00:32.360] So this talk is about Rose Library. We wrote for Landlack and, well, we kind of had some [00:32.360 --> 00:41.840] changes about compatibility. So yeah, just quick introduction and context to understand [00:41.840 --> 00:51.440] the programmatic here. So yeah, why care about security? So here, well, it might be [00:51.440 --> 00:58.320] abuse for some, but like every application can be compromised. Every application can be [00:58.320 --> 01:05.480] trusted at first and during the lifetime of a process, it can, well, become malicious. [01:05.480 --> 01:11.720] So yeah, as developers, there's, well, multiple problems. So we don't want to participate to [01:11.720 --> 01:19.520] malicious actions performed by attackers through our software. And we kind of have a responsibility [01:19.520 --> 01:27.480] for users, especially to protect their personal data. And yeah, there's also the, well, there [01:27.480 --> 01:34.400] might be some issues about third-party code. So security is unboxing is a security approach [01:34.400 --> 01:42.000] to isolate software and mainly to isolate them by dropping ambient access rights. So [01:42.000 --> 01:48.720] in a shell, well, when you launch an application in, like, common in Existro, this application [01:48.720 --> 01:54.240] can access a lot of files, including some, which are kind of private, like.ssh, for [01:54.240 --> 02:00.640] example. So some mixing should not be confused with namespaces and containers, which is a [02:00.640 --> 02:07.760] way to create kind of a virtualized environment. And Seccom is also something which is really [02:07.760 --> 02:12.800] interesting for security purposes, but it's not about access control. It's about protecting [02:12.800 --> 02:20.400] the kernel. That was initially the, well, initial goal of Seccom. So Linux is really [02:20.400 --> 02:28.080] dedicated from the ground to bring some working features to Linux. So to bring some security [02:28.080 --> 02:34.880] features to the kernel. So it is an access control system available to every processes. [02:34.880 --> 02:43.640] You don't need to be a root or whatever. And it is designed to be embedded in applications. [02:43.640 --> 02:51.160] So to create built-in sandboxing. It's the way to create one or even multiple layers [02:51.160 --> 03:00.080] of new securities. So it comes kind of after all system-wide access control, which are [03:00.080 --> 03:07.200] already in place. And so it's available on most distros nowadays. And if it is not the [03:07.200 --> 03:14.680] case, well, I grant you to open an issue in your favorite distro. So about sandboxing [03:14.680 --> 03:21.600] here, what's the interesting point about sandboxing and built-in application security? [03:21.600 --> 03:28.000] If, well, that we can create tailored security policies and embedded them in the application. [03:28.000 --> 03:37.640] So there's interesting things about that. And that might help to make it security like [03:37.640 --> 03:45.720] invisible, which is kind of the main purpose here. We want to not bother users, but secure [03:45.720 --> 03:53.400] them anyway. So because these securities policy can be embedded in the application, well, [03:53.400 --> 03:59.560] it can use the application semantic. It can also use the application configuration transparently. [03:59.560 --> 04:07.120] So you don't need to add another configuration stuff. It's not another layer of execution. [04:07.120 --> 04:12.520] It's embedded in the application. And of course, well, if the configuration depends on user [04:12.520 --> 04:19.920] interaction, well, it can adapt to this change of behavior. And one really interesting point [04:19.920 --> 04:27.720] is, well, as developer, you want to test what you do. And you want to kind of get guarantees [04:27.720 --> 04:34.760] that whatever you're developing is still working. And being able to embed security policies [04:34.760 --> 04:39.760] in your application, make it possible to test them the same way that you can test every [04:39.760 --> 04:45.200] other features. So that's really interesting. You don't rely on, let's say, Selenix being [04:45.200 --> 04:51.120] installed on your test machine and so on. And it adapts to the application over time. [04:51.120 --> 04:55.680] So if you have, well, a CI, which is well configured, you can test it and make sure [04:55.680 --> 05:01.960] that, well, you can a bit, a bit add new features, updates the security policy and make sure that [05:01.960 --> 05:09.520] everything was as expected. So speaking about the library and the Rust library, so the idea [05:09.520 --> 05:17.080] was to create something which is Rusty, so identity to Rust. And for this, well, we wanted [05:17.080 --> 05:24.960] to leverage strong typing so to get some developing guarantees. And so to follow some common patterns. [05:24.960 --> 05:32.840] So many here, the builder pattern. So it's still a work in progress. It's working. But [05:32.840 --> 05:39.480] yeah, we're working on improving the API and make it easier and more, yeah, easy to use [05:39.480 --> 05:46.440] for competitive reasons. So this talk about these kind of compatibility requirements. [05:46.440 --> 05:54.680] And yeah, so I'll talk about that. Some example of early-period users listed here. But yeah, [05:54.680 --> 06:04.040] it's still in kind of beta. So let's start with some code example. So just as a warning, [06:04.040 --> 06:10.720] this kind of simplified code, it's working. But yeah, for the demo, well, it's on demo, [06:10.720 --> 06:19.040] but for this example, the idea is to make it simple to, well, to make it easier to understand. [06:19.040 --> 06:24.800] So you can see at the left, there's a C code. And at the right, the exact same semantic, [06:24.800 --> 06:29.800] but in Rust. So I will mostly talk about the Rust code. But yeah, you can take a look [06:29.800 --> 06:36.160] at the C code to kind of see the difference between them and how Rust can be useful there. [06:36.160 --> 06:42.720] So as I said, it is based on the builder pattern. So you create a rule set object here with [06:42.720 --> 06:49.480] a rule set new. And from there, you kind of call different methods to, well, build the [06:49.480 --> 06:54.160] object here. In this case, a root set. So a root set will contain a set of rules. And [06:54.160 --> 06:58.640] yeah, at first, you define what you want to enforce, what you want to restrict, what [06:58.640 --> 07:04.800] you want to deny by default. So in this case, these are two actions. The action to execute [07:04.800 --> 07:11.080] files and the action to write on files. So obviously, it's not enough. But in this case, [07:11.080 --> 07:17.000] it's easy to understand for the simple use case. And then, once you define the rule set [07:17.000 --> 07:23.280] and what the rule set can handle, well, you can create it. And the rule set creation translates [07:23.280 --> 07:28.560] to, you can see at the left, there's a London trade rule set. And this function is in fact [07:28.560 --> 07:36.680] a C school. So in the Rust part, when you call the create method, it creates a new rule [07:36.680 --> 07:43.840] set, which is backed underneath by a new file descriptor, dedicated to Larnock. And that [07:43.840 --> 07:51.520] is a wrap in the rule set object. Then, if you want to add rules to allow some directory [07:51.520 --> 07:57.680] to be, for example, executable, which is the case here. So, well, you open the slash user [07:57.680 --> 08:06.800] directory and you make it, well, executable. So, allow access, access execute. And then, [08:06.800 --> 08:12.240] you can add other rule you want for all the exception that should be legitimate for the, [08:12.240 --> 08:19.040] well, legitimate use case. And then, you restrict the current process. Well, in fact, the current [08:19.040 --> 08:27.760] thread. And from this point, the current thread can only execute files which are in slash [08:27.760 --> 08:39.000] user. And it cannot write anything at all, actually. So, that was an introduction, quick [08:39.000 --> 08:46.200] introduction to the library. And the thing is, Larnock is not a full feature access control [08:46.200 --> 08:54.560] yet because, well, it is complex. And, well, to reach this goal, well, we need to spend [08:54.560 --> 09:04.160] much more years to increment, well, to add new features to the link scale. Yeah. And [09:04.160 --> 09:12.320] the thing is, well, sometimes you might add new features that enable to restrict more. [09:12.320 --> 09:19.840] And sometimes we might add some features to restrict less. So, let's see what this means. [09:19.840 --> 09:28.120] So, the first version of Larnock, which was released with a 5.13 kernel, basically allowed [09:28.120 --> 09:35.680] to read, write, and do a lot of common stuff to restrict a lot of common files and actions. [09:35.680 --> 09:42.280] But there was, like you can see here, there's three categories. So, first one, always denied, [09:42.280 --> 09:47.400] was for the first version of Larnock, the actions that were always denied whenever you [09:47.400 --> 09:56.880] sandboxed a thread. So, that was for, well, complexity in the development, but also security [09:56.880 --> 10:03.160] reasons. So, for example, you are not able to execute set-ready binaries because it will [10:03.160 --> 10:09.360] be kind of a way to bypass the sandbox. And there was some restriction on Ptrace, so you're [10:09.360 --> 10:15.680] not allowed to debug an application process which is outside the sandbox. Obviously, it [10:15.680 --> 10:21.440] will be a way to get out of the sandbox. So, that's not what we want. [10:21.440 --> 10:30.320] So, the second version of Larnock had its new way, a new access write, which was a way [10:30.320 --> 10:37.280] to repound files. So, at first, it was denied to change the predatory of a file for security [10:37.280 --> 10:44.280] reasons because Larnock is based on five keys identification, and that was kind of complex. [10:44.280 --> 10:52.640] So, but the second version, we implemented that, and then it became configurable. So, [10:52.640 --> 11:00.960] one item less in the always denied box. In the third version of Larnock, so, all these [11:00.960 --> 11:08.800] versions are new kernel releases, and in the third version, we added a new way to restrict [11:08.800 --> 11:18.000] a file propagation. So, propagation in Larnock is to change the size of a file, and this was [11:18.000 --> 11:23.840] always allowed before because it wasn't endowed. It was a bit complex to implement this in [11:23.840 --> 11:30.000] the kernel at this time, but now it is possible. So, you can see that we can move items from [11:30.000 --> 11:35.960] the always denied box to the configurable and from the always allowed box to the configurable [11:35.960 --> 11:43.720] list. So, application compatibility. There's two main things in compatibility. It is forward [11:43.720 --> 11:49.840] compatibility in a way that when you update your kernel, you still can use the old kernel [11:49.840 --> 11:55.760] features. So, that's kind of common. And the backward compatibility in this case is, well, [11:55.760 --> 12:00.520] when you're using a kernel feature, well, you might need the specification of the kernel [12:00.520 --> 12:09.040] that supports this feature. And if your application is running its launch on an old kernel, well, [12:09.040 --> 12:14.280] that feature might be missing. And the thing is, when you're developing an application, [12:14.280 --> 12:19.480] well, you don't know on which kernel your application will run because, well, it's a [12:19.480 --> 12:28.000] user choice and a distro choice. What comes with landlock is the ability to get the landlock, [12:28.000 --> 12:34.160] what we call the ABI version. So, it's really just a number. That increments are started [12:34.160 --> 12:39.760] at one, and then increments for each new set of features, which is added to the kernel. [12:39.760 --> 12:47.160] So, to give you an idea, it's really simple to get this ID version. It's with a landlock [12:47.160 --> 12:53.960] with a specific flag. So, yeah, it's a T code, but it's really simple. So, what we want [12:53.960 --> 13:00.080] to do at first, well, these four main properties. The first one is to be able to, well, to make [13:00.080 --> 13:08.400] it easy to use for developers, of course. So, we want something which is generic, which [13:08.400 --> 13:14.400] kind of follows the build-up pattern because, well, it's kind of common and easy to use. [13:14.400 --> 13:21.400] We want developers to focus on what they want to restrict, not the internal, well, implementation [13:21.400 --> 13:31.320] in the kernel. And we want them to gradually go from a coarse-grain access restriction [13:31.320 --> 13:39.040] to a fine-grain one. So, we don't want them to need to implement a fine-grain at first. [13:39.040 --> 13:44.560] It might be difficult, too difficult. So, yeah, in the same way that we can incrementally [13:44.560 --> 13:50.720] add new set of features, we can also incrementally restrict more and more of the time. So, no [13:50.720 --> 13:59.920] need to be super strict at first. And, yeah, it should be simpler to write, well, for the [13:59.920 --> 14:10.440] common cases. Okay. At first, the first improvement was to create group access rights. So, let's [14:10.440 --> 14:17.560] say you know which landlock version is supported by the running kernel. Let's say it's a second [14:17.560 --> 14:23.680] version. Then you can create a new root set which will get all the access rights which [14:23.680 --> 14:31.240] are supported by this basic kernel. So, you just call the end-of-access with XFS from [14:31.240 --> 14:37.840] all and then ABI2. And then you can do kind of the same when you're adding a new rule. [14:37.840 --> 14:43.440] And this time, well, you want to add an exception on the slash result to make it readable. So, [14:43.440 --> 14:50.280] in this case, there's two main groups, the from read and the from write. So, in, for [14:50.280 --> 14:56.080] example, the from read includes reading a file, but also reading a directory. So, listing [14:56.080 --> 15:05.080] a directory. Okay. Second property that we would like to have is being able to enforce [15:05.080 --> 15:11.360] a strict restriction. So, even if we don't know on which kernel the application will [15:11.360 --> 15:18.680] run, on some cases, we might want to be sure that all features are enforced and restricted. [15:18.680 --> 15:24.520] There's two use cases here. The first one is to test it. If you want to sandbox an application, [15:24.520 --> 15:29.200] you want to make sure that even if you're using all the sandboxing features, well, your [15:29.200 --> 15:34.280] application will work as expected. So, that's really important. And you don't want to run [15:34.280 --> 15:39.520] your application in an old kernel and kind of be fooled by the fact that your application [15:39.520 --> 15:45.000] is running because there's no, well, not all secret features are enabled. So, you want [15:45.000 --> 15:50.600] to cut these kind of issues in your CI. And also for security software, well, you want [15:50.600 --> 15:57.560] to have some security guarantees. So, you want to have a way to fold the whole sandboxing [15:57.560 --> 16:03.800] with all secret features that we embedded in our application. The third property is to [16:03.800 --> 16:08.840] be able to enforce the best for security with some minimal requirements. So, that's kind [16:08.840 --> 16:14.720] of the opposite. And this use case is mainly for end users because end user, well, you [16:14.720 --> 16:23.200] don't know which kernel they will use. And so, you want to be able to enforce an opportunistic [16:23.200 --> 16:30.760] sandboxing. So, if they have a new kernel, well, they will be more protected. If they [16:30.760 --> 16:36.560] have an old kernel, they might not be protected at all, but that's not your choice, that's [16:36.560 --> 16:41.880] not their choice. And at the end, they want to run your application anywhere. So, another [16:41.880 --> 16:47.960] requirement is to be able to disable the whole sandboxing if some features which are required [16:47.960 --> 16:53.560] may not be met. And this approach should be easier to write than others because it is [16:53.560 --> 17:01.320] the most common thing to do. And the last property is being able to run, well, to configure [17:01.320 --> 17:11.080] at runtime the sandboxing, but to make it in a way that you're running most of the codes. [17:11.080 --> 17:19.320] So, the idea is to be able to have kind of the same code running everywhere, almost, [17:19.320 --> 17:26.440] even if they don't have a recent kernel. Why that? Because you want to kind of identify [17:26.440 --> 17:36.600] early kind of some issues which might be linked to the sandboxing code and that's if you have, [17:36.600 --> 17:41.720] let's say, two users using a recent kernel and four users using an old kernel, well, [17:41.720 --> 17:47.400] you might want to test as much as possible with all your users, even if they don't have [17:47.400 --> 17:55.920] a new kernel. So, the first approach we took was, so we'll go quickly here, there's three [17:55.920 --> 18:02.040] approach. The first one was to change, well, to add a new method to the root set builder [18:02.040 --> 18:10.520] pattern. So, it was a simple method to set and set the best approach. So, if it was false, [18:10.520 --> 18:15.480] it was required to have this feature. So, in the example, an application that needed [18:15.480 --> 18:21.000] to move files from one directory to another needed to have the access effects refer access [18:21.000 --> 18:28.680] right to allow this access. And if it wasn't the case, well, the something should not be [18:28.680 --> 18:34.880] enforced, otherwise, it will break the application. So, that is a requirement. And in this case, [18:34.880 --> 18:40.840] that was a way to kind of change the state of the builder over time. So, this is kind [18:40.840 --> 18:46.880] of flexible, easy to understand, but some kind of cases. And, yeah, it makes the code [18:46.880 --> 18:53.560] not really clean. Another approach was kind of to do the same, but this time, with instead [18:53.560 --> 19:02.080] of two shifts, enable or disable, there were three ways to change it. The best-ifort way, [19:02.080 --> 19:06.880] the soft requirement and the hard requirement. So, a way to make it best-ifort, a way to [19:06.880 --> 19:13.160] make it error-out if there's any unsupported feature, and a way to disable the sandbox [19:13.160 --> 19:20.560] without error if some feature were not supported. So, that wasn't ideal, neither. And the last [19:20.560 --> 19:27.720] approach, which is currently working for us, is kind of a new one. So, the idea is to make [19:27.720 --> 19:35.400] it still configurable and to follow all these properties, but to make it, well, a bit simpler [19:35.400 --> 19:41.040] and still flexible. So, here, in a shell, well, you can make a new rule set that will [19:41.040 --> 19:45.640] error-out if there's any unsupported features, but at the same time, you can specify which [19:45.640 --> 19:51.640] feature is required to enable the sandbox or not. So, that's kind of specific, but, yeah, [19:51.640 --> 19:59.600] should be better. So, going forward, there's a lot going on in this first library. A lot [19:59.600 --> 20:05.240] to improve. You help, you help, you get a presentation, and I encourage you to, well, [20:05.240 --> 20:11.480] then make your application or others. And, well, there's some tips if you want to get [20:11.480 --> 20:15.000] some motivation here. It's a rewards program. So, thank you for attention. There's some [20:15.000 --> 20:35.120] interesting link here. This talk was kind of a dance, but I hope you enjoyed. Thank you.