[00:00.000 --> 00:15.720]  Hi, everyone. Yeah, my name is Mickaël Salaün, so I work for Microsoft and I'm mostly the
[00:15.720 --> 00:23.880]  maintainer of Landlock, which is a new Linux security feature. And yeah, it's about sandboxing.
[00:23.880 --> 00:32.360]  So this talk is about the Rust library we wrote for Landlock and, well, we kind of had some
[00:32.360 --> 00:41.840]  challenges about compatibility. So yeah, just a quick introduction and context to understand
[00:41.840 --> 00:51.440]  the problem here. So yeah, why care about security? So here, well, it might be
[00:51.440 --> 00:58.320]  obvious for some, but like every application can be compromised. Every application can be
[00:58.320 --> 01:05.480]  trusted at first and during the lifetime of a process, it can, well, become malicious.
[01:05.480 --> 01:11.720]  So yeah, as developers, there's, well, multiple problems. So we don't want to participate in
[01:11.720 --> 01:19.520]  malicious actions performed by attackers through our software. And we kind of have a responsibility
[01:19.520 --> 01:27.480]  for users, especially to protect their personal data. And yeah, there's also the, well, there
[01:27.480 --> 01:34.400]  might be some issues about third-party code. So security sandboxing is a security approach
[01:34.400 --> 01:42.000]  to isolate software, and mainly to isolate it by dropping ambient access rights. So
[01:42.000 --> 01:48.720]  in a nutshell, well, when you launch an application in, like, a common Linux distro, this application
[01:48.720 --> 01:54.240]  can access a lot of files, including some which are kind of private, like .ssh, for
[01:54.240 --> 02:00.640]  example. So sandboxing should not be confused with namespaces and containers, which are a
[02:00.640 --> 02:07.760]  way to create kind of a virtualized environment. And seccomp is also something which is really
[02:07.760 --> 02:12.800]  interesting for security purposes, but it's not about access control. It's about protecting
[02:12.800 --> 02:20.400]  the kernel. That was initially the, well, initial goal of seccomp. So Landlock is really
[02:20.400 --> 02:28.080]  dedicated from the ground up to bring sandboxing features to Linux. So to bring some security
[02:28.080 --> 02:34.880]  features to the kernel. So it is an access control system available to every process.
[02:34.880 --> 02:43.640]  You don't need to be root or whatever. And it is designed to be embedded in applications.
[02:43.640 --> 02:51.160]  So to create built-in sandboxing. It's a way to create one or even multiple layers
[02:51.160 --> 03:00.080]  of new security. So it comes kind of after all the system-wide access controls, which are
[03:00.080 --> 03:07.200]  already in place. And so it's available on most distros nowadays. And if it is not the
[03:07.200 --> 03:14.680]  case, well, I encourage you to open an issue in your favorite distro. So about sandboxing
[03:14.680 --> 03:21.600]  here, what's the interesting point about sandboxing and built-in application security?
[03:21.600 --> 03:28.000]  It's, well, that we can create tailored security policies and embed them in the application.
[03:28.000 --> 03:37.640]  So there's interesting things about that. And that might help to make security, like,
[03:37.640 --> 03:45.720]  invisible, which is kind of the main purpose here. We want to not bother users, but secure
[03:45.720 --> 03:53.400]  them anyway. So because these security policies can be embedded in the application, well,
[03:53.400 --> 03:59.560]  they can use the application semantics. They can also use the application configuration transparently.
[03:59.560 --> 04:07.120]  So you don't need to add more configuration stuff. It's not another layer of execution.
[04:07.120 --> 04:12.520]  It's embedded in the application. And of course, well, if the configuration depends on user
[04:12.520 --> 04:19.920]  interaction, well, it can adapt to this change of behavior. And one really interesting point
[04:19.920 --> 04:27.720]  is, well, as a developer, you want to test what you do. And you want to kind of get guarantees
[04:27.720 --> 04:34.760]  that whatever you're developing is still working. And being able to embed security policies
[04:34.760 --> 04:39.760]  in your application makes it possible to test them the same way that you can test every
[04:39.760 --> 04:45.200]  other feature. So that's really interesting. You don't rely on, let's say, SELinux being
[04:45.200 --> 04:51.120]  installed on your test machine and so on. And it adapts to the application over time.
[04:51.120 --> 04:55.680]  So if you have, well, a CI, which is well configured, you can test it and make sure
[04:55.680 --> 05:01.960]  that, well, you can, bit by bit, add new features, update the security policy and make sure that
[05:01.960 --> 05:09.520]  everything works as expected. So speaking about the library and the Rust library, so the idea
[05:09.520 --> 05:17.080]  was to create something which is Rusty, so idiomatic Rust. And for this, well, we wanted
[05:17.080 --> 05:24.960]  to leverage strong typing so as to get some development guarantees. And so to follow some common patterns.
[05:24.960 --> 05:32.840]  So mainly here, the builder pattern. So it's still a work in progress. It's working. But
[05:32.840 --> 05:39.480]  yeah, we're working on improving the API and making it easier and more, yeah, easy to use
[05:39.480 --> 05:46.440]  for compatibility reasons. So this talk is about these kind of compatibility requirements.
[05:46.440 --> 05:54.680]  And yeah, so I'll talk about that. Some examples of early users are listed here. But yeah,
[05:54.680 --> 06:04.040]  it's still in kind of beta. So let's start with some code example. So just as a warning,
[06:04.040 --> 06:10.720]  this is kind of simplified code. It's working. But yeah, for the demo, well, it's not a demo,
[06:10.720 --> 06:19.040]  but for this example, the idea is to make it simple to, well, to make it easier to understand.
[06:19.040 --> 06:24.800]  So you can see on the left, there's C code. And on the right, the exact same semantics,
[06:24.800 --> 06:29.800]  but in Rust. So I will mostly talk about the Rust code. But yeah, you can take a look
[06:29.800 --> 06:36.160]  at the C code to kind of see the difference between them and how Rust can be useful there.
[06:36.160 --> 06:42.720]  So as I said, it is based on the builder pattern. So you create a ruleset object here with
[06:42.720 --> 06:49.480]  Ruleset::new. And from there, you kind of call different methods to, well, build the
[06:49.480 --> 06:54.160]  object here. In this case, a ruleset. So a ruleset will contain a set of rules. And
[06:54.160 --> 06:58.640]  yeah, at first, you define what you want to enforce, what you want to restrict, what
[06:58.640 --> 07:04.800]  you want to deny by default. So in this case, these are two actions. The action to execute
[07:04.800 --> 07:11.080]  files and the action to write to files. So obviously, it's not enough. But in this case,
[07:11.080 --> 07:17.000]  it's easy to understand for a simple use case. And then, once you define the ruleset
[07:17.000 --> 07:23.280]  and what the ruleset can handle, well, you can create it. And the ruleset creation translates
[07:23.280 --> 07:28.560]  to, you can see on the left, there's landlock_create_ruleset. And this function is in fact
[07:28.560 --> 07:36.680]  a syscall. So in the Rust part, when you call the create method, it creates a new ruleset,
[07:36.680 --> 07:43.840]  which is backed underneath by a new file descriptor, dedicated to Landlock. And that
[07:43.840 --> 07:51.520]  is wrapped in the ruleset object. Then, if you want to add rules to allow some directory
[07:51.520 --> 07:57.680]  to be, for example, executable, which is the case here. So, well, you open the /usr
[07:57.680 --> 08:06.800]  directory and you make it, well, executable. So, allow access, access execute. And then,
[08:06.800 --> 08:12.240]  you can add other rules you want for all the exceptions that should be legitimate for the,
[08:12.240 --> 08:19.040]  well, legitimate use case. And then, you restrict the current process. Well, in fact, the current
[08:19.040 --> 08:27.760]  thread. And from this point, the current thread can only execute files which are in
[08:27.760 --> 08:39.000]  /usr. And it cannot write anything at all, actually. So, that was an introduction, a quick
[08:39.000 --> 08:46.200]  introduction to the library. And the thing is, Landlock is not a full-featured access control
[08:46.200 --> 08:54.560]  system yet because, well, it is complex. And, well, to reach this goal, well, we need to spend
[08:54.560 --> 09:04.160]  many more years to increment, well, to add new features to the Linux kernel. Yeah. And
[09:04.160 --> 09:12.320]  the thing is, well, sometimes we might add new features that make it possible to restrict more.
[09:12.320 --> 09:19.840]  And sometimes we might add some features to restrict less. So, let's see what this means.
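
Before moving on to versions, the builder flow walked through above (declare what is handled, create the ruleset, add exceptions, then restrict) can be sketched in a few lines. This is a toy model in plain Rust, not the actual Landlock library: the names `Access`, `Ruleset`, `RulesetCreated`, `is_allowed` and `example_policy` are assumptions made for illustration, and nothing here talks to the kernel. It only mimics the decision "handled accesses are denied unless a rule allows them beneath a directory".

```rust
use std::path::{Path, PathBuf};

/// Toy access rights (hypothetical subset; the real feature has many more).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Execute,
    WriteFile,
}

/// Builder state before creation: only collects what is denied by default.
#[derive(Default)]
pub struct Ruleset {
    handled: Vec<Access>,
}

/// State after creation: exceptions (rules) can now be added.
pub struct RulesetCreated {
    handled: Vec<Access>,
    rules: Vec<(PathBuf, Access)>,
}

impl Ruleset {
    pub fn new() -> Self {
        Self::default()
    }

    /// Declare an access right as handled, i.e. denied unless a rule allows it.
    pub fn handle_access(mut self, access: Access) -> Self {
        if !self.handled.contains(&access) {
            self.handled.push(access);
        }
        self
    }

    /// Freeze the handled set. In the real flow this is where the syscall
    /// would happen; here it only changes the type.
    pub fn create(self) -> RulesetCreated {
        RulesetCreated { handled: self.handled, rules: Vec::new() }
    }
}

impl RulesetCreated {
    /// Allow `access` on everything beneath `dir`.
    pub fn add_rule(mut self, dir: &str, access: Access) -> Self {
        self.rules.push((PathBuf::from(dir), access));
        self
    }

    /// Model of the check applied once the thread is restricted.
    pub fn is_allowed(&self, path: &str, access: Access) -> bool {
        if !self.handled.contains(&access) {
            return true; // unhandled accesses are not restricted at all
        }
        self.rules
            .iter()
            .any(|(dir, allowed)| *allowed == access && Path::new(path).starts_with(dir))
    }
}

/// The policy from the talk: handle execute and write, allow execute under /usr.
pub fn example_policy() -> RulesetCreated {
    Ruleset::new()
        .handle_access(Access::Execute)
        .handle_access(Access::WriteFile)
        .create()
        .add_rule("/usr", Access::Execute)
}
```

Splitting the builder into two types means `add_rule` simply does not exist before `create` has been called, which is one way strong typing can give the kind of development guarantees mentioned earlier.
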
[09:19.840 --> 09:28.120]  So, the first version of Landlock, which was released with the 5.13 kernel, basically allowed
[09:28.120 --> 09:35.680]  you to restrict read, write, and a lot of other common actions on files.
[09:35.680 --> 09:42.280]  But, like you can see here, there's three categories. So, the first one, always denied,
[09:42.280 --> 09:47.400]  was, for the first version of Landlock, the actions that were always denied whenever you
[09:47.400 --> 09:56.880]  sandboxed a thread. So, that was for, well, complexity in the development, but also security
[09:56.880 --> 10:03.160]  reasons. So, for example, you are not able to execute set-UID binaries because it would
[10:03.160 --> 10:09.360]  be kind of a way to bypass the sandbox. And there was some restriction on ptrace, so you're
[10:09.360 --> 10:15.680]  not allowed to debug an application process which is outside the sandbox. Obviously, it
[10:15.680 --> 10:21.440]  would be a way to get out of the sandbox. So, that's not what we want.
[10:21.440 --> 10:30.320]  So, the second version of Landlock added a new access right, which was a way
[10:30.320 --> 10:37.280]  to reparent files. So, at first, it was denied to change the parent directory of a file for security
[10:37.280 --> 10:44.280]  reasons because Landlock is based on file hierarchy identification, and that was kind of complex.
[10:44.280 --> 10:52.640]  So, in the second version, we implemented that, and then it became configurable. So,
[10:52.640 --> 11:00.960]  one item less in the always denied box. In the third version of Landlock, so, all these
[11:00.960 --> 11:08.800]  versions are new kernel releases, and in the third version, we added a new way to restrict
[11:08.800 --> 11:18.000]  file truncation. So, truncation in Landlock is to change the size of a file, and this was
[11:18.000 --> 11:23.840]  always allowed before because it wasn't handled. It was a bit complex to implement this in
[11:23.840 --> 11:30.000]  the kernel at that time, but now it is possible. So, you can see that we can move items from
[11:30.000 --> 11:35.960]  the always denied box to the configurable and from the always allowed box to the configurable
[11:35.960 --> 11:43.720]  list. So, application compatibility. There's two main things in compatibility. There is forward
[11:43.720 --> 11:49.840]  compatibility, in the sense that when you update your kernel, you can still use the old kernel
[11:49.840 --> 11:55.760]  features. So, that's kind of common. And the backward compatibility in this case is, well,
[11:55.760 --> 12:00.520]  when you're using a kernel feature, well, you might need a specific version of the kernel
[12:00.520 --> 12:09.040]  that supports this feature. And if your application is launched on an old kernel, well,
[12:09.040 --> 12:14.280]  that feature might be missing. And the thing is, when you're developing an application,
[12:14.280 --> 12:19.480]  well, you don't know on which kernel your application will run because, well, it's a
[12:19.480 --> 12:28.000]  user choice and a distro choice. What comes with Landlock is the ability to get the Landlock,
[12:28.000 --> 12:34.160]  what we call the ABI version. So, it's really just a number. It starts
[12:34.160 --> 12:39.760]  at one, and then increments for each new set of features which is added to the kernel.
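
The way an action's status depends on this ABI number can be written down as a small lookup. The sketch below is a simplified model, not kernel or library code: the names `Feature`, `Status` and `status` are assumptions made for illustration, and only the two actions discussed in the talk are covered (reparenting becomes configurable with version 2, truncation with version 3).

```rust
/// The two actions whose status changed across versions in the talk.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Feature {
    Reparent,
    Truncate,
}

/// The three boxes shown on the slide.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    AlwaysDenied,
    Configurable,
    AlwaysAllowed,
}

/// Status of `feature` inside a sandbox for a given ABI version (1 or later).
pub fn status(feature: Feature, abi: u32) -> Status {
    match feature {
        // Denied in version 1, configurable since version 2.
        Feature::Reparent if abi >= 2 => Status::Configurable,
        Feature::Reparent => Status::AlwaysDenied,
        // Allowed in versions 1 and 2, configurable since version 3.
        Feature::Truncate if abi >= 3 => Status::Configurable,
        Feature::Truncate => Status::AlwaysAllowed,
    }
}
```

Note that the two features move in opposite directions: a newer version can let a sandbox permit something that used to be forbidden, or forbid something that used to be permitted, which is why a single "newer is stricter" assumption does not hold.
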
[12:39.760 --> 12:47.160]  So, to give you an idea, it's really simple to get this ABI version. It's with landlock_create_ruleset
[12:47.160 --> 12:53.960]  with a specific flag. So, yeah, it's C code, but it's really simple. So, what we want
[12:53.960 --> 13:00.080]  to do at first, well, there are four main properties. The first one is to be able to, well, to make
[13:00.080 --> 13:08.400]  it easy to use for developers, of course. So, we want something which is generic, which
[13:08.400 --> 13:14.400]  kind of follows the builder pattern because, well, it's kind of common and easy to use.
[13:14.400 --> 13:21.400]  We want developers to focus on what they want to restrict, not the internal, well, implementation
[13:21.400 --> 13:31.320]  in the kernel. And we want them to gradually go from a coarse-grained access restriction
[13:31.320 --> 13:39.040]  to a fine-grained one. So, we don't want them to need to implement a fine-grained one at first.
[13:39.040 --> 13:44.560]  It might be difficult, too difficult. So, yeah, in the same way that we can incrementally
[13:44.560 --> 13:50.720]  add new sets of features, we can also incrementally restrict more and more over time. So, no
[13:50.720 --> 13:59.920]  need to be super strict at first. And, yeah, it should be simpler to write, well, for the
[13:59.920 --> 14:10.440]  common cases. Okay. At first, the first improvement was to create groups of access rights. So, let's
[14:10.440 --> 14:17.560]  say you know which Landlock version is supported by the running kernel. Let's say it's the second
[14:17.560 --> 14:23.680]  version. Then you can create a new ruleset which will get all the access rights which
[14:23.680 --> 14:31.240]  are supported by this specific kernel. So, you just call handle_access with AccessFs::from_all
[14:31.240 --> 14:37.840]  and then ABI::V2. And then you can do kind of the same when you're adding a new rule.
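
The idea of grouping access rights by ABI version can be sketched with plain bit masks. This is a minimal model under stated assumptions, not the library's implementation: the constants cover only a handful of rights, their bit values are arbitrary, and the functions `from_all`, `from_read` and `from_write` only borrow their names from the groups mentioned in the talk. Treating execute as a read-like right is also an assumption of this sketch.

```rust
/// Hypothetical ABI versions, ordered so they can be compared.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Abi {
    V1 = 1,
    V2 = 2,
    V3 = 3,
}

// Illustrative bit values for a small subset of rights.
pub const EXECUTE: u32 = 1 << 0;
pub const WRITE_FILE: u32 = 1 << 1;
pub const READ_FILE: u32 = 1 << 2;
pub const READ_DIR: u32 = 1 << 3;
pub const REFER: u32 = 1 << 4; // reparenting, configurable since V2
pub const TRUNCATE: u32 = 1 << 5; // truncation, configurable since V3

/// Every right that can be handled with the given ABI version.
pub fn from_all(abi: Abi) -> u32 {
    let mut rights = EXECUTE | WRITE_FILE | READ_FILE | READ_DIR;
    if abi >= Abi::V2 {
        rights |= REFER;
    }
    if abi >= Abi::V3 {
        rights |= TRUNCATE;
    }
    rights
}

/// Read-like rights: reading files, listing directories, executing.
pub fn from_read(_abi: Abi) -> u32 {
    EXECUTE | READ_FILE | READ_DIR
}

/// Write-like rights: everything known to this version that is not read-like.
pub fn from_write(abi: Abi) -> u32 {
    from_all(abi) & !from_read(abi)
}
```

With groups like these, a caller names an ABI version once and gets a consistent set of rights, instead of listing individual flags and having to revisit the list for each new kernel.
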
[14:37.840 --> 14:43.440]  And this time, well, you want to add an exception on /usr to make it readable. So,
[14:43.440 --> 14:50.280]  in this case, there's two main groups, from_read and from_write. So, for
[14:50.280 --> 14:56.080]  example, from_read includes reading a file, but also reading a directory. So, listing
[14:56.080 --> 15:05.080]  a directory. Okay. Second property that we would like to have is being able to enforce
[15:05.080 --> 15:11.360]  a strict restriction. So, even if we don't know on which kernel the application will
[15:11.360 --> 15:18.680]  run, on some cases, we might want to be sure that all features are enforced and restricted.
[15:18.680 --> 15:24.520]  There's two use cases here. The first one is to test it. If you want to sandbox an application,
[15:24.520 --> 15:29.200]  you want to make sure that even if you're using all the sandboxing features, well, your
[15:29.200 --> 15:34.280]  application will work as expected. So, that's really important. And you don't want to run
[15:34.280 --> 15:39.520]  your application in an old kernel and kind of be fooled by the fact that your application
[15:39.520 --> 15:45.000]  is running because, well, not all security features are enabled. So, you want
[15:45.000 --> 15:50.600]  to catch these kinds of issues in your CI. And also for security software, well, you want
[15:50.600 --> 15:57.560]  to have some security guarantees. So, you want to have a way to enforce the whole sandboxing
[15:57.560 --> 16:03.800]  with all the security features that we embedded in our application. The third property is to
[16:03.800 --> 16:08.840]  be able to enforce best-effort security with some minimal requirements. So, that's kind
[16:08.840 --> 16:14.720]  of the opposite. And this use case is mainly for end users because for end users, well, you
[16:14.720 --> 16:23.200]  don't know which kernel they will use. And so, you want to be able to enforce an opportunistic
[16:23.200 --> 16:30.760]  sandboxing. So, if they have a new kernel, well, they will be more protected. If they
[16:30.760 --> 16:36.560]  have an old kernel, they might not be protected at all, but that's not your choice, that's
[16:36.560 --> 16:41.880]  not their choice. And at the end, they want to run your application anywhere. So, another
[16:41.880 --> 16:47.960]  requirement is to be able to disable the whole sandboxing if some features which are required
[16:47.960 --> 16:53.560]  may not be met. And this approach should be easier to write than others because it is
[16:53.560 --> 17:01.320]  the most common thing to do. And the last property is being able to run, well, to configure
[17:01.320 --> 17:11.080]  at runtime the sandboxing, but to do it in a way that you're running most of the code.
[17:11.080 --> 17:19.320]  So, the idea is to be able to have kind of the same code running everywhere, almost,
[17:19.320 --> 17:26.440]  even if they don't have a recent kernel. Why that? Because you want to kind of identify
[17:26.440 --> 17:36.600]  early kind of some issues which might be linked to the sandboxing code and that's if you have,
[17:36.600 --> 17:41.720]  let's say, two users using a recent kernel and four users using an old kernel, well,
[17:41.720 --> 17:47.400]  you might want to test as much as possible with all your users, even if they don't have
[17:47.400 --> 17:55.920]  a new kernel. So, the first approach we took was, so we'll go quickly here, there's three
[17:55.920 --> 18:02.040]  approaches. The first one was to change, well, to add a new method to the ruleset builder
[18:02.040 --> 18:10.520]  pattern. So, it was a simple method to set or unset best effort. So, if it was false,
[18:10.520 --> 18:15.480]  it was required to have this feature. So, in the example, an application that needed
[18:15.480 --> 18:21.000]  to move files from one directory to another needed to have the AccessFs::Refer access
[18:21.000 --> 18:28.680]  right to allow this access. And if it wasn't the case, well, the sandboxing should not be
[18:28.680 --> 18:34.880]  enforced, otherwise, it would break the application. So, that is a requirement. And in this case,
[18:34.880 --> 18:40.840]  that was a way to kind of change the state of the builder over time. So, this is kind
[18:40.840 --> 18:46.880]  of flexible, easy to understand, but there are some corner cases. And, yeah, it makes the code
[18:46.880 --> 18:53.560]  not really clean. Another approach was kind of to do the same, but this time, with instead
[18:53.560 --> 19:02.080]  of two choices, enable or disable, there were three ways to change it. The best-effort way,
[19:02.080 --> 19:06.880]  the soft requirement and the hard requirement. So, a way to make it best-ifort, a way to
[19:06.880 --> 19:13.160]  make it error out if there's any unsupported feature, and a way to disable the sandbox
[19:13.160 --> 19:20.560]  without error if some features were not supported. So, that wasn't ideal either. And the last
[19:20.560 --> 19:27.720]  approach, which is currently working for us, is kind of a new one. So, the idea is to make
[19:27.720 --> 19:35.400]  it still configurable and to follow all these properties, but to make it, well, a bit simpler
[19:35.400 --> 19:41.040]  and still flexible. So, here, in a nutshell, well, you can make a new ruleset that will
[19:41.040 --> 19:45.640]  error out if there's any unsupported feature, but at the same time, you can specify which
[19:45.640 --> 19:51.640]  feature is required to enable the sandbox or not. So, that's kind of specific, but, yeah,
[19:51.640 --> 19:59.600]  should be better. So, going forward, there's a lot going on in this Rust library. A lot
[19:59.600 --> 20:05.240]  to improve. Your help is welcome. You have the presentation, and I encourage you to, well,
[20:05.240 --> 20:11.480]  sandbox your application or others. And, well, there's some tips if you want to get
[20:11.480 --> 20:15.000]  some motivation here. There's a rewards program. So, thank you for your attention. There's some
[20:15.000 --> 20:35.120]  interesting links here. This talk was kind of dense, but I hope you enjoyed it. Thank you.
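
As a recap of the three modes named in the second approach (best effort, soft requirement, hard requirement), here is a minimal sketch of how a request could be resolved against what the running kernel supports. It is a model under stated assumptions and not the library's code: `CompatLevel`, `Outcome`, `Unsupported` and `resolve` are hypothetical names for this example, and access rights are reduced to plain bit masks.

```rust
/// The three behaviours described in the talk.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CompatLevel {
    /// Enforce whatever subset the kernel supports.
    BestEffort,
    /// Silently disable the sandbox if something is unsupported.
    SoftRequirement,
    /// Return an error if something is unsupported.
    HardRequirement,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The sandbox is applied with this set of access rights.
    Enforced(u32),
    /// The sandbox is skipped without error.
    Disabled,
}

/// Error carrying the requested rights that the kernel does not support.
#[derive(Debug, PartialEq, Eq)]
pub struct Unsupported(pub u32);

/// Decide what to enforce given requested and supported access-right masks.
pub fn resolve(requested: u32, supported: u32, level: CompatLevel) -> Result<Outcome, Unsupported> {
    let missing = requested & !supported;
    if missing == 0 {
        return Ok(Outcome::Enforced(requested));
    }
    match level {
        CompatLevel::BestEffort => Ok(Outcome::Enforced(requested & supported)),
        CompatLevel::SoftRequirement => Ok(Outcome::Disabled),
        CompatLevel::HardRequirement => Err(Unsupported(missing)),
    }
}
```

In this model, a test suite would pick the hard requirement so that a CI run on an old kernel fails loudly, while a release build for end users would pick best effort, with a soft requirement reserved for rights the application cannot work without, such as reparenting in the file-moving example.
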