[00:00.000 --> 00:12.960]  Hello, everyone. I'm Zahra. I work for Microsoft as a part of the Linux Systems Group, LSG. This
[00:12.960 --> 00:19.760]  talk is going to be about an ongoing project that we have on hardening the Linux kernel with
[00:19.760 --> 00:26.280]  architectural capabilities. To be clear, this is not a product by Microsoft, so it's more
[00:26.280 --> 00:33.400]  like an exploratory project where we want to see if this hardware feature can be used
[00:33.400 --> 00:40.080]  for security issues that we want to fix in the Linux kernel, and also, as a part of this process,
[00:40.080 --> 00:47.680]  if we can find some new vulnerabilities, attack vectors, and things like that.
[00:47.680 --> 00:53.320]  So I'm going to start with a very brief background, an intro on CHERI and the state of
[00:53.320 --> 01:03.360]  Morello Linux, and then some of my work on capability-based hardening, and future work
[01:03.360 --> 01:08.320]  and opportunities where I'm really hopeful that the open source community can help us.
[01:08.320 --> 01:16.760]  So the big picture problem is that, as you all know, operating systems are really complex.
[01:17.720 --> 01:26.760]  We have millions of lines of code with a lot of complex abstractions, and basically any form
[01:26.760 --> 01:32.560]  of proper hardening for a kernel is still an open problem. We have all kinds of software-based approaches,
[01:32.560 --> 01:38.160]  like control flow integrity, compiler-based techniques, fuzzing,
[01:38.160 --> 01:45.360]  and all of these approaches are helpful, but we still see lots of vulnerabilities. Some of these
[01:45.360 --> 01:51.800]  are memory safety, many of them are memory safety, but some are also logical problems because
[01:51.800 --> 01:58.560]  of this complex monolithic structure of the kernel. At the same time, the Linux kernel also has
[01:58.560 --> 02:08.480]  different security subsystems. We have a combination of, for example, LSMs, DAC, sandboxing
[02:08.480 --> 02:17.360]  techniques, for example, seccomp, eBPF, and still, in this complex stack, the proper
[02:17.360 --> 02:22.400]  integration and hardening of these security subsystems themselves is also not clear,
[02:22.400 --> 02:30.840]  we don't have a clear solution for that. At the same time, we also have a lot of ongoing hardware
[02:30.880 --> 02:39.080]  security features that are being added to our hardware platforms. For
[02:39.080 --> 02:43.920]  example, just on ARM, we have a combination of coarse-grained privilege separation techniques,
[02:43.920 --> 02:50.280]  like TrustZone TEEs. We have, at the same time, finer-grained memory
[02:50.280 --> 02:57.560]  safety features, like pointer authentication, memory tagging extensions, and as we go to,
[02:57.960 --> 03:03.880]  for example, modern hardware, we can have, for example, resource domain controllers, and all
[03:03.880 --> 03:09.960]  of these hardware features, so they are there, but our operating system doesn't
[03:09.960 --> 03:17.880]  really use them in a fundamental way, in a basically principled approach. So this is the
[03:17.880 --> 03:25.920]  big picture of lots of problems that we have for both hardening the kernel and also
[03:25.960 --> 03:33.560]  using these hardware security features properly. And CHERI is one of these fine-grained features,
[03:33.560 --> 03:40.760]  both for memory safety and for extensible compartmentalization. It has
[03:40.760 --> 03:46.480]  a really long history from the University of Cambridge. I think about 14, 15 years of
[03:46.480 --> 03:53.800]  research is behind it. And the concept of capability-based security models,
[03:53.960 --> 04:00.920]  that concept is not new. We have it even in the file descriptor abstraction
[04:00.920 --> 04:08.360]  for Linux. So basically, having an unforgeable token of authority for
[04:08.360 --> 04:16.840]  accessing any kind of sensitive object. But the novelty of CHERI is that you have this
[04:16.840 --> 04:25.720]  hardware-software semantic approach for bringing this concept, this memory safety
[04:25.720 --> 04:33.560]  concept, to both your hardware architecture, at the instruction level, and also
[04:33.560 --> 04:39.400]  really have the opportunity to redesign your systems based on that. So they have
[04:39.880 --> 04:48.200]  these extensions on MIPS, on RISC-V, and recently on ARM. And also, I think they
[04:48.200 --> 04:53.480]  completed an example of their system based on FreeBSD. So the Linux one is
[04:53.480 --> 05:02.840]  a new one that mostly ARM folks are working on. So what's Morello? Morello is,
[05:03.720 --> 05:11.160]  basically, the new development, experimental kind of board for adding CHERI
[05:11.160 --> 05:20.040]  to ARMv8. And it's extending basically the entire instruction set with new
[05:20.040 --> 05:26.600]  registers, and also extending the previous system registers for ARM.
[05:26.600 --> 05:35.320]  It's basically introducing the 129-bit pointers. So every CHERI pointer has,
[05:35.320 --> 05:43.000]  besides the value, a whole set of metadata that contains its bounds,
[05:43.000 --> 05:49.160]  the boundary of the memory region, the object type, and the permissions, basically, for
[05:49.240 --> 05:57.320]  any kind of access through that pointer. And ARM also added this notion of
[05:57.320 --> 06:06.280]  controlled non-monotonicity. That's basically because, as you know,
[06:06.280 --> 06:15.640]  ARM has at least six execution levels, for EL0, EL1, EL2, and,
[06:15.640 --> 06:21.560]  at the same time, in the secure world. So somehow, this notion of Morello should
[06:21.560 --> 06:27.960]  be extended to all of these execution levels. So that's why they added this through
[06:27.960 --> 06:35.480]  new exception sets, new Executive and Restricted modes, privileged execution, and also some unsealing
[06:35.480 --> 06:42.280]  operations that I'm going to describe later. So for every pointer, we have this
[06:42.280 --> 06:48.760]  permission set where, basically, in hardware, you can say what kind of access permissions this piece of memory should
[06:48.760 --> 06:55.320]  have. It can have load, execute, store,
[06:55.320 --> 07:00.680]  or even more complex access controls, like if you want to have an immutable
[07:00.680 --> 07:07.400]  region, for example, through sealing, or if you want to have even system-based access
[07:07.960 --> 07:13.000]  controls, and software-defined access controls, if you want to have
[07:13.000 --> 07:21.080]  your own custom permissions defined for that pointer, that capability.
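A toy software model may make the rules just described (bounds, a permission set, sealing) more concrete. This is only a sketch in Python and not the Morello encoding: the `Capability` class, its field names, and the `CapabilityFault` exception are assumptions made for illustration. Real hardware packs this metadata into the pointer itself, requires an authorizing capability for sealing and unsealing, and, depending on the instruction, may clear the tag instead of faulting.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

LOAD, STORE, EXECUTE = "load", "store", "execute"


class CapabilityFault(Exception):
    """Stands in for the capability exception real hardware would raise."""


@dataclass(frozen=True)
class Capability:
    base: int                      # lowest address the holder may touch
    length: int                    # size of the region in bytes
    perms: FrozenSet[str]          # allowed kinds of access
    otype: Optional[int] = None    # object type; None means "not sealed"
    tag: bool = True               # validity bit

    def restrict(self, base, length, perms):
        """Derive a new capability; rights can shrink but never grow."""
        self._require_usable()
        if length < 0 or base < self.base or base + length > self.base + self.length:
            raise CapabilityFault("bounds may only shrink")
        if not frozenset(perms) <= self.perms:
            raise CapabilityFault("permissions may only be removed")
        return Capability(base, length, frozenset(perms))

    def check(self, perm, address, size):
        """Validate one access; return the address if it is allowed."""
        self._require_usable()
        if perm not in self.perms:
            raise CapabilityFault(f"missing {perm} permission")
        if address < self.base or address + size > self.base + self.length:
            raise CapabilityFault("access outside bounds")
        return address

    def seal(self, otype):
        """Return a copy that cannot be used or narrowed until unsealed.

        Simplification: no authorizing capability is required here.
        """
        self._require_usable()
        return Capability(self.base, self.length, self.perms, otype)

    def unseal(self, otype):
        """Recover the usable capability, only with the matching otype."""
        if not self.tag or self.otype is None or self.otype != otype:
            raise CapabilityFault("wrong object type for unseal")
        return Capability(self.base, self.length, self.perms)

    def _require_usable(self):
        if not self.tag:
            raise CapabilityFault("tag cleared")
        if self.otype is not None:
            raise CapabilityFault("capability is sealed")
```

In this model, deriving a 64-byte, load-only capability from a larger root with `restrict` and then calling `check(STORE, ...)` on it raises `CapabilityFault`, and so does any attempt to widen the bounds again.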
[07:22.280 --> 07:27.640]  And the interesting thing is that this system-based access control, for accessing system registers, that
[07:27.640 --> 07:35.400]  you can define on these capabilities, its behavior is not really affected
[07:35.400 --> 07:42.840]  by your hypervisor mode, HVC calls, SMC calls to the secure monitor, and also
[07:42.840 --> 07:54.280]  supervisor mode. So as you can see, besides the notion of capabilities,
[07:54.280 --> 08:01.800]  ARM basically had to change several system registers,
[08:01.800 --> 08:07.960]  including the control registers. We have new registers, for example, for bounds, for
[08:07.960 --> 08:13.720]  setting and converting pointer capabilities, and we have to have a new
[08:14.840 --> 08:21.640]  PC, the program counter is now PCC, for example, instead of PC, but at the same
[08:21.640 --> 08:31.000]  time, the execution levels, EL0, EL1, EL2, and EL3, all of these should also be aware
[08:31.000 --> 08:35.960]  of the concept of capabilities. So most of the control registers
[08:36.920 --> 08:42.840]  are also changed. And you see, for example, CTLR, it's now a
[08:43.480 --> 08:50.680]  capability-based CTLR. And there's a similar thing, for example, for
[08:50.680 --> 08:56.120]  thread IDs, and things like that. So, for example, in Morello Linux,
[08:56.120 --> 09:02.440]  the thread structure, instead of the traditional thread ID, has
[09:02.440 --> 09:09.080]  capability-based thread IDs, or Restricted-mode thread IDs, and you can find most of these
[09:09.080 --> 09:16.040]  details in the technical manual. Similarly, as I said, we have a new set of exceptions
[09:16.040 --> 09:20.360]  for, basically, capability-based exceptions for any faults that you get from
[09:21.160 --> 09:26.840]  permissions, accessing them, setting bounds right or wrong, things like
[09:26.840 --> 09:33.240]  that. So, as I said, the whole thing is basically a lot of details, so you can
[09:34.040 --> 09:41.000]  find them mostly on the CHERI site and the Morello project; all of them, especially the ARM
[09:41.000 --> 09:47.720]  one, everything is open, so I'd really like anybody in the community
[09:47.720 --> 09:55.800]  to go and check this. So, about the state of Morello Linux, the maintainers are
[09:55.800 --> 10:02.360]  mostly from ARM. They are really doing a very good job; basically, in a very short time, they have
[10:02.360 --> 10:10.840]  a stable environment for Linux development. If you go look at that, you see that they've
[10:10.840 --> 10:17.720]  already enabled most syscalls. They already have distros,
[10:18.280 --> 10:24.760]  like Debian, and even if you don't have the development board,
[10:24.760 --> 10:31.960]  you can just use their FVP, the Fixed Virtual Platform, which is basically
[10:31.960 --> 10:40.280]  a good emulator. And the whole system is really ready for experiments, both from
[10:40.280 --> 10:51.320]  the user space and the kernel development. Also, from their perspective, they made
[10:51.320 --> 10:57.400]  most of the main modifications to memory management for adding the capability-based
[10:57.400 --> 11:03.400]  architecture, and things like that are added. The main problems now that I'm going to discuss are
[11:04.040 --> 11:10.600]  from the security perspective, and can be from the intersections of user
[11:10.600 --> 11:15.720]  space and kernel space security, their interactions, their shared memory, and
[11:15.720 --> 11:24.520]  things like that. So, for example, in my experience, I first started with enabling some of the
[11:25.480 --> 11:32.600]  security features on Morello Linux, and the experiment was really easy. I was just
[11:33.160 --> 11:42.280]  trying to, for example, get the TEE stack, the TEE driver running, adding trusted keys,
[11:42.280 --> 11:50.280]  eBPF, checking if eBPF is working properly on Morello.
[11:50.920 --> 11:56.520]  And in most of the cases, when you want to add these features to Morello Linux,
[11:56.520 --> 12:04.120]  the issues that I was seeing were minor issues. So, basically, mostly
[12:04.120 --> 12:11.000]  pointer mismatches. In the current architecture that they have, a pure capability-based
[12:11.000 --> 12:16.840]  ABI, basically, most of the issues come from, when you enable these
[12:16.840 --> 12:24.520]  features, you have traditional pointer abstractions that you need to convert
[12:24.520 --> 12:33.160]  to capability-based abstractions. And, for example, when I was working on enabling
[12:33.160 --> 12:39.640]  uaccess, which, as you know, is mostly where, in Linux, you have
[12:39.640 --> 12:45.720]  all of the Linux abstractions, the functions for communicating with user space, like
[12:45.720 --> 12:52.120]  passing pointers, passing shared memory, and things like that. And so, this required
[12:52.120 --> 12:59.720]  low-level, I think, capability instructions in Linux. And after we did that,
[12:59.720 --> 13:05.000]  changing put_user, get_user, and things like that, basically, the kernel
[13:05.000 --> 13:11.320]  breaks in several places. But the break was mostly, okay, for example, here,
[13:11.880 --> 13:19.880]  in inotify user, it says that this pointer is an integer user space
[13:19.880 --> 13:27.240]  pointer. And so, it's not a capability. So, basically, we need to find out, dig out,
[13:27.240 --> 13:33.000]  what kind of pointer it is, whether it's an address, or it's just an
[13:33.000 --> 13:38.200]  integer pointer, and things like that. And try to use the CHERI abstractions to
[13:38.200 --> 13:45.480]  convert them, in a secure way, to capabilities. And then, for example, this one was
[13:45.560 --> 13:53.160]  a kind of large patch that still needs more work. But after fixing this,
[13:54.120 --> 14:00.920]  about 50 files, but with very small changes in lines of code,
[14:01.800 --> 14:10.440]  we have the capability-based user space access back-ends for Linux. But the tweaks are,
[14:10.440 --> 14:17.000]  actually, the good thing about this process is that you can find out, dig out,
[14:17.000 --> 14:22.920]  if there is, basically, a vulnerability. In some of them,
[14:22.920 --> 14:30.520]  it was not just a CHERI thing; the pointer mismatch was something that could
[14:31.160 --> 14:36.040]  be an issue in the future, to have memory vulnerability issues,
[14:36.040 --> 14:43.080]  or things like that, or, for example, whether they had the bounds right, and things that can go wrong
[14:43.080 --> 14:53.080]  even if we don't have CHERI. So, the good news is, you have a lot of helper functions
[14:53.080 --> 15:00.440]  from the compiler for using CHERI, getting and setting bounds, and
[15:00.520 --> 15:09.080]  converting capabilities to pointers and vice versa. So, that's good. The other thing is,
[15:09.720 --> 15:16.200]  so, the current state of Morello Linux is that there's a main root capability,
[15:16.200 --> 15:22.280]  basically, every other capability is derived from that. So, what I'm working on is, basically,
[15:22.280 --> 15:28.280]  adding more finer-grained capabilities, both for sealing, making,
[15:29.240 --> 15:35.240]  especially, the sensitive parts of the user space or the kernel immutable after
[15:35.240 --> 15:41.640]  the operations are done. And, also, so, basically, we need to add more root capabilities
[15:41.640 --> 15:49.400]  for the user space, like a specific capability for sealing and making both the kernel
[15:49.400 --> 16:01.320]  subsystems and user space subsystems immutable. And, also, we need to make better use of the concept of
[16:01.320 --> 16:09.960]  software-defined permissions on CHERI. So, for example, one of the custom
[16:10.920 --> 16:19.960]  permissions that are added on FreeBSD is the permission on syscalls and a software permission for,
[16:19.960 --> 16:27.400]  for example, virtual memory. So, this will kind of let them define
[16:27.400 --> 16:34.680]  sandboxing environments, sandboxing abstractions that are, basically, backed by CHERI.
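The idea of a software-defined permission gating syscalls can be sketched as follows. This is a minimal Python illustration under stated assumptions: the bit positions and the names `SW_SYSCALL`, `SW_VMEM`, `restrict`, and `syscall_gate` are hypothetical and do not reproduce the FreeBSD-based system's or Morello's actual definitions. The point is only that the permission travels with the capability, can be dropped when a sandbox is created, and cannot be regained by derivation.

```python
from enum import IntFlag


class Perm(IntFlag):
    LOAD = 1 << 0
    STORE = 1 << 1
    EXECUTE = 1 << 2
    # Hypothetical software-defined bits: the hardware carries them,
    # but only software gives them a meaning.
    SW_SYSCALL = 1 << 8
    SW_VMEM = 1 << 9


class SyscallDenied(Exception):
    """Raised when the calling code capability lacks SW_SYSCALL."""


def restrict(perms: Perm, wanted: Perm) -> Perm:
    """Monotonic derivation: the result never exceeds the parent."""
    return perms & wanted


def syscall_gate(code_perms: Perm, name: str) -> str:
    """Dispatch a syscall only if the caller's code capability allows it."""
    if not (code_perms & Perm.SW_SYSCALL):
        raise SyscallDenied(name)
    return f"dispatch {name}"
```

A host compartment holding `LOAD | EXECUTE | SW_SYSCALL` passes the gate, while a sandbox derived with `restrict(host, Perm.LOAD | Perm.EXECUTE)` is refused, and asking for `SW_SYSCALL` again in a later `restrict` call has no effect.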
[16:35.320 --> 16:40.280]  This is a really useful thing that, for example, can be
[16:40.280 --> 16:49.320]  useful for eBPF or seccomp-based syscall filtering. So, the other thing that we are working on
[16:49.320 --> 16:55.000]  is, basically, what are the better combinations of these software-defined
[16:55.000 --> 17:00.840]  permissions for Linux and sandboxing. That, basically, can get a lot of feedback,
[17:00.920 --> 17:07.800]  it's really useful to get feedback from eBPF folks and people
[17:07.800 --> 17:13.640]  that are working on sandboxing. So, see if we can add these kinds of abstractions
[17:13.640 --> 17:28.120]  and integrate them properly into Linux security subsystems. So, as I said,
[17:28.120 --> 17:34.680]  the goal of this project, in the end, is that we want to use this hardware feature,
[17:34.680 --> 17:41.320]  and similar hardware features, for protecting Linux security subsystems.
[17:41.320 --> 17:49.800]  LSMs, eBPF, and namespaces. So, basically, this is a very open area that
[17:49.800 --> 17:55.480]  can really benefit from the open-source community being involved.
[17:56.200 --> 18:03.560]  And, besides that, one of the things that is really at an
[18:03.560 --> 18:08.920]  early stage is the whole system software stack, from the hypervisor, from
[18:08.920 --> 18:14.840]  secure kernels and trusted execution environments. Now, for the first time, we have this option
[18:14.840 --> 18:21.880]  that we can integrate fine-grained memory protections and scalable
[18:21.960 --> 18:28.360]  compartmentalization features into, for example, our trusted execution environments. And there's
[18:28.360 --> 18:34.600]  a huge area of attack vectors in the interactions between the secure world and
[18:34.600 --> 18:40.040]  the normal world and the kernel environment, and all of these system
[18:40.040 --> 18:47.400]  software stacks now have a way to protect against secure RPC passing
[18:48.040 --> 18:54.120]  pointers. That's basically the main attack vector for all of the TEE environments.
[18:54.840 --> 19:01.880]  Now, we have the options to redesign these stacks based on these fine-grained security features.
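Why pointers passed over an RPC boundary are such an attack vector, and what a bounded reference changes, can be illustrated with a small model. This Python sketch is an assumption-laden toy, not how any real TEE or Morello works: `World`, `read_raw`, `read_bounded`, and `BoundsFault` are hypothetical names, and the `(base, length)` tuple only stands in for a capability, whose unforgeability comes from hardware tags that are not modeled here.

```python
class BoundsFault(Exception):
    """Stands in for a hardware bounds violation."""


class World:
    """One flat memory shared by a caller and a service (toy model)."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def read_raw(self, addr, n):
        # Raw integer pointer plus a caller-supplied length: nothing
        # ties the request to the buffer the caller actually owns.
        return bytes(self.mem[addr:addr + n])

    def read_bounded(self, ref, offset, n):
        # 'ref' plays the role of a capability: (base, length) travels
        # with the reference and is checked on every access.
        base, length = ref
        if offset < 0 or n < 0 or offset + n > length:
            raise BoundsFault("access outside the delegated buffer")
        start = base + offset
        return bytes(self.mem[start:start + n])


world = World(128)
world.mem[0:16] = b"A" * 16        # buffer the caller legitimately shares
world.mem[16:22] = b"SECRET"       # neighbouring data it must not see
shared = (0, 16)                   # bounded reference to the shared buffer
```

With this setup, `world.read_raw(0, 32)` returns bytes that include the neighbouring secret because the service trusted the length in the message, whereas `world.read_bounded(shared, 0, 32)` raises `BoundsFault` and `world.read_bounded(shared, 0, 16)` returns only the shared buffer.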
[19:03.880 --> 19:13.880]  So, to summarize, the state of Morello Linux is really ready for,
[19:13.960 --> 19:20.280]  especially, the open-source community to get involved. And there are a lot of, basically, open
[19:20.280 --> 19:28.600]  problems where we're looking forward to getting feedback from, especially, the Linux community
[19:28.600 --> 19:35.080]  that is working on security subsystems, for both hardening the kernel itself, hardening the
[19:35.080 --> 19:41.960]  kernel security subsystems and, at the same time, adding more compartmentalization tools,
[19:42.680 --> 19:49.480]  sandboxing tools based on these kinds of fine-grained features. Because at the end,
[19:49.480 --> 19:55.800]  the Linux privilege separation is really coarse-grained. And most of the problems
[19:55.800 --> 20:01.640]  from these huge, monolithic stacks can be solved if we have a
[20:01.640 --> 20:07.800]  better abstraction for privilege separation and compartmentalization. Also, for people
[20:07.800 --> 20:15.800]  who are working on debugging and tracing, that's also a huge open
[20:15.800 --> 20:24.120]  space for capability-based systems, how we do it securely, how we basically do it properly. Now,
[20:24.120 --> 20:28.680]  you have these options that, for example, you can define
[20:28.680 --> 20:36.280]  secure domains for giving some capabilities for secure debugging. And then, for example,
[20:36.280 --> 20:45.480]  you don't need to be worried about shutting down security, for example,
[20:45.480 --> 20:51.560]  secure boot or the security features of your system, just to do debugging. Because, as you know,
[20:51.560 --> 21:02.840]  by nature, it's an insecure property. So, I'm happy to take any questions if you have them.
[21:02.840 --> 21:06.840]  And let me know if you're interested in working on this project.
[21:09.000 --> 21:10.840]  Yeah, thank you for the talk.