[00:00.000 --> 00:16.640] Yeah, so my talk is about modifying the Rust compiler to support CHERI's hardware capabilities. [00:16.640 --> 00:18.880] I'm going to start off with a brief introduction. [00:18.880 --> 00:23.240] My name is Lewis Revill, and I work for a company called Embecosm. [00:23.240 --> 00:28.400] I work on many things, but I'd say I specialize in developing LLVM backends for constrained [00:28.400 --> 00:32.160] or unusual architectures. [00:32.160 --> 00:35.360] Embecosm itself is a software services company. [00:35.360 --> 00:40.040] We operate on the boundary between hardware and software, particularly in the embedded [00:40.040 --> 00:48.880] space, where you can find many unusual, difficult and interesting problems like writing compilers. [00:48.880 --> 00:50.320] So what is CHERI? [00:50.320 --> 00:54.000] It's an acronym: Capability Hardware Enhanced RISC Instructions. [00:54.000 --> 00:58.520] It's best described as an instruction set extension, which can be adapted and applied [00:58.520 --> 01:00.720] to different architectures. [01:00.720 --> 01:06.120] The main feature of CHERI is that you can encode access constraints on memory addresses [01:06.120 --> 01:09.520] using things called capabilities. [01:09.520 --> 01:15.680] Capabilities essentially have metadata alongside memory addresses that allows you to specify [01:15.680 --> 01:19.120] these access constraints. [01:19.120 --> 01:25.360] These can only be operated on using capability operations, which replace the normal pointer [01:25.360 --> 01:32.920] operations, and these operations utilize the metadata to enforce those access constraints. [01:32.920 --> 01:36.800] It's worth pointing out there are two modes of operation for CHERI. 
[01:36.800 --> 01:43.920] There's purecap mode, where all pointers are capabilities, and hybrid mode, where [01:43.920 --> 01:49.120] pointers are normal pointers by default, but capabilities are annotated as such in the source [01:49.120 --> 01:51.280] code. [01:51.280 --> 01:57.800] So capabilities together with capability operations allow you to enforce spatial, referential and [01:57.800 --> 02:02.280] temporal safety in the hardware at runtime. [02:02.280 --> 02:09.160] Spatial safety is to do with disallowing accesses out of bounds of an original allocation. [02:09.160 --> 02:16.920] Referential safety is disallowing accesses without valid provenance, and temporal safety means [02:16.920 --> 02:25.080] that if the lifetime of an object is over, you can no longer access it through a capability. [02:25.080 --> 02:27.400] So what about integrating CHERI and Rust? [02:27.400 --> 02:32.600] Well, we're working on this as part of a project which is led by our customer CyberHive. [02:32.600 --> 02:37.880] They're funded in turn by Digital Security by Design, which is a UK government initiative. [02:37.880 --> 02:44.520] CyberHive want to use CHERI hardware to enhance secure network protocols that are written [02:44.520 --> 02:46.120] in Rust. [02:46.120 --> 02:51.080] So the goal for us then is to produce a Rust compiler that's capable of targeting CHERI-based [02:51.080 --> 02:56.880] architectures, with the long-term goal of a stable compiler that can produce production- [02:56.880 --> 02:59.320] ready code for security purposes. [02:59.320 --> 03:06.320] We know that we're initially going to be targeting Arm's Morello platform. [03:06.320 --> 03:11.360] So other than being able to compile existing Rust code for CHERI, what's the motivation [03:11.360 --> 03:15.160] behind integrating CHERI and Rust? [03:15.160 --> 03:18.960] Essentially it boils down to another layer of protection. 
[03:18.960 --> 03:24.520] We know that Rust is good at identifying and enforcing access constraints at compile [03:24.520 --> 03:30.480] time, but with CHERI you can identify constraints at compile time and enforce them in hardware [03:30.480 --> 03:32.400] at runtime. [03:32.400 --> 03:38.800] So a good example is that Rust code annotated with unsafe is often a necessity in many real- [03:38.800 --> 03:46.480] world projects, which means that it could behave badly, but we don't know until runtime. [03:46.480 --> 03:52.320] With CHERI you can prevent this bad behavior in hardware when it occurs at runtime. [03:52.320 --> 03:56.800] There are some other small side benefits, such as replacing slow software bounds checks with [03:56.800 --> 04:05.960] hardware bounds checking and replacing pointer-plus-length types with CHERI capabilities. [04:05.960 --> 04:10.240] So to make things more clear, I have a motivating example. [04:10.240 --> 04:16.320] So say we want to add a dynamic offset to a pointer and then load from that pointer. [04:16.320 --> 04:20.920] Well, this needs to be done in an unsafe block because we don't know until runtime if it's [04:20.920 --> 04:23.120] going to do something bad. [04:23.120 --> 04:29.160] Without CHERI you could end up accessing out of range of your original allocated array, [04:29.160 --> 04:34.880] but with CHERI that access will not occur at runtime and the hardware will either panic [04:34.880 --> 04:42.280] or give you something like a default value. [04:42.280 --> 04:47.960] So now that we know that we want these benefits, how do we go about modifying Rust to get them? [04:47.960 --> 04:53.400] The main problem is that we need to account for capability sizes correctly, that is, we [04:53.400 --> 04:59.800] need to stop assuming that the pointer type size is equal to the addressable range of the pointer. [04:59.800 --> 05:04.520] Because capabilities have metadata, this isn't the case. 
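As an illustration of the motivating example described above (adding a dynamic offset to a pointer and loading through it), here is a minimal Rust sketch. The function names `load_at` and `load_checked` are hypothetical and do not come from the talk. On conventional hardware, calling `load_at` with an out-of-range `offset` is undefined behaviour; the talk's point is that on CHERI hardware the capability's bounds would stop that access at runtime.

```rust
/// Loads the element `offset` places past the start of `data` using raw
/// pointer arithmetic. Hypothetical helper, for illustration only.
///
/// # Safety
/// `offset` must be less than `data.len()`. Rust cannot verify this at
/// compile time, which is why the load needs `unsafe`.
pub unsafe fn load_at(data: &[u32], offset: usize) -> u32 {
    // SAFETY: the caller promises that `offset` is in bounds.
    unsafe { *data.as_ptr().add(offset) }
}

/// The safe counterpart: a software bounds check returns `None` instead of
/// reading outside the allocation.
pub fn load_checked(data: &[u32], offset: usize) -> Option<u32> {
    data.get(offset).copied()
}
```

For example, `unsafe { load_at(&[10, 20, 30], 1) }` reads the second element, while `load_checked(&[10, 20, 30], 3)` returns `None` because the software bounds check rejects the index.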
[05:04.520 --> 05:12.840] Also, in the CHERI LLVM fork, capabilities are pointers in address space 200, whereas [05:12.840 --> 05:19.160] in Rust it seems like we assume that all pointers to data are in address space zero. [05:19.160 --> 05:23.480] Also, if we want to support hybrid mode, we need to be able to specify different pointer [05:23.480 --> 05:29.280] type sizes for different address spaces, so address space zero will have different sizes [05:29.280 --> 05:33.600] from address space 200. [05:33.600 --> 05:38.480] One thing I hope doesn't require many changes is that we need provenance and bounds to be [05:38.480 --> 05:43.360] propagated through the compiler, because they need to be attached to capabilities. [05:43.360 --> 05:49.920] And of course, if we want the optional bonus stuff, we need to implement that as well. [05:49.920 --> 05:55.320] Progress so far: the data layout changes are completed, which means that we can correctly [05:55.320 --> 06:01.560] specify capability sizes, both the type size and the addressable range, for both purecap [06:01.560 --> 06:05.120] and hybrid mode. [06:05.120 --> 06:10.880] I have modified APIs which produce pointer types to get rid of the assumption that pointers [06:10.880 --> 06:18.280] are in address space zero, and now these APIs require an explicit address space parameter. [06:18.280 --> 06:25.280] And the biggest change is that for APIs where we report a size for a type, [06:25.280 --> 06:32.960] this is replaced with a total type size and a size of the value that you can represent. [06:32.960 --> 06:40.600] And yeah, this means that, like I said before, we can support CHERI capabilities. [06:40.600 --> 06:46.920] Also, in the strict provenance API there is an explicit unsafe method of producing [06:46.920 --> 06:53.480] pointers with no provenance from a usize. 
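The split described above, between a pointer's total type size and the size of the value it can represent, might be modelled roughly as follows. This is only a sketch: `PointerLayout` and `hybrid_pointer_layout` are hypothetical names, not the modified compiler's actual API, and the figures assume Morello in hybrid mode, with 64-bit plain pointers in address space 0 and 128-bit capabilities carrying a 64-bit address in address space 200.

```rust
/// Hypothetical description of a pointer type in one address space.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PointerLayout {
    /// Address space the pointer lives in (0 = plain, 200 = capability).
    pub address_space: u32,
    /// Total size of the pointer type in bits, including capability
    /// metadata but excluding the out-of-band tag bit.
    pub type_bits: u32,
    /// Width in bits of the address the pointer can represent.
    pub address_bits: u32,
}

impl PointerLayout {
    /// Total size of the pointer type in bytes.
    pub const fn type_size_bytes(&self) -> u32 {
        self.type_bits / 8
    }

    /// Largest address the pointer can represent.
    pub const fn max_address(&self) -> u128 {
        (1u128 << self.address_bits) - 1
    }
}

/// Looks up the layout for an address space, taking the address space as an
/// explicit parameter rather than assuming address space 0.
/// The values are assumptions for Morello hybrid mode.
pub const fn hybrid_pointer_layout(address_space: u32) -> Option<PointerLayout> {
    match address_space {
        0 => Some(PointerLayout { address_space: 0, type_bits: 64, address_bits: 64 }),
        200 => Some(PointerLayout { address_space: 200, type_bits: 128, address_bits: 64 }),
        _ => None,
    }
}
```

Under these assumptions, both address spaces cover the same 64-bit address range, but a capability occupies 16 bytes where a plain pointer occupies 8, so code that treats the type size as the addressable range gets the wrong answer for address space 200.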
[06:53.480 --> 06:59.040] And for CHERI we need to use CHERI operations to set the address of a null capability to [06:59.040 --> 07:01.560] achieve the same result. [07:01.560 --> 07:05.520] What I'm currently doing is trawling through assertion failures that come up when [07:05.520 --> 07:11.600] building the core libraries with this modified compiler. [07:11.600 --> 07:14.960] What still needs to be done? Well, there are almost definitely going to be modifications [07:14.960 --> 07:21.720] to the libraries to remove any assumptions that break for CHERI. [07:21.720 --> 07:27.320] There's also the question of how we specify capability types in hybrid mode. Because [07:27.320 --> 07:33.880] I don't think that Rust annotations are the right tool to specify a specific pointer as [07:33.880 --> 07:39.160] being a capability, I think this requires a library solution. [07:39.160 --> 07:45.800] For APIs where I have replaced a size with a type size and added a size of the value [07:45.800 --> 07:51.800] that you can represent, we need to go through all of those uses of the type size and see [07:51.800 --> 07:55.160] if they should really be using the size of the value that you can represent, because this [07:55.160 --> 08:01.320] is the main cause of the errors that I'm seeing in building the libraries. [08:01.320 --> 08:07.760] And of course, a lot of testing and polishing is going to be required. [08:07.760 --> 08:12.240] Before I finish this talk, I do need to mention that there's ongoing and past work [08:12.240 --> 08:13.240] in this same area. [08:13.240 --> 08:18.200] There was a master's thesis from the University of Cambridge and there's another government- [08:18.200 --> 08:22.880] funded project from the University of Kent. [08:22.880 --> 08:25.080] And well, thank you for listening. [08:25.080 --> 08:28.760] Please feel free to check out the code on GitHub or ask me any questions outside.