[00:00.000 --> 00:08.000] So before the 35 minutes, you mean? [00:08.000 --> 00:11.000] So you have your schedule to stop at 14.45. [00:11.000 --> 00:13.000] So give me a heads-up. [00:13.000 --> 00:15.000] You can do it at 35. [00:15.000 --> 00:16.000] OK, yes. [00:16.000 --> 00:17.000] Sounds good. [00:17.000 --> 00:18.000] OK. [00:22.000 --> 00:24.000] All right, people. [00:24.000 --> 00:26.000] Let's do the second block today. [00:26.000 --> 00:27.000] I'm very happy. [00:27.000 --> 00:29.000] I'm actually excited about this talk. [00:29.000 --> 00:31.000] So Samuel from Rivos [00:31.000 --> 00:34.000] will talk about what's going on in the RISC-V landscape. [00:34.000 --> 00:37.000] And I think, yeah, I'm excited for the next big step [00:37.000 --> 00:39.000] in our community, right? [00:39.000 --> 00:41.000] From open source software to open source hardware. [00:41.000 --> 00:42.000] So take it away, Samuel. [00:42.000 --> 00:43.000] Thank you. [00:43.000 --> 00:44.000] Thank you. [00:44.000 --> 00:45.000] So yeah, I'm Samuel. [00:45.000 --> 00:47.000] I work for a company called Rivos. [00:47.000 --> 00:51.000] It's a startup that does RISC-V things. [00:51.000 --> 00:54.000] And today I'm going to talk about confidential computing [00:54.000 --> 00:55.000] with RISC-V. [00:55.000 --> 00:59.000] And how do we want to implement, well, [00:59.000 --> 01:02.000] an open source implementation of confidential computing? [01:02.000 --> 01:06.000] Some of the previous talks mentioned things like OP-TEE. [01:06.000 --> 01:09.000] Some of them mentioned things like SGX or SEV. [01:09.000 --> 01:11.000] Those are all hardware implementations [01:11.000 --> 01:15.000] of the security attributes that the first talk discussed: [01:15.000 --> 01:18.000] confidentiality, protection of memory, [01:18.000 --> 01:21.000] confidentiality of data in use.
[01:21.000 --> 01:24.000] And this talk is really about how we want to achieve that [01:24.000 --> 01:25.000] with RISC-V. [01:25.000 --> 01:27.000] And the difference between the RISC-V implementation [01:27.000 --> 01:29.000] and all other existing implementations [01:29.000 --> 01:31.000] is that everything is done in the open. [01:31.000 --> 01:33.000] Everything is open source. [01:33.000 --> 01:36.000] And everyone here in this room is free to come and help [01:36.000 --> 01:38.000] and contribute to that implementation. [01:38.000 --> 01:41.000] So that's why I think it's interesting. [01:41.000 --> 01:43.000] Hopefully I'm not wrong. [01:43.000 --> 01:45.000] OK. [01:45.000 --> 01:50.000] Who was in the RISC-V devroom before? [01:50.000 --> 01:52.000] OK, so this is needed. [01:52.000 --> 01:55.000] RISC-V: what is RISC-V? [01:55.000 --> 01:58.000] RISC-V is a free and open ISA, not an open source ISA, [01:58.000 --> 02:00.000] because there's no source. [02:00.000 --> 02:03.000] It's an ISA, an instruction set architecture. [02:03.000 --> 02:05.000] So it's free. [02:05.000 --> 02:09.000] Everyone can use it and build a CPU out of it [02:09.000 --> 02:12.000] without paying any license fees or anything like this. [02:12.000 --> 02:15.000] Actually, everyone is free to take half of the specification [02:15.000 --> 02:17.000] and implement some weird CPU. [02:17.000 --> 02:19.000] It doesn't matter. [02:19.000 --> 02:22.000] You can take whatever you want out of this specification. [02:22.000 --> 02:25.000] And it's open in the sense that everything is defined in the open. [02:25.000 --> 02:29.000] Once specs are frozen, that is, ratified [02:29.000 --> 02:33.000] and accepted by RISC-V International, [02:33.000 --> 02:36.000] some modifications can still be added to them, [02:36.000 --> 02:38.000] but it's more difficult.
[02:38.000 --> 02:40.000] But between the time they start to be specified [02:40.000 --> 02:42.000] and the time they are ratified, everything is open. [02:42.000 --> 02:43.000] So it's on GitHub. [02:43.000 --> 02:46.000] You can go and post comments and pull requests [02:46.000 --> 02:48.000] on CPU specifications [02:48.000 --> 02:50.000] that are actually used in the real world. [02:50.000 --> 02:53.000] So it's quite interesting. [02:53.000 --> 02:59.000] And yeah, the specifications are released under an open source license. [02:59.000 --> 03:02.000] There are two volumes for the specification. [03:02.000 --> 03:04.000] It's fairly small. [03:04.000 --> 03:06.000] It's actually 300 pages, which is, I think, [03:06.000 --> 03:10.000] almost the same number of pages that x86 uses [03:10.000 --> 03:13.000] for documenting the MOV instruction. [03:13.000 --> 03:18.000] So it's a good comparison. [03:18.000 --> 03:19.000] So yeah, it's very small. [03:19.000 --> 03:20.000] It's easy to read. [03:20.000 --> 03:22.000] Just go ahead and grab it. [03:22.000 --> 03:25.000] And yeah, the spec is split into the unprivileged [03:25.000 --> 03:26.000] and privileged specifications. [03:26.000 --> 03:29.000] And I'm going to talk about this next. [03:29.000 --> 03:33.000] Why is the RISC-V ISA interesting? [03:33.000 --> 03:35.000] So first of all, it's simple, as I just said. [03:35.000 --> 03:37.000] If you look at the specification, [03:37.000 --> 03:41.000] if you read the specification, there is no microarchitectural dependency. [03:41.000 --> 03:46.000] So the specification tells you what the ISA must look like. [03:46.000 --> 03:48.000] It doesn't tell you how it must be implemented. [03:48.000 --> 03:52.000] So everyone is free to go and implement the ISA the way they want. [03:52.000 --> 03:55.000] There is no dependency on a specific implementation. [03:55.000 --> 03:59.000] And this is probably why it's small, or at least smaller.
[03:59.000 --> 04:04.000] It is modular, so it's the same specification for everyone. [04:04.000 --> 04:08.000] RV32, RV64, and it's the same specification [04:08.000 --> 04:11.000] for the developer boards that you can find on the market [04:11.000 --> 04:15.000] and for the upcoming multicore, [04:15.000 --> 04:18.000] actually massively multicore, SoCs like Ventana's: it's the same spec. [04:18.000 --> 04:19.000] So it's modular. [04:19.000 --> 04:21.000] Everyone uses the same thing. [04:21.000 --> 04:22.000] And it's stable. [04:22.000 --> 04:27.000] So there's a base ISA and a set of standard extensions [04:27.000 --> 04:28.000] that are frozen. [04:28.000 --> 04:31.000] That means that you can rely on this to implement your CPU [04:31.000 --> 04:35.000] and you'll be able to run whatever applications are [04:35.000 --> 04:36.000] using those extensions. [04:36.000 --> 04:38.000] Those are frozen; they're not going to change. [04:38.000 --> 04:41.000] And if they change, they change in a backward-compatible way. [04:41.000 --> 04:43.000] And extensions are optional. [04:43.000 --> 04:45.000] So you don't have to implement all extensions [04:45.000 --> 04:48.000] to be called a RISC-V CPU. [04:48.000 --> 04:51.000] And this here is the base ISA. [04:51.000 --> 04:53.000] So that's the entire base ISA. [04:53.000 --> 04:54.000] This is small. [04:54.000 --> 04:55.000] It's very small. [04:55.000 --> 04:57.000] It's easy to read. [04:57.000 --> 04:58.000] Oh, kind of. [04:58.000 --> 05:02.000] Not on that slide, but it's easy to read and it's small. [05:02.000 --> 05:06.000] I talked about the spec being split between privileged [05:06.000 --> 05:08.000] and unprivileged parts. [05:08.000 --> 05:11.000] And I'm going to talk about privilege modes, which are what is [05:11.000 --> 05:13.000] defined in the privileged specification.
[05:13.000 --> 05:17.000] I'm going to talk about this because it's relevant, really [05:17.000 --> 05:20.000] relevant to the confidential computing implementation. [05:20.000 --> 05:26.000] So there are three basic privilege modes for a RISC-V CPU [05:26.000 --> 05:27.000] to run in: [05:27.000 --> 05:30.000] user mode, supervisor mode, and machine mode. [05:30.000 --> 05:35.000] And you switch between those modes through two mechanisms, [05:35.000 --> 05:37.000] actually through instructions: [05:37.000 --> 05:39.000] ECALL, and MRET and SRET. [05:39.000 --> 05:43.000] So if you're in user mode, if your CPU is running in user mode, [05:43.000 --> 05:45.000] which is typically an application, [05:45.000 --> 05:49.000] you make an ECALL, which is a syscall, basically. [05:49.000 --> 05:52.000] So to implement syscalls, you're going to use the ECALL instruction. [05:52.000 --> 05:55.000] And if you're in the kernel and you need firmware services, [05:55.000 --> 05:56.000] you're going to make another ECALL, [05:56.000 --> 05:59.000] and you go down the privilege stack and become more privileged. [05:59.000 --> 06:04.000] To go back, to go up and move to a less privileged world, [06:04.000 --> 06:08.000] you're going to call MRET from the firmware world, [06:08.000 --> 06:09.000] from machine mode. [06:09.000 --> 06:13.000] And you're going to call SRET to get back from a system call. [06:13.000 --> 06:18.000] And as I said, those modes actually map to real use cases, [06:18.000 --> 06:20.000] what we are typically used to. [06:20.000 --> 06:22.000] So user mode is the application mode. [06:22.000 --> 06:25.000] Supervisor mode is where your kernel is going to run. [06:25.000 --> 06:31.000] And machine mode is where your firmware, the EFI kind of thing, [06:31.000 --> 06:35.000] the UEFI kind of thing, is going to run.
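A minimal Python sketch of the mode transitions just described. It assumes, as the talk does, that ECALLs from U-mode land in the kernel and ECALLs from S-mode land in M-mode firmware; real hardware picks the trap target through delegation registers such as `medeleg`, which this model does not simulate. The `Mode` values follow the privilege-level encoding (U=0, S=1, M=3).

```python
from enum import IntEnum


class Mode(IntEnum):
    """Base RISC-V privilege modes; a higher value means more privileged."""
    U = 0  # user mode: applications
    S = 1  # supervisor mode: the kernel
    M = 3  # machine mode: firmware


def ecall(current: Mode) -> Mode:
    """ECALL traps one level down, following the talk's syscall/firmware flow.

    Assumes U-mode ECALLs are delegated to S-mode and S-mode ECALLs are
    handled by M-mode firmware; delegation itself is not modeled.
    """
    if current == Mode.U:
        return Mode.S  # system call into the kernel
    if current == Mode.S:
        return Mode.M  # firmware (SBI) call
    raise ValueError("ECALL from M-mode is not modeled in this sketch")


def mret(current: Mode, saved: Mode) -> Mode:
    """MRET returns from M-mode to the mode saved in mstatus.MPP."""
    if current != Mode.M:
        raise ValueError("MRET is only legal in M-mode")
    return saved


def sret(current: Mode, saved: Mode) -> Mode:
    """SRET returns from S-mode to the mode saved in SPP (U or S)."""
    if current != Mode.S or saved not in (Mode.U, Mode.S):
        raise ValueError("this sketch only models SRET from S-mode")
    return saved
```

Walking the talk's flow: `ecall(Mode.U)` gives `Mode.S` (a system call), a second `ecall` gives `Mode.M` (a firmware call), and `mret` and `sret` with the saved mode take you back up toward the application.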
[06:35.000 --> 06:39.000] One very important thing for the confidential computing implementation [06:39.000 --> 06:42.000] is the two additional modes. [06:42.000 --> 06:44.000] Actually, three additional modes that have been added [06:44.000 --> 06:46.000] with the hypervisor extension. [06:46.000 --> 06:50.000] So there is an extension to the base RISC-V ISA. [06:50.000 --> 06:54.000] It's called the H extension, H as in hypervisor. [06:54.000 --> 06:58.000] And this is an extension that's been added and is frozen. [06:58.000 --> 07:01.000] So it's something that is not going to change, [07:01.000 --> 07:04.000] for supporting virtualization. [07:04.000 --> 07:07.000] So the modes that have been added are HS mode, [07:07.000 --> 07:09.000] VS mode, and VU mode. [07:09.000 --> 07:12.000] So you can see in this diagram, [07:12.000 --> 07:16.000] you can run your application as usual in U-mode. [07:16.000 --> 07:18.000] And then you're going to have your hypervisor, [07:18.000 --> 07:22.000] your host kernel; when the extension is enabled, [07:22.000 --> 07:25.000] it's going to run not in S-mode but in HS mode, [07:25.000 --> 07:28.000] so hypervisor-extended supervisor mode. [07:28.000 --> 07:33.000] This is where your Linux KVM or Xen kind of thing is running. [07:33.000 --> 07:35.000] And then when you're going to create a virtual machine, [07:35.000 --> 07:37.000] the virtual machine is going to be split. [07:37.000 --> 07:40.000] If it's a full Linux virtual machine, [07:40.000 --> 07:42.000] it's going to be split into two different modes: [07:42.000 --> 07:45.000] VU mode, the virtualized user mode, [07:45.000 --> 07:47.000] and VS mode, the virtualized supervisor mode. [07:47.000 --> 07:50.000] So your guest kernel is going to run in virtualized supervisor mode [07:50.000 --> 07:54.000] and your guest applications are going to run in virtualized user mode. [07:54.000 --> 07:56.000] Okay? [07:56.000 --> 07:58.000] All right.
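To make the hypervisor-extension modes concrete, here is a small, hypothetical lookup in Python: the effective mode is the pair of the nominal privilege level and the H extension's virtualization (V) bit. The `PLACEMENT` table only restates the software placement from the talk; it is illustrative and not part of the specification.

```python
from typing import NamedTuple


class PrivState(NamedTuple):
    level: str  # nominal privilege level: "U", "S" or "M"
    virt: bool  # the H extension's virtualization mode (V) bit


def mode_name(state: PrivState) -> str:
    """Name the effective mode for a (privilege level, V) pair."""
    if state.level == "M":
        if state.virt:
            raise ValueError("V is always 0 while in M-mode")
        return "M"
    if state.level == "S":
        return "VS" if state.virt else "HS"
    if state.level == "U":
        return "VU" if state.virt else "U"
    raise ValueError(f"unknown privilege level {state.level!r}")


# Software placement as described in the talk (illustrative only).
PLACEMENT = {
    "M": "firmware",
    "HS": "host kernel / hypervisor (Linux KVM, Xen)",
    "U": "host applications",
    "VS": "guest kernel",
    "VU": "guest applications",
}
```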
[07:58.000 --> 08:00.000] So, confidential computing. [08:00.000 --> 08:04.000] I just did a five-minute crash course on RISC-V. [08:04.000 --> 08:06.000] So I hope this makes sense. [08:06.000 --> 08:09.000] But anyway, I needed to do this to kind of explain [08:09.000 --> 08:12.000] where we want to go with confidential computing on RISC-V. [08:12.000 --> 08:16.000] So what we're currently defining in RISC-V [08:16.000 --> 08:20.000] for confidential computing is called the AP-TEE RISC-V specification. [08:20.000 --> 08:25.000] AP-TEE as in application processor trusted execution environment. [08:25.000 --> 08:29.000] So it's a technical group where everything, again, is open. [08:29.000 --> 08:32.000] So there's a GitHub repo for this technical group. [08:32.000 --> 08:36.000] All specifications are there, the discussions, the meeting notes, everything. [08:36.000 --> 08:41.000] And it is not ratified yet, not frozen. [08:41.000 --> 08:43.000] So this is a work in progress. [08:43.000 --> 08:46.000] So again, feel free to come and join and help [08:46.000 --> 08:49.000] and provide some feedback on that specification. [08:49.000 --> 08:54.000] But it is aimed at becoming the reference [08:54.000 --> 08:57.000] confidential computing architecture for RISC-V. [08:57.000 --> 09:01.000] So it's currently at a pretty late stage. [09:01.000 --> 09:07.000] It's going to be, not ratified, but accepted pretty soon, in a few months. [09:07.000 --> 09:13.000] And it's going to be the reference confidential computing architecture for RISC-V. [09:13.000 --> 09:15.000] It's not an ISA specification. [09:15.000 --> 09:23.000] So we don't add to the RISC-V set of instructions and architectural definitions. [09:23.000 --> 09:25.000] But we do identify a few ISA gaps. [09:25.000 --> 09:29.000] For example, what we call the confidential memory attributes, [09:29.000 --> 09:34.000] which I'm going to talk about later.
[09:34.000 --> 09:37.000] And just to clarify things, because we talked about OP-TEE: [09:37.000 --> 09:41.000] for example, there's an implementation of OP-TEE for RISC-V. [09:41.000 --> 09:48.000] The AP-TEE specification for RISC-V is not aiming at the same set of use cases. [09:48.000 --> 09:53.000] AP-TEE is really trying to support the same use cases as TDX, [09:53.000 --> 09:59.000] for those who are familiar with TDX, or SEV, for those who are familiar with this AMD technology. [09:59.000 --> 10:02.000] And basically, this specification is defining a new class [10:02.000 --> 10:05.000] of trusted execution environment for RISC-V. [10:05.000 --> 10:08.000] And this new class is trusted virtual machines. [10:08.000 --> 10:11.000] So same as TDX, same as SEV. [10:11.000 --> 10:16.000] The goal is really to run full-blown virtual machines in a confidential computing environment, [10:16.000 --> 10:24.000] where you will have memory and data confidentiality and integrity, as explained in the first talk. [10:24.000 --> 10:27.000] And the goal is really for people to take their existing workloads, [10:27.000 --> 10:31.000] their existing virtual machines, their existing Kubernetes nodes, [10:31.000 --> 10:35.000] and move them into a confidential computing TEE, [10:35.000 --> 10:41.000] the same way they're doing, or aim to do, with SEV or TDX. [10:41.000 --> 10:44.000] So these are really two different sets of use cases, [10:44.000 --> 10:51.000] and AP-TEE is aiming at this specific set of use cases. [10:51.000 --> 10:55.000] So there are a few architecture components that I'm going to talk about. [10:55.000 --> 11:01.000] An AP-TEE bit per hart. Sorry, I didn't mention this, but a hart, [11:01.000 --> 11:06.000] H-A-R-T in RISC-V terminology, short for hardware thread, is basically a CPU core. [11:06.000 --> 11:08.000] It's a core; it's called a hart.
[11:08.000 --> 11:10.000] There are a few components that I'm going to go through: [11:10.000 --> 11:13.000] the TEE security manager, the TSM driver, [11:13.000 --> 11:16.000] there's a dependency on the hardware root of trust, [11:16.000 --> 11:19.000] and there's a structure, [11:19.000 --> 11:23.000] a non-ISA-specified structure called the memory tracking table. [11:23.000 --> 11:27.000] And to go through all these components and kind of explain what they are [11:27.000 --> 11:33.000] and how they're put together to reach the goal of memory and data protection [11:33.000 --> 11:38.000] and integrity guarantees while data is in use, [11:38.000 --> 11:43.000] I'm going to take an example of how, from a cold start of a RISC-V SoC, [11:43.000 --> 11:46.000] we could actually build a trusted virtual machine [11:46.000 --> 11:51.000] with the confidential computing architecture that I'm trying to describe. [11:51.000 --> 12:00.000] Okay, so we have a RISC-V SoC with a few components that are mandatory. [12:00.000 --> 12:05.000] We need an IOMMU, we need a root of trust, we need an MMU obviously. [12:05.000 --> 12:10.000] This all depends on the H extension on 64-bit RISC-V. [12:10.000 --> 12:15.000] It's basically RISC-V GC, which is the general-purpose specification, [12:15.000 --> 12:19.000] plus compressed, but we don't need compressed; it's just the G part. [12:19.000 --> 12:25.000] But yeah, it's a full-blown 64-bit RISC-V SoC that's running there with an IOMMU. [12:25.000 --> 12:31.000] We do need and mandate the presence of a hardware root of trust [12:31.000 --> 12:34.000] and we need some sort of memory protection, [12:34.000 --> 12:38.000] so an MMU, a memory checker, something like this. [12:38.000 --> 12:42.000] The first thing that the root of trust is going to measure and load [12:42.000 --> 12:44.000] is called the TSM driver. [12:44.000 --> 12:48.000] So that's the first component of this confidential computing architecture.
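The ISA prerequisites mentioned here (64-bit, the G base, the H extension) can be checked against an ISA string. This is a hypothetical Python helper, assuming underscore-separated strings like `rv64imafdch_zicsr_zifencei`, the style Linux uses in `/proc/cpuinfo`; it is not a full parser for the ISA naming convention, and it ignores the other hardware requirements (IOMMU, root of trust, memory protection), which an ISA string cannot express.

```python
def parse_isa(isa: str):
    """Split an ISA string such as 'rv64imafdch_zicsr' into its parts.

    Only handles the underscore-separated form; not a complete parser.
    """
    isa = isa.lower()
    if not isa.startswith(("rv32", "rv64")):
        raise ValueError(f"unsupported ISA string {isa!r}")
    xlen = int(isa[2:4])
    base, *rest = isa[4:].split("_")
    single = set(base)                    # single-letter extensions
    multi = {ext for ext in rest if ext}  # multi-letter extensions
    if "g" in single:                     # G = IMAFD + Zicsr + Zifencei
        single |= set("imafd")
        multi |= {"zicsr", "zifencei"}
    return xlen, single, multi


def meets_cc_isa_prereqs(isa: str) -> bool:
    """Hypothetical check: 64-bit, IMAFD (the G base), and the H extension."""
    xlen, single, _multi = parse_isa(isa)
    return xlen == 64 and set("imafd") <= single and "h" in single
```

For example, `rv64gc` fails the check because it lacks H, while `rv64gch` passes.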
[12:48.000 --> 12:53.000] And the TSM driver is the trusted component that runs in M-mode, [12:53.000 --> 12:59.000] in machine mode, and that's going to split the world into non-confidential and confidential, okay? [12:59.000 --> 13:04.000] The TSM driver is, yeah, a confidential world switcher, [13:04.000 --> 13:09.000] and it's the component that basically toggles a bit in the RISC-V SoC, [13:09.000 --> 13:14.000] the AP-TEE bit, to tell if the hart is currently running in confidential mode [13:14.000 --> 13:16.000] or non-confidential mode. [13:16.000 --> 13:19.000] So there is an AP-TEE bit that is part of the specification [13:19.000 --> 13:24.000] that tells at any point in time if a specific RISC-V core, RISC-V hart, [13:24.000 --> 13:28.000] is running in confidential mode or non-confidential mode. [13:28.000 --> 13:31.000] And the TSM driver is the component that's going to make that switch, [13:31.000 --> 13:34.000] the component that is going to toggle that switch. [13:34.000 --> 13:37.000] So it's part of the TCB; it's a trusted component, [13:37.000 --> 13:41.000] a trusted software component that runs in M-mode and does that. [13:41.000 --> 13:46.000] And basically, the TSM driver is going to switch from, [13:46.000 --> 13:48.000] for example, non-confidential to confidential [13:48.000 --> 13:54.000] when something in the non-confidential world, like a VMM or KVM or your Linux kernel, [13:54.000 --> 13:59.000] sends a specific TEE call, which is an ECALL, [13:59.000 --> 14:04.000] basically a call that allows you to move from supervisor mode to machine mode, [14:04.000 --> 14:08.000] so basically from the Linux kernel to the TSM driver.
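The world switch described here can be modeled as a tiny state machine. This Python sketch is hypothetical: `Hart`, `TsmDriver`, and the `TEE_CALL` identifier are illustrative names rather than the AP-TEE specification's interfaces, and the AP-TEE bit is just a boolean. It shows the property the talk stresses: only the M-mode TSM driver flips the bit, on a TEE call trapped from the host, before returning into the TSM with MRET.

```python
from dataclasses import dataclass


@dataclass
class Hart:
    """Minimal hart model: only what the world switch touches."""
    hart_id: int
    ap_tee: bool = False   # modeled per-hart AP-TEE (confidential) bit
    running: str = "host"  # which HS-mode software runs: "host" or "tsm"


class TsmDriver:
    """Hypothetical M-mode world switcher; not the spec's actual interface."""

    TEE_CALL = "tee_call"  # hypothetical identifier for the TEE call

    def on_host_ecall(self, hart: Hart, call: str) -> None:
        """Runs in M-mode after an HS-mode ECALL traps into the TSM driver."""
        if call != self.TEE_CALL:
            return  # other firmware services are not modeled here
        if hart.ap_tee:
            raise RuntimeError("hart is already in confidential mode")
        hart.ap_tee = True    # flipped only here, while the hart is in M-mode
        hart.running = "tsm"  # MRET then resumes execution in the TSM

    def on_tsm_return(self, hart: Hart) -> None:
        """The TSM is done: clear the bit and MRET back to the host."""
        hart.ap_tee = False
        hart.running = "host"
```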
[14:08.000 --> 14:13.000] The TSM driver is going to trap this, and then it's going to toggle the AP-TEE bit, [14:13.000 --> 14:17.000] which means it's going to atomically switch the CPU into confidential mode, [14:17.000 --> 14:21.000] and then it's going to move to something called the TSM, [14:21.000 --> 14:25.000] the trusted security manager, the TEE security manager, sorry. [14:25.000 --> 14:29.000] And to do that, it calls the MRET instruction and moves to the TSM. [14:29.000 --> 14:32.000] So we are in the kernel, the kernel makes an ECALL, [14:32.000 --> 14:37.000] the TSM driver toggles the CPU from non-confidential to confidential, [14:37.000 --> 14:42.000] and then starts running the TSM, and we're going to talk about the TSM next. [14:42.000 --> 14:48.000] And this is what the TSM driver is mostly about. [14:48.000 --> 14:51.000] I'm going to talk about the TSM right after this, [14:51.000 --> 14:56.000] but the one very important thing that the TSM driver manages is called the memory tracking table. [14:56.000 --> 15:00.000] The memory tracking table is a piece of memory, [15:00.000 --> 15:04.000] and the structure of this memory tracking table is not specified [15:04.000 --> 15:08.000] in the confidential computing specification. [15:08.000 --> 15:15.000] It is up to each implementation to decide what it puts in this memory tracking table. [15:15.000 --> 15:20.000] What the spec says is what this memory tracking table is for, [15:20.000 --> 15:22.000] and this is what I'm going to explain now. [15:22.000 --> 15:28.000] The memory tracking table is enforcing... and just to step back, [15:28.000 --> 15:31.000] the memory tracking table lives in confidential memory. [15:31.000 --> 15:35.000] So the memory tracking table lives in a piece of memory that is protected [15:35.000 --> 15:40.000] so that the non-confidential world cannot see or tamper with it. [15:40.000 --> 15:45.000] So it's encrypted, integrity-protected memory.
[15:45.000 --> 15:52.000] So the memory tracking table enforces the confidential memory attribute [15:52.000 --> 15:55.000] for each and every page on the system. [15:55.000 --> 15:58.000] So it's what we call a PMA page tracker. [15:58.000 --> 16:03.000] It defines whether any memory page is confidential or not. [16:03.000 --> 16:08.000] So you take a physical address, you give that to the MTT, the memory tracking table, [16:08.000 --> 16:15.000] and the MTT tells you if this address belongs to a confidential page or a non-confidential page. [16:15.000 --> 16:19.000] So with this memory tracking table, any time, for example, [16:19.000 --> 16:23.000] the non-confidential world is trying to physically access a page, [16:23.000 --> 16:27.000] the memory tracking table is going to be used by the CPU to actually check [16:27.000 --> 16:30.000] if this page is confidential or non-confidential. [16:30.000 --> 16:34.000] If you're trying to access a confidential page from the non-confidential world, [16:34.000 --> 16:40.000] if you're trying to read your trusted virtual machine's memory from your VMM, [16:40.000 --> 16:44.000] from your QEMU, from your KVM, then the memory tracking table is going to tell you [16:44.000 --> 16:49.000] this is a confidential page, and that's going to generate a CPU fault. [16:49.000 --> 16:52.000] And that gives you memory protection. [16:52.000 --> 16:58.000] Depending on how you want to implement memory encryption, basically, to protect your memory, [16:58.000 --> 17:03.000] the memory tracking table will be able to tell you which key you need to use [17:03.000 --> 17:07.000] to encrypt or decrypt that physical page.
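The MTT check described above can be sketched as a lookup keyed by physical page. Since the specification deliberately leaves the MTT's structure to implementations, the dictionary below (page frame number to key id), the 4 KiB page size, and the `AccessFault` exception are all assumptions for illustration.

```python
from typing import Dict, Optional

PAGE_SHIFT = 12  # assumes 4 KiB pages


class AccessFault(Exception):
    """Stands in for the CPU fault raised on a forbidden access."""


class MemoryTrackingTable:
    """Hypothetical MTT: the spec leaves the real layout to implementations."""

    def __init__(self) -> None:
        self._pages: Dict[int, int] = {}  # page frame number -> key id

    def set_confidential(self, paddr: int, key_id: int) -> None:
        self._pages[paddr >> PAGE_SHIFT] = key_id

    def set_non_confidential(self, paddr: int) -> None:
        self._pages.pop(paddr >> PAGE_SHIFT, None)

    def key_for(self, paddr: int) -> Optional[int]:
        """Key id for a confidential page, or None for a non-confidential one."""
        return self._pages.get(paddr >> PAGE_SHIFT)


def check_access(mtt: MemoryTrackingTable, paddr: int, hart_ap_tee: bool) -> Optional[int]:
    """What the CPU would consult on each physical access in this model."""
    key = mtt.key_for(paddr)
    if key is not None and not hart_ap_tee:
        raise AccessFault(f"non-confidential access to confidential page {paddr:#x}")
    return key  # key id to encrypt/decrypt with, or None (plain shared memory)
```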
[17:07.000 --> 17:12.000] And you can decide how you want to implement this, how many keys you want to support, [17:12.000 --> 17:16.000] whether you want one key per TVM or multiple keys; [17:16.000 --> 17:24.000] it's up to the microarchitectural implementation of the specification to decide what it does with it. [17:24.000 --> 17:28.000] Okay, so the TSM driver manages the memory tracking table, [17:28.000 --> 17:32.000] which gives us memory protection and integrity. [17:32.000 --> 17:37.000] And the next thing the TSM driver is going to do is load and measure the next component, [17:37.000 --> 17:40.000] the next trusted component, which runs in a less privileged mode: [17:40.000 --> 17:43.000] the TSM, the TEE Security Manager. [17:43.000 --> 17:49.000] The TSM lives at the same level as the Linux kernel, as KVM, as the hypervisor, basically. [17:49.000 --> 17:52.000] But it lives in the confidential world. [17:52.000 --> 17:55.000] It lives and runs out of confidential memory, [17:55.000 --> 18:02.000] and it only runs when the RISC-V CPU is running with the AP-TEE bit on, [18:02.000 --> 18:07.000] which means when it's in confidential mode. [18:07.000 --> 18:11.000] So the TSM, I don't know if people here are familiar with TDX, [18:11.000 --> 18:18.000] but there are some similarities here for those who know TDX, unfortunately. [18:18.000 --> 18:22.000] So the TSM is the TEE Security Manager. [18:22.000 --> 18:28.000] It's a trusted piece between the host VMM and the TVM. [18:28.000 --> 18:32.000] The TVM is the trusted virtual machine that we're trying to build through those steps. [18:32.000 --> 18:38.000] And nothing from the non-confidential world can actually touch a trusted virtual machine [18:38.000 --> 18:43.000] without going through the TEE Security Manager, the TSM. [18:43.000 --> 18:50.000] One very important thing that the TSM does is manage all the second-stage page tables.
[18:50.000 --> 18:58.000] So the page tables that allow you to translate TVM physical addresses to host physical addresses, [18:58.000 --> 19:03.000] those are managed by the TSM in confidential memory. [19:03.000 --> 19:10.000] So with the confidential computing implementation, KVM no longer manages the second-stage page tables [19:10.000 --> 19:12.000] for the trusted virtual machine. [19:12.000 --> 19:17.000] It's all handled by the TSM, which is trusted, in confidential memory. [19:17.000 --> 19:20.000] So that's a very important piece of the TSM. [19:20.000 --> 19:24.000] And something really important to understand is that it is a passive component. [19:24.000 --> 19:29.000] It implements security services that are going to be called by the host VMM. [19:29.000 --> 19:35.000] It doesn't run by itself. It's not something that schedules TVMs or handles interrupts [19:35.000 --> 19:38.000] or anything like this. [19:38.000 --> 19:42.000] It just replies to security requests that are coming from the host. [19:42.000 --> 19:45.000] The host is in control of the machine. [19:45.000 --> 19:47.000] It's not in control of the trusted virtual machine. [19:47.000 --> 19:49.000] It needs to go through the TSM. [19:49.000 --> 19:54.000] And the TSM is only responsible for this: getting security requests from the host, [19:54.000 --> 19:57.000] from the host VMM, and replying to them. [19:57.000 --> 20:00.000] And we do have an open source implementation for this. [20:00.000 --> 20:04.000] It's called Salus. It's on GitHub again. [20:04.000 --> 20:10.000] And it basically implements everything that I just described, plus a lot more. [20:10.000 --> 20:13.000] It's all in the specification and it's all open source. [20:13.000 --> 20:16.000] So go there. [20:16.000 --> 20:22.000] The TSM also manages the MTT.
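The following hypothetical Python sketch captures two properties from the talk: the TSM is passive (it only does work when one of its request methods is called), and it alone owns each TVM's second-stage page table, translating TVM guest-physical addresses to host-physical addresses. Class and method names are illustrative, not Salus's API, and a dictionary stands in for real G-stage page-table pages.

```python
from typing import Dict

PAGE_SHIFT = 12  # assumes 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class GuestPageFault(Exception):
    """Raised when a TVM guest-physical address has no G-stage mapping."""


class Tsm:
    """Hypothetical passive TSM: it never runs on its own, it only answers
    requests, and it alone owns each TVM's second-stage (G-stage) mappings."""

    def __init__(self) -> None:
        # tvm id -> {guest page number: host page number}; in the real design
        # this lives in confidential memory, so the host cannot edit it.
        self._gstage: Dict[int, Dict[int, int]] = {}

    def create_tvm(self, tvm_id: int) -> None:
        if tvm_id in self._gstage:
            raise ValueError(f"TVM {tvm_id} already exists")
        self._gstage[tvm_id] = {}

    def map_page(self, tvm_id: int, gpa: int, hpa: int) -> None:
        table = self._gstage[tvm_id]
        gpn = gpa >> PAGE_SHIFT
        if gpn in table:
            raise ValueError(f"GPA {gpa:#x} is already mapped")
        table[gpn] = hpa >> PAGE_SHIFT

    def translate(self, tvm_id: int, gpa: int) -> int:
        """G-stage translation: TVM guest-physical -> host-physical."""
        try:
            hpn = self._gstage[tvm_id][gpa >> PAGE_SHIFT]
        except KeyError:
            # Unknown TVM or unmapped page: both reported as a fault here.
            raise GuestPageFault(f"TVM {tvm_id}: no mapping for {gpa:#x}") from None
        return (hpn << PAGE_SHIFT) | (gpa & PAGE_MASK)
```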
[20:22.000 --> 20:26.000] So whenever the TSM adds a page to a trusted virtual machine, [20:26.000 --> 20:31.000] it's going to add entries to the MTT, and it's a little bit more complicated than this [20:31.000 --> 20:33.000] because it needs to go through the TSM driver. [20:33.000 --> 20:40.000] But basically the MTT is something that is owned by the TSM driver and by the TSM. [20:40.000 --> 20:43.000] Okay, so the TSM driver started. [20:43.000 --> 20:45.000] It loaded the TSM. [20:45.000 --> 20:50.000] At some point we have a host OS, a Linux kernel with KVM, that starts. [20:50.000 --> 20:53.000] It boots some non-confidential virtual machines. [20:53.000 --> 21:01.000] And at some point someone is going to start a trusted virtual machine, [21:01.000 --> 21:04.000] a virtual machine that runs in the confidential world. [21:04.000 --> 21:11.000] And to do that, there's a set of ABIs between the host VMM on the left, [21:11.000 --> 21:14.000] in the non-confidential world, and the TSM. [21:14.000 --> 21:16.000] And that goes through the TSM driver. [21:16.000 --> 21:20.000] The TSM driver is the trusted piece that actually proxies each and every request [21:20.000 --> 21:25.000] from the non-confidential world to the confidential world, to the TSM basically. [21:25.000 --> 21:32.000] And those are called the TEE host ABIs because it's a set of binary interfaces [21:32.000 --> 21:40.000] that are called from the host to actually manage and request security features from the TSM. [21:40.000 --> 21:42.000] Everything is proxied through the TSM driver. [21:42.000 --> 21:47.000] So the TSM driver traps the ECALLs, the SBI calls, that the host sends, [21:47.000 --> 21:52.000] and basically it traps the calls from the host VMM, from KVM, for example, [21:52.000 --> 21:58.000] and it then schedules the TSM to actually run and handle those calls.
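The proxying path can be sketched as a dispatcher that the TSM driver runs for each trapped host ECALL. This is a hypothetical Python model: the handler names are made up rather than taken from the TEE host ABI, and the AP-TEE bit is a boolean that is set around each TSM invocation and cleared again before control returns to the host, even if the handler fails.

```python
from typing import Any, Callable, Dict, List, Tuple


class TeeHostAbiProxy:
    """Hypothetical TSM-driver proxy for the TEE host ABI.

    Function names registered here are illustrative, not the spec's real ABI."""

    def __init__(self, tsm_handlers: Dict[str, Callable[..., Any]]) -> None:
        self._handlers = tsm_handlers
        self.ap_tee = False                     # modeled AP-TEE bit
        self.trace: List[Tuple[str, str]] = []  # world-switch log

    def host_ecall(self, func: str, *args: Any) -> Tuple[str, Any]:
        """Trap a host ECALL, run the TSM handler in confidential mode, return."""
        handler = self._handlers.get(func)
        if handler is None:
            return ("not_supported", None)
        self.ap_tee = True                      # switch to confidential mode
        self.trace.append(("enter_tsm", func))
        try:
            return ("ok", handler(*args))
        finally:
            self.ap_tee = False                 # always switch back to the host
            self.trace.append(("exit_tsm", func))
```

Keeping the host-facing entry point in the TSM driver means the host never jumps into confidential code directly; it can only name a request and wait for the reply.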
[21:58.000 --> 22:03.000] So a few examples: creating and destroying a TVM context, [22:03.000 --> 22:09.000] converting non-confidential memory to confidential and vice versa, [22:09.000 --> 22:14.000] mapping pages from the non-confidential world into a TVM. [22:14.000 --> 22:19.000] All those security features are requested by the host VMM, by KVM, [22:19.000 --> 22:21.000] and they are managed by the TSM. [22:21.000 --> 22:27.000] So KVM itself... obviously we don't want KVM to actually take a page [22:27.000 --> 22:31.000] and add it to the TVM's, the trusted virtual machine's, address space. [22:31.000 --> 22:36.000] It has to go through the TSM, which manages all the page tables for this TVM. [22:36.000 --> 22:41.000] And for example, if we want to create a TVM, [22:41.000 --> 22:44.000] which is what we're trying to do here, [22:44.000 --> 22:49.000] it goes through a few steps, and all those steps here map to an actual... [22:49.000 --> 22:55.000] the host ABI, the ABI between KVM and the TSM, and there are basically seven steps. [22:55.000 --> 22:58.000] The first one is to create a TVM context. [22:58.000 --> 23:02.000] So KVM will ask for a context so that it can use that context [23:02.000 --> 23:05.000] and then start configuring the TVM. [23:05.000 --> 23:12.000] The next thing KVM needs to do is to donate some physical pages to the TSM [23:12.000 --> 23:18.000] so that the TSM can actually build the second-stage page tables for the TVM that it's going to create. [23:18.000 --> 23:21.000] Those second-stage page tables live in confidential memory, [23:21.000 --> 23:26.000] so they cannot be handled, they must not be handled, by KVM, by the host VMM. [23:26.000 --> 23:32.000] So KVM donates pages to the TSM, and the TSM is going to use them to build those page tables.
[23:32.000 --> 23:37.000] They're not meant to be used as TVM memory; [23:37.000 --> 23:41.000] they're meant to actually track the second-stage page tables for the TVM. [23:41.000 --> 23:50.000] Then KVM is going to tell the TSM that some memory region needs to be reserved for the TVM. [23:50.000 --> 23:53.000] So that's basically the TVM address space. [23:53.000 --> 23:59.000] And then KVM is going to allocate pages, convert those pages from non-confidential to confidential, [23:59.000 --> 24:08.000] and ask the TSM to map those pages into the memory region that it just asked to create in step number three. [24:08.000 --> 24:14.000] The next thing that KVM needs to do is to create TVM vCPUs, [24:14.000 --> 24:22.000] because basically all the CPU state is contained and managed in confidential memory as well. [24:22.000 --> 24:27.000] All the CPU state that the TVM is going to run on top of is managed by the TSM in confidential memory [24:27.000 --> 24:34.000] so that KVM does not see a TVM's general-purpose register values and cannot mess with them, obviously. [24:34.000 --> 24:37.000] So this is all handled by the TSM as well. [24:37.000 --> 24:44.000] And then KVM finalizes the TVM and eventually asks the TSM to start running the TVM. [24:44.000 --> 24:48.000] And this is where your TVM starts running out of confidential memory [24:48.000 --> 24:56.000] with a vCPU whose state is also kept in confidential memory and protected. [24:56.000 --> 24:58.000] So we have this. [24:58.000 --> 25:03.000] The TSM just created a TVM upon the host VMM's request. [25:03.000 --> 25:06.000] And the TVM can also talk back to the TSM. [25:06.000 --> 25:09.000] The TVM never talks back directly to the host VMM. [25:09.000 --> 25:12.000] It only talks back to the TSM. [25:12.000 --> 25:18.000] The same way a non-confidential VM exit would be trapped by the host VMM,
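The TVM creation sequence described above is essentially an ordering contract between KVM and the TSM. The hypothetical Python sketch below enforces it. The step names paraphrase the talk rather than the TEE host ABI's real function names, and which steps may repeat back-to-back (more page-table pages, more memory regions or pages, more vCPUs) is an assumption of this sketch.

```python
from typing import List

# The seven steps described in the talk, in order (illustrative names).
STEPS = (
    "create_tvm",
    "donate_page_table_pages",
    "add_memory_region",
    "convert_and_map_pages",
    "create_vcpus",
    "finalize",
    "run",
)
# Steps this sketch lets the host repeat back-to-back.
REPEATABLE = {"donate_page_table_pages", "add_memory_region",
              "convert_and_map_pages", "create_vcpus"}


class TvmBuildOrder:
    """Hypothetical TSM-side check that a TVM is built in the expected order."""

    def __init__(self) -> None:
        self.done: List[str] = []
        self._next = 0  # index into STEPS of the next new step

    def request(self, step: str) -> None:
        if self.done and step == self.done[-1] and step in REPEATABLE:
            self.done.append(step)  # e.g. another page or another vCPU
            return
        if self._next >= len(STEPS) or step != STEPS[self._next]:
            expected = STEPS[self._next] if self._next < len(STEPS) else "nothing"
            raise RuntimeError(f"out of order: expected {expected}, got {step}")
        self.done.append(step)
        self._next += 1
```

One consequence of the ordering is that, once `finalize` has been accepted, a request to map more pages is rejected: the TVM's initial memory is fixed before it starts running.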
[25:18.000 --> 25:29.000] In the same way, a TVM exit, for example, or any service that the confidential VM needs, is handled by the TSM driver or the TSM. [25:29.000 --> 25:34.000] So there is a set of ABIs between the TVM and the TSM. [25:34.000 --> 25:42.000] For example, something I haven't talked about yet: attestation is requested by the TVM. [25:42.000 --> 25:45.000] So the TVM is going to ask for attestation evidence. [25:45.000 --> 25:53.000] This request is serviced by the TSM through those ABIs between the TVM and the TSM. [25:53.000 --> 26:05.000] So the TVM asks for a signed attestation report, the evidence that it will send to a relying party to run the full attestation dance whenever it wants to. [26:05.000 --> 26:14.000] Part of this specification, the confidential computing specification, defines how this attestation flow runs [26:14.000 --> 26:22.000] and, more importantly, how the attestation evidence is built, out of which measurements, and how it is formatted. [26:22.000 --> 26:30.000] Unlike TDX, SGX, or SEV, we use a standard format. [26:30.000 --> 26:34.000] We use X.509 certificates to build the evidence. [26:34.000 --> 26:43.000] Each layer in the chain, from the hardware root of trust up to the TVM, loads, measures, and certifies the next layer. [26:43.000 --> 26:47.000] This is based on a specification called TCG DICE. [26:47.000 --> 26:51.000] It's a layered specification for building attestation evidence. [26:51.000 --> 26:55.000] And this is what we use in the RISC-V confidential computing implementation. [26:55.000 --> 27:03.000] When the TVM asks for attestation evidence, it gets a certificate from the TSM. [27:03.000 --> 27:12.000] The TSM builds that certificate with the entire attestation evidence included as an X.509 extension.
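The layering idea behind DICE (each layer measures the next one and folds that measurement into a secret handed forward) can be illustrated with a short derivation chain. This is a simplified sketch under stated assumptions: HMAC-SHA256 stands in for the DICE one-way function, the measurement is a plain SHA-256 of the layer image, and `measure`, `next_cdi`, `dice_chain` and the layer names are hypothetical. It does not build real X.509 certificates.

```python
import hashlib
import hmac


def measure(image: bytes) -> bytes:
    """Measurement of a layer: SHA-256 of its code/config (assumed here)."""
    return hashlib.sha256(image).digest()


def next_cdi(cdi: bytes, measurement: bytes) -> bytes:
    """Derive the next layer's Compound Device Identifier (CDI).

    HMAC-SHA256 is only a stand-in for the one-way function a real
    DICE implementation would use.
    """
    return hmac.new(cdi, measurement, hashlib.sha256).digest()


def dice_chain(uds: bytes, layers):
    """Walk the boot chain: each layer measures the next before handing off.

    `uds` plays the role of the hardware Unique Device Secret. Returns the
    final CDI and the ordered measurements that each layer would place in
    the certificate it issues for the next layer.
    """
    cdi = uds
    measurements = []
    for name, image in layers:
        m = measure(image)
        measurements.append((name, m.hex()))
        cdi = next_cdi(cdi, m)
    return cdi, measurements
```

Because every CDI depends on all measurements before it, changing any layer (say, a modified TSM) changes every secret derived after it. In the real scheme the CDI never leaves its layer: each layer derives an attestation key pair from it and signs a certificate for the next layer, which is roughly how the talk describes the TSM issuing the TVM's certificate.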
[27:12.000 --> 27:22.000] This certificate is rooted all the way back to the hardware root of trust, so that a relying party can then verify it and decide whether to accept the attestation or not. [27:22.000 --> 27:26.000] The last thing I want to talk about is IO. [27:26.000 --> 27:30.000] I haven't talked about IO so far because it's a chapter of its own. [27:30.000 --> 27:32.000] There are two kinds of virtual machine IO. [27:32.000 --> 27:37.000] There's paravirtualized IO, which most of the time means virtio. [27:37.000 --> 27:46.000] Doing virtio with a confidential VM, whether on TDX, SEV, or RISC-V, is challenging [27:46.000 --> 27:51.000] because the virtio device implementation is done by the host VMM. [27:51.000 --> 27:58.000] Typically, your virtio device is going to be implemented by QEMU or, for example, by an external process running in host user space. [27:58.000 --> 28:03.000] So you must share memory between your TVM and your host VMM. [28:03.000 --> 28:05.000] So it's complex. [28:05.000 --> 28:16.000] It's also not very efficient, because you need a software IO TLB and you need to bounce buffers between confidential and non-confidential memory to be able to share data. [28:16.000 --> 28:23.000] You need to harden your guest against the host implementation, etc. [28:23.000 --> 28:25.000] So there's a lot of discussion around this. [28:25.000 --> 28:29.000] If you go to the linux-coco mailing list, a Linux kernel mailing list, [28:29.000 --> 28:32.000] there's a lot of heated discussion right now. [28:32.000 --> 28:40.000] The other form of IO is direct assignment. [28:40.000 --> 28:42.000] That is even more complex. [28:42.000 --> 28:52.000] Direct assignment basically means you take a PCI device that you know nothing about and add it to your TEE's trusted computing base.
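The software IO TLB and buffer bouncing mentioned above amount to an extra copy through a shared pool on every transfer. The sketch below is a hypothetical toy (`ToyBounceBuffer` and its methods are made up; it is not the Linux swiotlb API): the guest copies private data into a host-visible slot before the device reads it, copies device results back out afterwards, and treats any length reported by the host as untrusted input, which is one small example of guest hardening.

```python
class ToyBounceBuffer:
    """Hypothetical swiotlb-style pool living in non-confidential (shared) memory."""

    def __init__(self, slots: int, slot_size: int):
        self.slot_size = slot_size
        self.shared = [bytearray(slot_size) for _ in range(slots)]  # host-visible
        self.free = list(range(slots))

    def map_for_device(self, private_buf: bytes) -> int:
        """Guest -> device: copy private data into a shared slot, return its index."""
        if len(private_buf) > self.slot_size:
            raise ValueError("buffer larger than a bounce slot")
        if not self.free:
            raise MemoryError("bounce pool exhausted")
        slot = self.free.pop()
        self.shared[slot][: len(private_buf)] = private_buf  # extra copy no. 1
        return slot

    def unmap_from_device(self, slot: int, length: int) -> bytes:
        """Device -> guest: copy `length` bytes back to private memory, free the slot."""
        # Guest hardening: the length comes from the untrusted host, so check it.
        if not 0 <= length <= self.slot_size:
            raise ValueError("host reported an invalid length")
        data = bytes(self.shared[slot][:length])  # extra copy no. 2
        self.shared[slot][:] = bytes(self.slot_size)  # clear the slot before reuse
        self.free.append(slot)
        return data
```

Every request pays for these copies, and the shared pool can run out under load, which is part of why the talk calls this path inefficient.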
[28:52.000 --> 28:59.000] Basically you're going to say: I want my NVIDIA GPU to be part of my trusted virtual machine. [28:59.000 --> 29:06.000] To do that, you need to attest and authenticate the device that you want to plug into your TVM. [29:06.000 --> 29:17.000] There are a few PCIe specifications for this, called TDISP and IDE, the latter for protecting the link between your device and your TVM. [29:17.000 --> 29:19.000] You need collaboration from the IOMMU. [29:19.000 --> 29:21.000] It's a very complex topic. [29:21.000 --> 29:29.000] The first one, the virtio one, is very much in progress. The direct assignment one is still being defined. [29:29.000 --> 29:34.000] So I rushed through that. I'm done. [29:34.000 --> 29:37.000] Thanks a lot for listening. I hope it was useful. [29:37.000 --> 29:39.000] Thank you so much. [29:39.000 --> 29:49.000] And we have time for questions.