[00:00.000 --> 00:08.000]  So before the 35 minutes, you mean?
[00:08.000 --> 00:11.000]  So you have your schedule to stop at 14.45.
[00:11.000 --> 00:13.000]  So you give me a give me a heads up.
[00:13.000 --> 00:15.000]  You can do it at 35.
[00:15.000 --> 00:16.000]  OK, yes.
[00:16.000 --> 00:17.000]  Sounds good.
[00:17.000 --> 00:18.000]  OK.
[00:22.000 --> 00:24.000]  All right, people.
[00:24.000 --> 00:26.000]  Let's do the second vlog today.
[00:26.000 --> 00:27.000]  I'm very happy.
[00:27.000 --> 00:29.000]  I'm actually excited about this talk.
[00:29.000 --> 00:31.000]  So Samuel from the RISC-5 company
[00:31.000 --> 00:34.000]  will talk about what's going on in the RISC-5 landscape.
[00:34.000 --> 00:37.000]  And I think, yeah, I'm excited for the next big step
[00:37.000 --> 00:39.000]  in our community, right?
[00:39.000 --> 00:41.000]  From open source software, open source hardware.
[00:41.000 --> 00:42.000]  So take that away, Samuel.
[00:42.000 --> 00:43.000]  Thank you.
[00:43.000 --> 00:44.000]  Thank you.
[00:44.000 --> 00:45.000]  So yeah, I'm Samuel.
[00:45.000 --> 00:47.000]  I work for a company called RISC-5.
[00:47.000 --> 00:51.000]  It's a startup that does RISC-5 things.
[00:51.000 --> 00:54.000]  And today I'm going to talk about confidential computing
[00:54.000 --> 00:55.000]  with RISC-5.
[00:55.000 --> 00:59.000]  And how do we do want to implement, well,
[00:59.000 --> 01:02.000]  an open source implementation of confidential computing?
[01:02.000 --> 01:06.000]  The previous talks I've mentioned are things like OPTIE.
[01:06.000 --> 01:09.000]  Some of them I've mentioned things like SGX or SCV.
[01:09.000 --> 01:11.000]  Those are all hardware implementation
[01:11.000 --> 01:15.000]  of the security attributes that the first talks about.
[01:15.000 --> 01:18.000]  Confidentiality, protection of memory,
[01:18.000 --> 01:21.000]  confidentiality of data in use.
[01:21.000 --> 01:24.000]  And this talk is really about how we want to achieve that
[01:24.000 --> 01:25.000]  with RISC-5.
[01:25.000 --> 01:27.000]  And the difference between the RISC-5 implementation
[01:27.000 --> 01:29.000]  and all other existing implementations
[01:29.000 --> 01:31.000]  is that everything is done in the open.
[01:31.000 --> 01:33.000]  Everything is open source.
[01:33.000 --> 01:36.000]  And everyone here in that room is free to come and help
[01:36.000 --> 01:38.000]  and contribute to that implementation.
[01:38.000 --> 01:41.000]  So that's why I think it's interesting.
[01:41.000 --> 01:43.000]  Hopefully I'm not wrong.
[01:43.000 --> 01:45.000]  OK.
[01:45.000 --> 01:50.000]  Who was on the RISC-5 dev room before?
[01:50.000 --> 01:52.000]  OK, so that's needed.
[01:52.000 --> 01:55.000]  RISC-5, what is RISC-5?
[01:55.000 --> 01:58.000]  RISC-5 is a free and open ISA, not open source ISA,
[01:58.000 --> 02:00.000]  because there's no source.
[02:00.000 --> 02:03.000]  It's an ISA, an instruction set architecture.
[02:03.000 --> 02:05.000]  So it's free.
[02:05.000 --> 02:09.000]  Everyone can use it, can build a CPU out of it
[02:09.000 --> 02:12.000]  without paying in license, any fees or anything like this.
[02:12.000 --> 02:15.000]  Actually, everyone is free to take half of the specification,
[02:15.000 --> 02:17.000]  implement some weirdos, CPU.
[02:17.000 --> 02:19.000]  It doesn't matter.
[02:19.000 --> 02:22.000]  You can take whatever you want out of this specification.
[02:22.000 --> 02:25.000]  And it's open in a sense that everything is defined in the open.
[02:25.000 --> 02:29.000]  So all the specs are frozen, that's ratified
[02:29.000 --> 02:33.000]  and accepted by the RISC-5 International Foundation.
[02:33.000 --> 02:36.000]  They're ratified and some modification can be added to it,
[02:36.000 --> 02:38.000]  but it's more difficult.
[02:38.000 --> 02:40.000]  But between the time they start to be specified
[02:40.000 --> 02:42.000]  and the time they are ratified, everything is open.
[02:42.000 --> 02:43.000]  So it's on GitHub.
[02:43.000 --> 02:46.000]  You can go and put some comments and some pull requests
[02:46.000 --> 02:48.000]  on CPU specifications.
[02:48.000 --> 02:50.000]  That are actually used in the real world.
[02:50.000 --> 02:53.000]  So it's quite interesting.
[02:53.000 --> 02:59.000]  And yeah, the specifications are released under an open source license.
[02:59.000 --> 03:02.000]  There are two volumes for the specification.
[03:02.000 --> 03:04.000]  It's fairly small.
[03:04.000 --> 03:06.000]  It's actually 300 pages, which is, I think,
[03:06.000 --> 03:10.000]  almost the same amount of pages that X86 uses
[03:10.000 --> 03:13.000]  for documenting the move instruction.
[03:13.000 --> 03:18.000]  So it's a good comparison.
[03:18.000 --> 03:19.000]  So yeah, it's very small.
[03:19.000 --> 03:20.000]  It's easy to read.
[03:20.000 --> 03:22.000]  Just go ahead and grab it.
[03:22.000 --> 03:25.000]  And yeah, the spec is split into the unprivileged
[03:25.000 --> 03:26.000]  and privileged specification.
[03:26.000 --> 03:29.000]  And I'm going to talk about this next.
[03:29.000 --> 03:33.000]  Why is the RISC-5 ISA interesting?
[03:33.000 --> 03:35.000]  So first of all, it's simple, as I just said.
[03:35.000 --> 03:37.000]  If you look at the specification,
[03:37.000 --> 03:41.000]  if you read the specification, there is no micro-architectural dependency.
[03:41.000 --> 03:46.000]  So the specification tells you how the ISA must look like.
[03:46.000 --> 03:48.000]  It doesn't tell you how it must be implemented.
[03:48.000 --> 03:52.000]  So everyone is free to go and implement the ISA the way they want.
[03:52.000 --> 03:55.000]  There is no dependency on a specific implementation.
[03:55.000 --> 03:59.000]  And probably this is why it's small, or actually smaller.
[03:59.000 --> 04:04.000]  It is modular, so it's the same specification for everyone.
[04:04.000 --> 04:08.000]  RISC-32, RISC-64, and it's the same implementation
[04:08.000 --> 04:11.000]  for the developer boards that you can find in the market
[04:11.000 --> 04:15.000]  and the upcoming like the Ventana, Multicore,
[04:15.000 --> 04:18.000]  SOCU actually massively Multicore, SOCs, it's the same spec.
[04:18.000 --> 04:19.000]  So it's modular.
[04:19.000 --> 04:21.000]  Everyone uses the same thing.
[04:21.000 --> 04:22.000]  And it's stable.
[04:22.000 --> 04:27.000]  So there's a base ISA and a set of standard extensions
[04:27.000 --> 04:28.000]  that are frozen.
[04:28.000 --> 04:31.000]  That means that you can rely on this to implement your UCPU
[04:31.000 --> 04:35.000]  and you'll be able to use whatever application are running
[04:35.000 --> 04:36.000]  and using those extensions.
[04:36.000 --> 04:38.000]  Those are frozen, they're not going to change.
[04:38.000 --> 04:41.000]  And if they change, they change the backward compatible way.
[04:41.000 --> 04:43.000]  And extensions are optional.
[04:43.000 --> 04:45.000]  So you don't have to implement all extensions
[04:45.000 --> 04:48.000]  to be called a RISC-5 CPU.
[04:48.000 --> 04:51.000]  And this here is the base ISA.
[04:51.000 --> 04:53.000]  So that's the entire base ISA.
[04:53.000 --> 04:54.000]  This is small.
[04:54.000 --> 04:55.000]  It's very small.
[04:55.000 --> 04:57.000]  It's easy to read.
[04:57.000 --> 04:58.000]  Oh, kind of.
[04:58.000 --> 05:02.000]  Not on that slide, but it's easy to read and it's small.
[05:02.000 --> 05:06.000]  I talked about the spec being split between privilege
[05:06.000 --> 05:08.000]  and unprivileged parts.
[05:08.000 --> 05:11.000]  And I'm going to talk about privilege mode, which is what is
[05:11.000 --> 05:13.000]  defined in the privilege specification.
[05:13.000 --> 05:17.000]  I'm going to talk about this because it's relevant, really
[05:17.000 --> 05:20.000]  relevant to the confidential computing implementation.
[05:20.000 --> 05:26.000]  So there are three basic privilege modes for a RISC-5 CPU
[05:26.000 --> 05:27.000]  to run on.
[05:27.000 --> 05:30.000]  The user mode, supervisor mode, and machine mode.
[05:30.000 --> 05:35.000]  And you switch between those modes through two mechanisms,
[05:35.000 --> 05:37.000]  actually through instructions.
[05:37.000 --> 05:39.000]  E-Call and M-Ret and S-Ret.
[05:39.000 --> 05:43.000]  So if you're in user mode, if your CPU is running in user mode,
[05:43.000 --> 05:45.000]  which is typically an application,
[05:45.000 --> 05:49.000]  you make an E-Call, which is a CIS-Call, basically.
[05:49.000 --> 05:52.000]  So to implement CIS-Call, you're going to use the E-Call instruction.
[05:52.000 --> 05:55.000]  And if you're in the kernel and you need firmware services,
[05:55.000 --> 05:56.000]  you're going to make another E-Call,
[05:56.000 --> 05:59.000]  and you go down in the privilege level and you're more privileged.
[05:59.000 --> 06:04.000]  To go back, to go up and move to a less privileged world,
[06:04.000 --> 06:08.000]  you're going to call M-Ret from the firmware world,
[06:08.000 --> 06:09.000]  from the machine mode.
[06:09.000 --> 06:13.000]  And you're going to call S-Ret to get back from a system call.
[06:13.000 --> 06:18.000]  And as I said, those mode actually maps to real use cases,
[06:18.000 --> 06:20.000]  what we typically use to.
[06:20.000 --> 06:22.000]  So the user mode is the application mode.
[06:22.000 --> 06:25.000]  Supervisor mode is where your kernel is going to run.
[06:25.000 --> 06:31.000]  And machine mode is where your firmware, EFI kind of thing,
[06:31.000 --> 06:35.000]  UFI kind of thing is going to run.
[06:35.000 --> 06:39.000]  One very important thing for the confidential computing implementation
[06:39.000 --> 06:42.000]  is the two additional modes.
[06:42.000 --> 06:44.000]  Actually, three additional modes that have been added
[06:44.000 --> 06:46.000]  with the hypervisor extension.
[06:46.000 --> 06:50.000]  So there is an extension to the base RIS5 ISA.
[06:50.000 --> 06:54.000]  It's called the H extension, H as in hypervisor.
[06:54.000 --> 06:58.000]  And this is an extension that's been added and is frozen.
[06:58.000 --> 07:01.000]  So it's something that is not going to change
[07:01.000 --> 07:04.000]  for supporting virtualization.
[07:04.000 --> 07:07.000]  So the mode that I've been adding is the HAS mode,
[07:07.000 --> 07:09.000]  the VS mode and the VU mode.
[07:09.000 --> 07:12.000]  So you can see in this diagram,
[07:12.000 --> 07:16.000]  you can run your application as usually in U mode.
[07:16.000 --> 07:18.000]  And then you're going to have your hypervisor,
[07:18.000 --> 07:22.000]  your host kernel when the extension is enabled,
[07:22.000 --> 07:25.000]  it's going to run not on S mode but on HAS mode.
[07:25.000 --> 07:28.000]  So hypervisor, supervisor mode.
[07:28.000 --> 07:33.000]  This is why your Linux KVM or Zen kind of thing are running.
[07:33.000 --> 07:35.000]  And then when you're going to create the virtual machine,
[07:35.000 --> 07:37.000]  the virtual machine is going to be split.
[07:37.000 --> 07:40.000]  If it's a full Linux virtual machine,
[07:40.000 --> 07:42.000]  it's going to be split into two different modes.
[07:42.000 --> 07:45.000]  The VU mode, the virtualized user mode
[07:45.000 --> 07:47.000]  and the virtualized supervisor mode.
[07:47.000 --> 07:50.000]  So your guest kernel is going to run in a virtualized supervisor mode
[07:50.000 --> 07:54.000]  and your guest applications are going to run in a virtualized user mode.
[07:54.000 --> 07:56.000]  Okay?
[07:56.000 --> 07:58.000]  All right.
[07:58.000 --> 08:00.000]  So confidential computing.
[08:00.000 --> 08:04.000]  I just did like a scratch course in five minutes of RISC-5.
[08:04.000 --> 08:06.000]  So I hope this makes sense.
[08:06.000 --> 08:09.000]  But anyways, I needed to do this to kind of explain
[08:09.000 --> 08:12.000]  where we want to go with confidential computing on RISC-5.
[08:12.000 --> 08:16.000]  So what we're defining currently in RISC-5
[08:16.000 --> 08:20.000]  for confidential computing is called the AppTE RISC-5 specification.
[08:20.000 --> 08:25.000]  AppTE as in application, processor, trusted, execution environment.
[08:25.000 --> 08:29.000]  So it's a technical group where everything, again, is open.
[08:29.000 --> 08:32.000]  So there's a GitHub repo for this technical group.
[08:32.000 --> 08:36.000]  All specifications are there, the discussions, the minic nodes, everything.
[08:36.000 --> 08:41.000]  And it is not ratified yet, not frozen.
[08:41.000 --> 08:43.000]  So this is a work in progress.
[08:43.000 --> 08:46.000]  So again, feel free to come and join and help
[08:46.000 --> 08:49.000]  and provide some feedback on that specification.
[08:49.000 --> 08:54.000]  But it is aimed at becoming the reference
[08:54.000 --> 08:57.000]  confidential computing architecture for RISC-5.
[08:57.000 --> 09:01.000]  So it's currently in a pretty late state.
[09:01.000 --> 09:07.000]  It's going to be ratified, not ratified, but accepted pretty soon in a few months.
[09:07.000 --> 09:13.000]  But it's going to be the reference confidential computing architecture for RISC-5.
[09:13.000 --> 09:15.000]  It's not an ISA specification.
[09:15.000 --> 09:23.000]  So we don't add to the RISC-5 set of instruction and architectural definitions.
[09:23.000 --> 09:25.000]  But we do identify a few ISA gaps.
[09:25.000 --> 09:29.000]  For example, what we call the confidential memory attributes,
[09:29.000 --> 09:34.000]  which I'm going to talk about later.
[09:34.000 --> 09:37.000]  And just to clarify things, because we talked about OPTE,
[09:37.000 --> 09:41.000]  for example, there's an implementation of OPTE for RISC-5.
[09:41.000 --> 09:48.000]  The OPTE specification for RISC-5 is not aiming at the same set of use cases.
[09:48.000 --> 09:53.000]  OPTE is really trying to do and support the same use cases as TDX,
[09:53.000 --> 09:59.000]  for those who are familiar with TDX, or SCV, for those who are familiar with this AMD technology.
[09:59.000 --> 10:02.000]  And basically, this specification is defining a new class
[10:02.000 --> 10:05.000]  of trusted execution environment for RISC-5.
[10:05.000 --> 10:08.000]  And these new class are trusted virtual machines.
[10:08.000 --> 10:11.000]  So same as TDX, so same as SCV.
[10:11.000 --> 10:16.000]  The goal is really to run full-blown virtual machine in a confidential computing environment,
[10:16.000 --> 10:24.000]  where you will have memory and data confidentiality and integrity, as explained in the first talk.
[10:24.000 --> 10:27.000]  And the goal is really for people to take their existing workload,
[10:27.000 --> 10:31.000]  their existing virtual machine, their existing Kubernetes nodes,
[10:31.000 --> 10:35.000]  and move that into a confidential computing TE.
[10:35.000 --> 10:41.000]  The same way they're doing this, or they aim at doing this with SCV or TDX.
[10:41.000 --> 10:44.000]  So there are really two different set of use cases,
[10:44.000 --> 10:51.000]  and OPTE is aiming at this specific set of use cases.
[10:51.000 --> 10:55.000]  So there are a few architecture components that I'm going to talk about.
[10:55.000 --> 11:01.000]  An OPTE beats per heart, sorry, I didn't mention this, but a heart,
[11:01.000 --> 11:06.000]  HRT in RISC-5 terminology is actually a CPU core.
[11:06.000 --> 11:08.000]  It's a core, it's called a heart.
[11:08.000 --> 11:10.000]  There's a few components that I'm going to go through,
[11:10.000 --> 11:13.000]  the security manager, the TSM driver,
[11:13.000 --> 11:16.000]  there's a dependency on the hardware root of trust,
[11:16.000 --> 11:19.000]  and there's a structure,
[11:19.000 --> 11:23.000]  a non-ISA-specified structure called the memory tracking table.
[11:23.000 --> 11:27.000]  And to go through all these components and kind of explain what they are
[11:27.000 --> 11:33.000]  and how they're put together to reach the goal of memory and data protection
[11:33.000 --> 11:38.000]  and integrity guarantees when it's in use.
[11:38.000 --> 11:43.000]  I'm going to take an example of how from a call start of a RISC-5 SOC,
[11:43.000 --> 11:46.000]  we could actually build a trusted virtual machine
[11:46.000 --> 11:51.000]  with the confidential computing architecture that I'm trying to describe.
[11:51.000 --> 12:00.000]  Okay, so we have a RISC-5 SOC with a few components that are mandatory.
[12:00.000 --> 12:05.000]  We need an IOMMU, we need a root of trust, we need an MMU obviously.
[12:05.000 --> 12:10.000]  This is all dependent on the H extension on 64-bit RISC-5.
[12:10.000 --> 12:15.000]  It's basically RISC-5 GC, which is the general purpose specification,
[12:15.000 --> 12:19.000]  plus compressed, but we don't need compressed, it's just the G part.
[12:19.000 --> 12:25.000]  But yeah, it's a full-blown 64-bit RISC-5 SOC that's running there with an IOMMU.
[12:25.000 --> 12:31.000]  We do need and mandate the presence of a hardware root of trust
[12:31.000 --> 12:34.000]  and we need some sort of memory protection.
[12:34.000 --> 12:38.000]  So an MMU, a memory checker, something like this.
[12:38.000 --> 12:42.000]  The first thing that the root of trust is going to measure and load
[12:42.000 --> 12:44.000]  is called the TSM driver.
[12:44.000 --> 12:48.000]  So that's the first component of this confidential computing architecture.
[12:48.000 --> 12:53.000]  And the TSM driver is the component, the trusted component that runs in M mode,
[12:53.000 --> 12:59.000]  in thermal mode, that's going to split the world in non-confidential and confidential, okay?
[12:59.000 --> 13:04.000]  And the TSM driver is, yeah, a confidential world switcher,
[13:04.000 --> 13:09.000]  and it's the component that basically toggles a bit in the RISC-5 SOC,
[13:09.000 --> 13:14.000]  the apt-e bit, to tell if the heart is currently running in confidential mode
[13:14.000 --> 13:16.000]  or non-confidential mode.
[13:16.000 --> 13:19.000]  So there is apt-e bits that is part of the specification
[13:19.000 --> 13:24.000]  that tells at any point in time if a specific RISC-5 core, RISC-5 heart,
[13:24.000 --> 13:28.000]  is running in confidential mode or non-confidential mode.
[13:28.000 --> 13:31.000]  And the TSM driver is the component that's going to make that switch,
[13:31.000 --> 13:34.000]  is the component that is going to toggle that switch.
[13:34.000 --> 13:37.000]  So it's part of the TCB, it's a trusted component,
[13:37.000 --> 13:41.000]  it's a software trusted component, and that runs in M mode and does that.
[13:41.000 --> 13:46.000]  And basically, the TSM driver is going to switch from,
[13:46.000 --> 13:48.000]  for example, non-confidential to confidential,
[13:48.000 --> 13:54.000]  when something in non-confidential, like a VMM or KVM or your Linux kernel,
[13:54.000 --> 13:59.000]  is sending a specific TEE call, which is an E-call,
[13:59.000 --> 14:04.000]  basically a call that allows you to move from supervisor mode to machine mode,
[14:04.000 --> 14:08.000]  so basically from Linux kernel to TSM driver.
[14:08.000 --> 14:13.000]  The TSM driver is going to trap this, and then it's going to toggle the apt-e bit,
[14:13.000 --> 14:17.000]  which means it's going to atomically switch the CPU into confidential mode,
[14:17.000 --> 14:21.000]  and then it's going to move to something called the TSM,
[14:21.000 --> 14:25.000]  the trusted security manager, the TEE security manager, sorry.
[14:25.000 --> 14:29.000]  And to do that, it calls the MRET instruction and moves to TSM.
[14:29.000 --> 14:32.000]  So we are in the kernel, the kernel makes an E-call,
[14:32.000 --> 14:37.000]  the TSM driver toggles the CPU from non-confidential to confidential,
[14:37.000 --> 14:42.000]  and then starts running the TSM, and we're going to talk about the TSM next.
[14:42.000 --> 14:48.000]  And this is what the TSM driver is mostly about.
[14:48.000 --> 14:51.000]  The TSM driver, I'm going to talk about the TSM right after this,
[14:51.000 --> 14:56.000]  but the one very important thing that the TSM driver manages is called the memory tracking table.
[14:56.000 --> 15:00.000]  The memory tracking table is a piece of memory,
[15:00.000 --> 15:04.000]  and the structure of this memory tracking table is not specified
[15:04.000 --> 15:08.000]  in the confidential computing specification.
[15:08.000 --> 15:15.000]  It is up to any implementation to decide what it puts in this memory tracking table.
[15:15.000 --> 15:20.000]  What the specs tells is what this memory tracking table is for,
[15:20.000 --> 15:22.000]  and this is what I'm going to explain now.
[15:22.000 --> 15:28.000]  The memory tracking table is enforcing, and just to take back,
[15:28.000 --> 15:31.000]  the memory tracking table lives in confidential memory.
[15:31.000 --> 15:35.000]  So the memory tracking table lives in a piece of memory that is protected
[15:35.000 --> 15:40.000]  from the non-confidential world to actually see or temper with.
[15:40.000 --> 15:45.000]  So it's encrypted, protected, integrity-protected memory.
[15:45.000 --> 15:52.000]  So the memory tracking table enforces the confidentiality memory attribute
[15:52.000 --> 15:55.000]  for each and every page on the system.
[15:55.000 --> 15:58.000]  So it's what we call a PMA page tracker.
[15:58.000 --> 16:03.000]  So it defines if any memory page is confidential or not.
[16:03.000 --> 16:08.000]  So you take a physical address, you give that to the MTT, to the memory tracking table,
[16:08.000 --> 16:15.000]  and the MTT tells you if this address belongs to a confidential page or non-confidential page.
[16:15.000 --> 16:19.000]  So with this memory tracking table, anytime you want, for example,
[16:19.000 --> 16:23.000]  the non-confidential world is trying to access physically a page,
[16:23.000 --> 16:27.000]  the memory tracking table is going to be used by the CPU to actually check
[16:27.000 --> 16:30.000]  if this page is confidential or non-confidential.
[16:30.000 --> 16:34.000]  If you're trying to access a confidential page from a non-confidential world,
[16:34.000 --> 16:40.000]  if you're trying to read memory from your trusted virtual machine from your VMM,
[16:40.000 --> 16:44.000]  from your QMU, from your KVM, then the memory tracking table is going to tell you
[16:44.000 --> 16:49.000]  this is a confidential page, and that's going to generate a CPU fault.
[16:49.000 --> 16:52.000]  And it gives you memory protection.
[16:52.000 --> 16:58.000]  Depending on how you want to implement memory encryption, basically, to protect your memory,
[16:58.000 --> 17:03.000]  the memory tracking table will be able to tell you which key you need to use
[17:03.000 --> 17:07.000]  to encrypt or decrypt that physical page.
[17:07.000 --> 17:12.000]  And you can decide how you want to implement this, how many keys you want to support,
[17:12.000 --> 17:16.000]  if you want to add one key per TVM or multiple keys,
[17:16.000 --> 17:24.000]  or it's up to the micro-architectural implementation of the specification to decide what it does with it.
[17:24.000 --> 17:28.000]  Okay, so the TSM driver managed the memory tracking table,
[17:28.000 --> 17:32.000]  which gives us memory protection and integrity.
[17:32.000 --> 17:37.000]  And the next thing the TSM driver is going to do is going to load and measure the next component,
[17:37.000 --> 17:40.000]  the next trusted component that now runs in the last privileged mode,
[17:40.000 --> 17:43.000]  the TSM, the TE Security Manager.
[17:43.000 --> 17:49.000]  The TSM lives at the same level as the Linux kernel, as KVM, as the IPervisor, basically.
[17:49.000 --> 17:52.000]  But it lives in confidential work.
[17:52.000 --> 17:55.000]  It lives and runs out of confidential memory,
[17:55.000 --> 18:02.000]  and it's only run when the RIS5 CPU is running with the apti bit on,
[18:02.000 --> 18:07.000]  which means it's running when it's in confidential mode.
[18:07.000 --> 18:11.000]  So the TSM, I don't know if people here are familiar with TDX,
[18:11.000 --> 18:18.000]  but there are some similarities here for those who know TDX, unfortunately.
[18:18.000 --> 18:22.000]  So TSM, it's the TE Security Manager.
[18:22.000 --> 18:28.000]  It's a trusted piece between the host VMM and the TVM.
[18:28.000 --> 18:32.000]  So the TVM is a trusted virtual machine that we're trying to build through those steps.
[18:32.000 --> 18:38.000]  And nothing from the confidential world can actually touch a trusted virtual machine
[18:38.000 --> 18:43.000]  without going through the trusted, the TE Security Manager, the TSM.
[18:43.000 --> 18:50.000]  One very important thing that the TSM does is it manages all the second-stage page tables.
[18:50.000 --> 18:58.000]  So the page tables that allows you to translate TVM physical addresses to host physical addresses,
[18:58.000 --> 19:03.000]  those are managed by the TSM in confidential memory.
[19:03.000 --> 19:10.000]  So with the confidential computing implementation, KVM no longer manages the second-stage page tables
[19:10.000 --> 19:12.000]  for the trusted virtual machine.
[19:12.000 --> 19:17.000]  It's all handled by the TSM, which is trusted, in confidential memory.
[19:17.000 --> 19:20.000]  So that's a very important piece of TSM.
[19:20.000 --> 19:24.000]  And something really important to understand is that it is a passive component.
[19:24.000 --> 19:29.000]  So it implements security services that are going to be called by the host VMM.
[19:29.000 --> 19:35.000]  It doesn't run by itself. It's not something that schedules TVM or handles interrupts
[19:35.000 --> 19:38.000]  or it doesn't do anything like this.
[19:38.000 --> 19:42.000]  It just replies to security requests that are coming from the host.
[19:42.000 --> 19:45.000]  The host is in control of the machine.
[19:45.000 --> 19:47.000]  It's not in control of the trusted virtual machine.
[19:47.000 --> 19:49.000]  It needs to go through the TSM.
[19:49.000 --> 19:54.000]  And the TSM is only responsible for this, getting security requests from the host,
[19:54.000 --> 19:57.000]  from the host VMM, and replying to it.
[19:57.000 --> 20:00.000]  And we do have an open source implementation for this.
[20:00.000 --> 20:04.000]  So it's called Salus. It's on GitHub again.
[20:04.000 --> 20:10.000]  And it basically implements everything that I just described, plus a lot more different things.
[20:10.000 --> 20:13.000]  It's all in the specification and it's all open source.
[20:13.000 --> 20:16.000]  So go there.
[20:16.000 --> 20:22.000]  The TSM also manages the entity.
[20:22.000 --> 20:26.000]  So whenever the TSM adds a page to a trusted virtual machine,
[20:26.000 --> 20:31.000]  it's going to add entries to the entity and it's a little bit more complicated than this
[20:31.000 --> 20:33.000]  because it needs to go through the TSM driver.
[20:33.000 --> 20:40.000]  But basically the entity is something that is owned by the TSM driver and by the TSM.
[20:40.000 --> 20:43.000]  Okay, so TSM driver started.
[20:43.000 --> 20:45.000]  It loaded the TSM.
[20:45.000 --> 20:50.000]  At some point we have a host OS, a Linux kernel with KVM that starts.
[20:50.000 --> 20:53.000]  It puts some non-competential virtual machine.
[20:53.000 --> 21:01.000]  And at some point someone is going to be starting a trusted virtual machine,
[21:01.000 --> 21:04.000]  a virtual machine that runs in confidential world.
[21:04.000 --> 21:11.000]  And to do that, there's a set of ABI's between the host VMM on the left,
[21:11.000 --> 21:14.000]  the non-competential world, and the TSM.
[21:14.000 --> 21:16.000]  And that goes through the TSM driver.
[21:16.000 --> 21:20.000]  The TSM driver is the trusted piece that actually proxies each and every request
[21:20.000 --> 21:25.000]  from the non-competential world to the confidential world, to the TSM basically.
[21:25.000 --> 21:32.000]  And those are called the TE host ABI's because there are, it's a set of binary interfaces
[21:32.000 --> 21:40.000]  that are called from the host to actually manage and request security features from the TSM.
[21:40.000 --> 21:42.000]  Everything is proxied through the TSM driver.
[21:42.000 --> 21:47.000]  So the TSM driver traps the host sending E-calls, SBI calls,
[21:47.000 --> 21:52.000]  and basically it traps the calls from the host VMM, from KVM, for example,
[21:52.000 --> 21:58.000]  and it then schedules the TSM to actually run and handle those calls.
[21:58.000 --> 22:03.000]  So a few examples, creating and destroying a TVM context,
[22:03.000 --> 22:09.000]  converting confidential memory to, non-competential memory to confidential and vice versa,
[22:09.000 --> 22:14.000]  mapping pages from non-competential world to a TVM.
[22:14.000 --> 22:19.000]  All those security features, they are requested from the host VMM, from KVM,
[22:19.000 --> 22:21.000]  and they are managed by the TSM.
[22:21.000 --> 22:27.000]  So KVM itself, obviously we don't want KVM to actually take a page
[22:27.000 --> 22:31.000]  and add that to the TVM, a trusted virtual machine address space.
[22:31.000 --> 22:36.000]  It has to go through the TSM, which manages all the page tables for this TVM.
[22:36.000 --> 22:41.000]  And for example, if we want to create a TVM,
[22:41.000 --> 22:44.000]  which is what we're aiming or trying to do here,
[22:44.000 --> 22:49.000]  it goes through a few steps, and all those steps here map to an actual T...
[22:49.000 --> 22:55.000]  the host ABI, the ABA between KVM and the TSM, and there are basically seven steps.
[22:55.000 --> 22:58.000]  The first one is to create a TVM context.
[22:58.000 --> 23:02.000]  So KVM will ask for having a context so that it can use that context
[23:02.000 --> 23:05.000]  and then start configuring the TVM.
[23:05.000 --> 23:12.000]  The next thing a KVM needs to do is to allocate some memory from physical pages to the TSM
[23:12.000 --> 23:18.000]  so that the TSM can actually build the second-stage page tables for the TVM that it's going to create.
[23:18.000 --> 23:21.000]  Those second-stage page tables are living in confidential memory,
[23:21.000 --> 23:26.000]  so they cannot be handled, they must not be handled by KVM, by the host VMM.
[23:26.000 --> 23:32.000]  So KVM donates pages to TSM, and the TSM is going to use that to build those page tables.
[23:32.000 --> 23:37.000]  It's not meant to be used by the TVM memory,
[23:37.000 --> 23:41.000]  it's meant to actually track the second-stage page tables for the TVM.
[23:41.000 --> 23:50.000]  Then KVM is going to tell TSM that some memory region needs to be reserved for the TVM.
[23:50.000 --> 23:53.000]  So that's basically the TVM address space.
[23:53.000 --> 23:59.000]  And then KVM is going to allocate pages and move those pages from non-confidential to confidential
[23:59.000 --> 24:08.000]  and ask TSM to map those pages in the memory region that it just asked for creation in step number three.
[24:08.000 --> 24:14.000]  The last and next thing that KVM needs to do is to create TVM CPUs,
[24:14.000 --> 24:22.000]  because basically all the CPU state is contained and managed in confidential memory as well.
[24:22.000 --> 24:27.000]  All the CPU state that the TVM is going to run on top of is managed by the TSM in confidential memory
[24:27.000 --> 24:34.000]  so that KVM does not see ATVM general purpose registers values and cannot mess with it, obviously.
[24:34.000 --> 24:37.000]  So this is all handled by the TSM as well.
[24:37.000 --> 24:44.000]  And the KVM finalized the TVM and eventually asked TSM to start running the TVM.
[24:44.000 --> 24:48.000]  And this is where your TVM is starting to run off confidential memory
[24:48.000 --> 24:56.000]  with a VCPU which state is also kept in confidential memory and protected.
[24:56.000 --> 24:58.000]  So we have this.
[24:58.000 --> 25:03.000]  TSM just created a TVM upon the host VMM request.
[25:03.000 --> 25:06.000]  And the TVM can also talk back to the TSM.
[25:06.000 --> 25:09.000]  The TVM never talks back directly to the host VMM.
[25:09.000 --> 25:12.000]  It only talks back to the TSM.
[25:12.000 --> 25:18.000]  The same way a non-confidential VMM exit would be trapped by the host VMM.
[25:18.000 --> 25:29.000]  A confidential TVM VMM exit, for example, or any service that the confidential VMM needs will be managed by the TSM driver or the TSM.
[25:29.000 --> 25:34.000]  So there are a set of ABI's between the TVM and the TSM.
[25:34.000 --> 25:42.000]  And, for example, a thing that I didn't talk about, but attestation is something that is being requested by the TVM.
[25:42.000 --> 25:45.000]  So the TVM is going to ask for an attestation evidence.
[25:45.000 --> 25:53.000]  And this is going to be serviced by the TSM through those ABI's here between the TVM and the TSM.
[25:53.000 --> 26:05.000]  So the TVM asks for an attestation report, a signed attestation report, an evidence that is going to send to a lying party to run the full attestation dance whenever it wants to do that.
[26:05.000 --> 26:14.000]  And part of this specification, the confidential computing specification, defines how this attestation flow is going to be running.
[26:14.000 --> 26:22.000]  And, more importantly, how the attestation evidence is going to be built, out of which measurements, and how this is going to be formatted.
[26:22.000 --> 26:30.000]  Unlike TDX or SGX or SCV, we do use a standard format.
[26:30.000 --> 26:34.000]  We use X509 certificates for building an evidence.
[26:34.000 --> 26:43.000]  So each layer on the chain here from the hardware that will touch up to the TVM loads, measure, and certificates the next layer.
[26:43.000 --> 26:47.000]  So this is based on a specification called TCG DICE.
[26:47.000 --> 26:51.000]  It's a layered specification for building attestation evidence.
[26:51.000 --> 26:55.000]  And this is what we use with the RISV confidential computing implementation.
[26:55.000 --> 27:03.000]  Eventually, the TVM, when it asks for an attestation evidence, it will get a certificate from the TSM.
[27:03.000 --> 27:12.000]  So the TSM builds the certificate with the entire attestation evidence that is part of the certificate as an X509 exception.
[27:12.000 --> 27:22.000]  And this certificate is routed back all the way back to the hardware world trust for a relying party to then verify and attest or not.
[27:22.000 --> 27:26.000]  The last thing I want to talk about is IO.
[27:26.000 --> 27:30.000]  I didn't talk about IO because it's a chapter on its own.
[27:30.000 --> 27:32.000]  There are two kinds of virtual machine IO.
[27:32.000 --> 27:37.000]  There's the power virtualized IO, also known as virtual IO most of the time.
[27:37.000 --> 27:46.000]  Doing virtual IO with confidential computing, a confidential VM, TDX, SCV, or RISV is challenging
[27:46.000 --> 27:51.000]  because basically the virtual IO device implementation is done by the host VMM.
[27:51.000 --> 27:58.000]  So typically your virtual unit is going to be done by QMU or by an external process running out of the host user, for example.
[27:58.000 --> 28:03.000]  So you must share memory between your TVM and your host VMM.
[28:03.000 --> 28:05.000]  So it's complex.
[28:05.000 --> 28:16.000]  It's actually not very efficient because you need a software IO TLB and you need to do a buffer bouncing between confidential and non-confidential to be able to share stuff.
[28:16.000 --> 28:23.000]  You need to harden your guests so that you can actually somehow trust the host implementation, etc.
[28:23.000 --> 28:25.000]  So there's a lot of discussion around this.
[28:25.000 --> 28:29.000]  If you go to the Linux Cocoa mailing list, it's a Linux kernel mailing list.
[28:29.000 --> 28:32.000]  There's a lot of heated discussion right now.
[28:32.000 --> 28:40.000]  And the other IO, surprisingly, the other IO form is direct assignment.
[28:40.000 --> 28:42.000]  That is even more complex.
[28:42.000 --> 28:52.000]  Direct assignment basically means you take a PCI device that you don't know, that you know nothing about, and you add that to your TE trusted compute base.
[28:52.000 --> 28:59.000]  Basically you're going to say, I want my NVIDIA GPU to be part of my trusted virtual machine.
[28:59.000 --> 29:06.000]  And to do that, you basically need to attest and authenticate the device that you want to plug into your TVM.
[29:06.000 --> 29:17.000]  So there's a lot of specification, well, not a lot, but a few specifications, PCI specification called T-DISP and IDE for protecting the IDE link between your device and your TVM.
[29:17.000 --> 29:19.000]  You need collaboration from the IOMMU.
[29:19.000 --> 29:21.000]  It's a very complex topic.
[29:21.000 --> 29:29.000]  The first one, Vert IO1, is very much in progress. The direct assignment want, it's still being defined.
[29:29.000 --> 29:34.000]  So I rushed that through. I'm done.
[29:34.000 --> 29:37.000]  Thanks a lot for listening. I hope it was useful.
[29:37.000 --> 29:39.000]  Thank you so much.
[29:39.000 --> 29:49.000]  And I have time for questions.