[00:00.000 --> 00:12.520] We're going to start here, we have a presentation about Constellation, a presentation about [00:12.520 --> 00:16.640] how to manage Kubernetes within Kubernetes. [00:16.640 --> 00:24.880] The presenters are Moritz and Malte, so a big shout-out to them. [00:24.880 --> 00:31.960] Thank you. Yeah, we'd like to pick up where, I think, Nick started off in his initial presentation, [00:31.960 --> 00:36.520] saying we need to have something like a Let's Encrypt movement, we need to make confidential computing [00:36.520 --> 00:42.040] a commodity, and then Mark, I think, had a great talk showing [00:42.040 --> 00:46.720] all those bits and pieces that we need to bring together: the use cases, [00:46.720 --> 00:51.720] the cloud native world, the way we develop applications for the cloud, and the advantages [00:51.720 --> 00:55.280] that confidential computing technology gives us, how to bring them together and where [00:55.280 --> 01:03.800] those little gaps are. Historically, or for different kinds of use cases, I would [01:03.800 --> 01:09.440] roughly split it from a use case perspective: if I want to develop an application, where [01:09.440 --> 01:15.840] can CC help me, how can I apply confidential computing? I can roughly split that into three [01:15.840 --> 01:24.000] tiers, if you will. One is definitely managing keys, having enclaves that hold your [01:24.000 --> 01:31.920] cryptographic certificates, your keys, that process the crypto operations for you, a very [01:31.920 --> 01:42.280] small TCB, a very small kind of application, right. And then the second one is where I [01:42.280 --> 01:48.160] package my entire application inside an enclave, inside a confidential container, and I think [01:48.160 --> 01:55.200] that's what we've been doing a lot lately. And then I think the third thing is what Mark [01:55.200 --> 02:01.120] has described: how can we bring that together, making this orchestratable, making this manageable, [02:01.120 --> 02:08.680] deployable. And I think there are different ways of getting from tier two, or the way [02:08.680 --> 02:15.760] we are, to here. One is, I guess, what Mark has described: taking containers, making them [02:15.760 --> 02:21.640] confidential containers, and then dealing with the problems around orchestration. An orthogonal [02:21.640 --> 02:26.640] approach that we'd like to present now is more the idea of having confidential clusters, [02:26.640 --> 02:32.880] so instead of isolating individual containers, isolating the nodes. The downside probably [02:32.880 --> 02:42.240] is a somewhat larger TCB, and the advantage is being closer to where we are right [02:42.240 --> 02:47.680] now with deploying and developing cloud native applications.
[02:47.680 --> 02:54.240] Talking about challenges for tier three, definitely one of the biggest ones [02:54.240 --> 03:01.560] is UI/UX, right. There's little hope that people will go ahead and drastically adjust [03:01.560 --> 03:08.160] the way they deploy and develop applications for the cloud just because they want to use [03:08.160 --> 03:14.280] this new type of technology, so we need to get very close to where they are and bring [03:14.280 --> 03:17.280] those worlds together. And then of course there are the challenges Mark has described [03:17.280 --> 03:24.000] with orchestration and attestation: how can we attest all those different containers [03:24.000 --> 03:30.800] that are running in my cluster when I don't necessarily want to verify each individual instance [03:30.800 --> 03:35.800] of them, right? There could be a thousand or more of the same. [03:35.800 --> 03:41.600] And then once I have a cluster, all those day-two operations of updating, upgrading, [03:41.600 --> 03:48.000] and doing that in a sensible way where I can always verify what's currently running [03:48.000 --> 03:56.080] and what the changes are. And yeah, a big part of what we are going to present today is the [03:56.080 --> 04:01.480] right one here: a big benefit of the cloud is actually that I can hand off some [04:01.480 --> 04:08.000] of this orchestration work and consume managed services that are operated by someone, [04:08.000 --> 04:17.880] or autonomously, and I just consume them through an API or some other kind of interface. [04:17.880 --> 04:24.320] So as Nick has said, infrastructure is rolling out, we see all those confidential technologies [04:24.320 --> 04:33.120] in the cloud, AMD SEV, we have heard about so many today, IBM, RISC-V; most of them give [04:33.120 --> 04:39.520] us a confidential VM, which is, as we've seen, not necessarily the abstraction we want, [04:39.520 --> 04:46.160] but still, we can already consume managed Kubernetes that runs on confidential VMs, at least for [04:46.160 --> 04:47.160] the worker nodes. [04:47.160 --> 04:52.000] I think Azure has it, GCP has it, yeah. [04:52.000 --> 04:57.160] So this exists, but it's not really solving the problem. I mean, it gives us runtime encryption [04:57.160 --> 05:04.480] for the stuff that runs on, lives on, those nodes, but all the edges, all the I/O, is not [05:04.480 --> 05:05.480] protected, right? [05:05.480 --> 05:09.960] The API server is not protected, we've seen that in Magnus's talk, the metadata problem, [05:09.960 --> 05:17.200] the problem of the trusted control plane, the question, if I want to consume persistent volumes, [05:17.200 --> 05:21.560] whether that is automatically encrypted or whether I need to adjust my application to encrypt before [05:21.560 --> 05:23.760] writing to storage. [05:23.760 --> 05:30.840] So the idea of a confidential cluster is that I have somebody or something that fills in [05:30.840 --> 05:34.880] those gaps, so that instead of those individual confidential VMs, I have one big [05:34.880 --> 05:42.240] context that I can verify through attestation, with which I can establish a secure channel, and [05:42.240 --> 05:49.120] then if I'm in that context, if I'm in that cluster, I can just use Kubernetes as I'm used [05:49.120 --> 05:57.720] to, and from inside there, essentially everything is trusted, right? [05:57.720 --> 06:05.040] It's a different type of approach; it just creates an envelope around my Kubernetes and [06:05.040 --> 06:08.080] isolates that as a whole.
[06:08.080 --> 06:16.760] As I said, I think UX and UI and the way we use this are super important. It's not going to [06:16.760 --> 06:21.200] work if we need a lot of adjustments, a lot of additional steps in the development [06:21.200 --> 06:27.040] workflows, so having (this is just an example from Constellation here) a simple [06:27.040 --> 06:33.000] way of creating such a confidential cluster and then using it is important, and all those [06:33.000 --> 06:37.200] things that I showed, all the challenges we need to solve underneath, we need to make [06:37.200 --> 06:39.920] more or less invisible, right? [06:39.920 --> 06:45.240] In the case of Constellation, we try to make the node operating system as verifiable as [06:45.240 --> 06:50.880] possible, strip it down as much as possible, harden it, then stitch the nodes together into a [06:50.880 --> 06:58.000] cluster. We need to think about the supply chain, we need to think about how we can automatically [06:58.000 --> 07:03.280] encrypt all the stuff that goes over the network, that goes to storage. [07:03.280 --> 07:07.280] Ideally this is all open source, so Constellation, in case I haven't mentioned it, is open source, [07:07.280 --> 07:14.080] and it's cloud agnostic, so it can run everywhere. And then, as for most confidential computing [07:14.080 --> 07:19.280] stuff, I need some way of recovery, should things go south and everything [07:19.280 --> 07:23.960] is down and I need to get back into a running state. [07:23.960 --> 07:31.880] So yeah, the big problem with the confidential cluster concept is: now I can create a cluster, [07:31.880 --> 07:37.600] and we will see in a bit what that means, but if I can create a cluster and I have everything [07:37.600 --> 07:43.280] verified, now I have to maintain and run it on my own, and this is, I guess, the biggest [07:43.280 --> 07:45.200] problem with that concept, right? [07:45.200 --> 07:49.800] People want to consume managed stuff; when they can have managed Kubernetes, they don't want [07:49.800 --> 07:57.480] to run and orchestrate their own Kubernetes. So this is a big trade-off that people are [07:57.480 --> 08:05.320] facing, and yeah, we try to work on concepts for making that as easy as possible, and [08:05.320 --> 08:14.840] Malte is going to show you how. [08:14.840 --> 08:17.520] Yeah, so thanks for the introduction. [08:17.520 --> 08:27.280] So the basic idea that we had was: how can we manage Kubernetes from inside Kubernetes [08:27.280 --> 08:36.800] itself? And to kind of sketch this idea, I will start by explaining what you can typically [08:36.800 --> 08:44.480] do today. So on the left side you really have the traditional on-prem model, which [08:44.480 --> 08:51.960] is: you have the whole cluster in your own hands, the control plane, the worker nodes, [08:51.960 --> 08:55.960] it runs on your own hardware, which is great for security, right?
[08:55.960 --> 09:02.920] Because you have full control, but it also means you are responsible for every single [09:02.920 --> 09:08.560] interaction, like scaling up the cluster, joining nodes, performing upgrades, both [09:08.560 --> 09:15.480] on the OS level and Kubernetes upgrades. And then on the other side you have something [09:15.480 --> 09:22.480] that is super popular, which is: just let the cloud provider deal with it. It means [09:22.480 --> 09:28.840] the cloud provider can scale your cluster up and down; if you have a burst of traffic [09:28.840 --> 09:35.160] coming in, you get new nodes, it is all super easy. You can set it up so that the cloud [09:35.160 --> 09:41.080] provider will automatically patch your operating system and automatically upgrade your [09:41.080 --> 09:49.440] Kubernetes, and this is great from a DevOps perspective: it is super simple, it scales, [09:49.440 --> 09:58.680] it takes work away from the developer and the operator of the cluster. So what we thought [09:58.680 --> 10:05.560] is: why don't we meet in the middle? That kind of means we have to run our own control [10:05.560 --> 10:12.440] plane in the confidential context, but if we do this, we lose all of these smart features [10:12.440 --> 10:20.520] from the cloud provider, so we will just reinvent them, but inside the cluster. That means we [10:20.520 --> 10:27.440] can still do auto-scaling, we can still have nodes join by themselves without any human interaction, [10:27.440 --> 10:34.200] we can still roll out OS updates, and we can even roll out Kubernetes upgrades inside a [10:34.200 --> 10:42.760] running Kubernetes cluster. So to explain how this works, I will first go into how Kubernetes [10:42.760 --> 10:51.440] nodes in Constellation can actually join the cluster without any outside interaction. [10:51.440 --> 10:57.040] And what you have to understand here is these are all confidential VMs and they make heavy [10:57.040 --> 11:03.240] use of the measured boot chain. I think we already had some good introductions to this, [11:03.240 --> 11:08.640] but I will still show you how this works in an example. So first, in the confidential [11:08.640 --> 11:13.120] VM we have the firmware, and the firmware is basically just the first part that starts [11:13.120 --> 11:24.320] up, and its main task here is to load the first-stage boot loader and to measure it. [11:24.320 --> 11:30.360] On AMD SEV, in the approach we are currently using, it is measured into [11:30.360 --> 11:38.400] a virtual TPM, and then we load this boot loader and start executing it, [11:38.400 --> 11:45.640] and the boot loader, in our case, just has the task of loading the next stage and measuring [11:45.640 --> 11:52.600] it, which is a unified kernel image. And this is a very neat trick: it is basically just [11:52.600 --> 11:59.200] one blob that contains the Linux kernel, an initramfs, and also the kernel command line, [11:59.200 --> 12:06.200] so the nice property here is that we can measure all of these as one blob and don't have to [12:06.200 --> 12:12.880] take care of the individual components, which can be quite hard to do correctly. And inside [12:12.880 --> 12:20.800] of this, in the initramfs, we use the kernel command line to extract the hash that we expect [12:20.800 --> 12:26.480] for the root file system, and for this we use dm-verity, which I will not go into [12:26.480 --> 12:32.760] too much detail about; it just allows us to have a
read-only root file system where [12:32.760 --> 12:41.400] we know in advance that it has not been tampered with, and we can efficiently check every block [12:41.400 --> 12:50.000] while it is read and before it is actually handed to userland. So that's how we get [12:50.000 --> 12:55.400] to the root file system, and inside of this root file system we have a small application, [12:55.400 --> 13:01.880] and the task of this application is to join this node into the Kubernetes cluster. [13:01.880 --> 13:12.040] Next to the completely unmodifiable OS, we have a state disk, and the only task of [13:12.040 --> 13:20.040] the state disk is to hold the data for Kubernetes itself, like container images and state that [13:20.040 --> 13:27.320] has to be stored on disk at runtime. This is initialized to be completely clean, [13:27.320 --> 13:36.360] it's encrypted, and yeah, this is a component we need to operate. [13:36.360 --> 13:45.120] So the next question is how we make these things possible, and for this we deploy some [13:45.120 --> 13:52.040] microservices inside of Constellation. These are the node operator, which is responsible [13:52.040 --> 14:00.280] for actually rolling out updates; the join service, which attests nodes that are joining [14:00.280 --> 14:05.240] the cluster and decides whether they are allowed to join or not; we also have a key service [14:05.240 --> 14:10.720] that handles encryption keys; and some more that are not really important right [14:10.720 --> 14:14.600] now. [14:14.600 --> 14:18.080] So how does a node actually join the cluster? [14:18.080 --> 14:23.640] I mentioned there's the bootstrapper that is started inside of the confidential virtual [14:23.640 --> 14:31.280] machine, and it will autonomously search for the existing Kubernetes control plane and [14:31.280 --> 14:40.640] perform remote attestation using attested TLS, basically using the attestation [14:40.640 --> 14:50.880] statement, for example from AMD SEV-SNP. The join service already knows [14:50.880 --> 14:57.800] what measurements to expect from a correct node that is running the expected software, [14:57.800 --> 15:04.560] so it can decide at this point whether the booted node is running what you wanted to run and [15:04.560 --> 15:07.440] whether it is allowed to join the cluster. [15:07.440 --> 15:15.040] Based on this, the join service can then offer the node a join token, which allows it [15:15.040 --> 15:20.480] to join the cluster, and it will also hand out a permanent encryption key for the state [15:20.480 --> 15:23.760] disk. [15:23.760 --> 15:31.040] Next we will have a quick look at how updates work. On a high level, we want the administrator [15:31.040 --> 15:37.640] to be in control, we don't want to give up control over the update process, [15:37.640 --> 15:43.040] but we want the actual execution to be completely automatic and seamless, and we do this by basically [15:43.040 --> 15:48.600] just telling the cluster what to do; the rest is done by a Kubernetes operator, which [15:48.600 --> 15:59.800] is a way to declare a desired state and let the cluster handle moving towards that state.
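To make the node-join flow just described a bit more concrete, here is a minimal, self-contained Go sketch of the idea only, not Constellation's actual code: a join service compares the measurements from a node's attestation statement against known reference values and, only on a match, hands out a join token and a state-disk encryption key. Every type, field, and function name here is hypothetical, and the real flow would also verify the hardware signature chain of the attestation statement before trusting any measurement.

```go
// Hypothetical sketch of the join decision described in the talk: compare a
// joining node's measured-boot values against reference measurements and, on
// success, hand out a join token and a state-disk key. Not Constellation's
// real API; all names are made up for illustration.
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Measurement is the digest of one element in the measured boot chain
// (firmware, boot loader, unified kernel image, ...), e.g. a vTPM PCR value.
type Measurement []byte

// AttestationStatement is what a joining node presents, e.g. derived from an
// AMD SEV-SNP report together with its vTPM quote.
type AttestationStatement struct {
	Measurements map[int]Measurement // PCR index -> digest
}

// JoinTicket is returned to a node that passed verification.
type JoinTicket struct {
	JoinToken    string // bootstrap token used to join the Kubernetes cluster
	StateDiskKey []byte // permanent key for the encrypted state disk
}

type joinService struct {
	expected map[int]Measurement // reference values for a correctly booted node
}

// verify checks that every reference measurement matches the attested value.
func (j *joinService) verify(att AttestationStatement) error {
	for idx, want := range j.expected {
		got, ok := att.Measurements[idx]
		if !ok || !bytes.Equal(got, want) {
			return fmt.Errorf("PCR %d mismatch: node is not running the expected software", idx)
		}
	}
	return nil
}

// Issue decides whether the node may join and, if so, returns its ticket.
func (j *joinService) Issue(att AttestationStatement) (*JoinTicket, error) {
	if err := j.verify(att); err != nil {
		return nil, err
	}
	diskKey := make([]byte, 32)
	if _, err := rand.Read(diskKey); err != nil {
		return nil, err
	}
	return &JoinTicket{
		JoinToken:    "abcdef.0123456789abcdef", // placeholder bootstrap token
		StateDiskKey: diskKey,
	}, nil
}

func main() {
	ref := Measurement{0x01, 0x02}
	js := &joinService{expected: map[int]Measurement{4: ref}}

	ticket, err := js.Issue(AttestationStatement{
		Measurements: map[int]Measurement{4: {0x01, 0x02}},
	})
	if err != nil {
		fmt.Println("join denied:", err)
		return
	}
	fmt.Println("join allowed, state disk key:", hex.EncodeToString(ticket.StateDiskKey))
}
```

The sketch only shows the policy decision; transporting the statement and the ticket over attested TLS, as mentioned in the talk, is a separate concern.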
[15:59.800 --> 16:07.720] An important thing to think about here is that we are running in a cloud environment [16:07.720 --> 16:15.760] and we don't want to depend on individual nodes. This is also what GKE and EKS and others [16:15.760 --> 16:23.000] are doing: we are saying, if you want to upgrade, we will give you a new node that has the desired [16:23.000 --> 16:29.800] configuration, and we will never try to do updates in place. [16:29.800 --> 16:33.120] So how does the actual update work? [16:33.120 --> 16:39.600] We basically feed in custom resources that describe the desired state, so the Kubernetes [16:39.600 --> 16:48.200] version and the OS image that we want to run on, and some properties used for verification, [16:48.200 --> 16:53.360] like the expected measurements for the new image and hashes for the individual Kubernetes [16:53.360 --> 17:02.280] components. The operator reads this information and basically checks whether the desired state matches [17:02.280 --> 17:11.200] reality, and if it detects a mismatch, it will first stop any auto-scaling operations that [17:11.200 --> 17:17.960] are happening in the cluster and then start replacing the nodes one by one, and for [17:17.960 --> 17:23.800] this we use the APIs of the different cloud providers. [17:23.800 --> 17:28.720] So in this case, we will just spawn a new node in the correct node group that has the [17:28.720 --> 17:31.040] desired configuration. [17:31.040 --> 17:39.880] We wait for the node to autonomously join the cluster and we wait for it to become ready. [17:39.880 --> 17:45.720] Next we cordon and drain the old node, which just means we safely move your running [17:45.720 --> 17:51.400] workloads from this node to other nodes in the cluster, and only when we are sure that your [17:51.400 --> 17:58.720] running workloads have moved over do we remove the old node from the cluster. This [17:58.720 --> 18:08.200] is basically how you get from having outdated nodes to having updated nodes, and this [18:08.200 --> 18:12.000] just goes on until your whole cluster is up to date. [18:12.000 --> 18:19.960] You can also parallelize this, and when it's done, you can just restart the auto-scaler [18:19.960 --> 18:28.720] and move on with your day. [18:28.720 --> 18:34.320] All right, a quick conclusion. [18:34.320 --> 18:41.040] So in summary, the fundamental idea is: we take this confidential cluster concept, enveloping [18:41.040 --> 18:48.920] the entire Kubernetes cluster instead of protecting single containers or parts. What [18:48.920 --> 18:54.680] we gain is that we basically get all the orchestration for free; we need to protect the edges, all [18:54.680 --> 18:57.000] the I/O and so forth. [18:57.000 --> 19:04.000] The downside is we can't isolate inside that cluster, so it's one big envelope, of course. [19:04.000 --> 19:08.280] This works already, it's an open source tool, you can check out Constellation on GitHub [19:08.280 --> 19:12.960] and try it locally or on one of the big clouds. [19:12.960 --> 19:21.880] From a Kubernetes perspective, it's just vanilla Kubernetes, so it's not surprising that it's certified. [19:21.880 --> 19:25.920] To give some more references: if you're interested in this whole image part, there [19:25.920 --> 19:34.160] was the image-based Linux and TPM devroom; there are a lot of talks on these topics, also [19:34.160 --> 19:36.160] very interesting.
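Since the update flow above is described step by step, a tiny simulated Go sketch may help illustrate the shape of such a reconcile loop: check the desired state against reality, pause auto-scaling, spawn a replacement node, then cordon, drain, and delete the old one. This is not the Constellation node operator's code; the cluster type and its methods are hypothetical stand-ins for the cloud-provider and Kubernetes API calls a real operator would make.

```go
// Hypothetical, in-memory simulation of the rolling replacement flow described
// above: stop auto-scaling, spawn a node with the desired configuration, then
// cordon, drain, and delete the outdated node. Every name is made up for
// illustration.
package main

import "fmt"

// DesiredState mirrors what the admin declares in the custom resource.
type DesiredState struct {
	OSImage string
	K8sVer  string
}

type Node struct {
	Name    string
	OSImage string
	K8sVer  string
}

type cluster struct {
	nodes       []Node
	autoscaling bool
	counter     int
}

// spawn simulates creating a new confidential VM in the right node group.
func (c *cluster) spawn(want DesiredState) Node {
	c.counter++
	n := Node{Name: fmt.Sprintf("node-%d", c.counter), OSImage: want.OSImage, K8sVer: want.K8sVer}
	c.nodes = append(c.nodes, n)
	fmt.Println("spawned", n.Name, "with image", n.OSImage, "- it joins and attests autonomously")
	return n
}

// cordonDrainDelete simulates moving workloads off the old node and removing it.
func (c *cluster) cordonDrainDelete(old Node) {
	fmt.Println("cordoning, draining and deleting", old.Name)
	kept := c.nodes[:0]
	for _, n := range c.nodes {
		if n.Name != old.Name {
			kept = append(kept, n)
		}
	}
	c.nodes = kept
}

// reconcile replaces outdated nodes one by one; updates are never done in place.
func reconcile(c *cluster, want DesiredState) {
	c.autoscaling = false // pause auto-scaling while nodes are being replaced
	defer func() { c.autoscaling = true }()

	for _, old := range append([]Node(nil), c.nodes...) { // iterate over a snapshot
		if old.OSImage == want.OSImage && old.K8sVer == want.K8sVer {
			continue // node already matches the desired state
		}
		c.spawn(want)            // new node with the new image and Kubernetes version
		c.cordonDrainDelete(old) // only remove the old node once workloads are safe
	}
}

func main() {
	c := &cluster{nodes: []Node{{Name: "node-0", OSImage: "image-v1", K8sVer: "1.25"}}, autoscaling: true}
	reconcile(c, DesiredState{OSImage: "image-v2", K8sVer: "1.26"})
	fmt.Println("cluster now runs", len(c.nodes), "node(s) on the desired configuration")
}
```

In the real system, "wait for the node to become ready" and "drain safely" are of course asynchronous operations against the Kubernetes API; the sketch collapses them into synchronous calls to keep the control flow visible.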
[19:36.160 --> 19:39.720] So this is the last talk here, but if you're interested in more confidential [19:39.720 --> 19:45.680] computing, let me sneak in a little advertisement for the OC3 Open Confidential Computing Conference [19:45.680 --> 19:51.160] that's going to happen in March. It's virtual and free, you can just sign up and listen to the [19:51.160 --> 19:52.160] talks if you're interested. [19:52.160 --> 19:56.360] A bunch of the folks that were here today also have a talk there, I think. [19:56.360 --> 20:02.320] Yeah, so if you have any questions, please feel free to get in touch, and that's [20:02.320 --> 20:03.320] it. [20:03.320 --> 20:10.320] Thank you. [20:10.320 --> 20:30.600] Oh, so yeah. [20:30.600 --> 20:36.200] So the question was about attested TLS: when we join nodes, we establish a secure connection [20:36.200 --> 20:37.800] based on attested TLS. [20:37.800 --> 20:43.240] Yes, so first of all, our implementation is open source, it's part of the Constellation [20:43.240 --> 20:45.000] source on GitHub. [20:45.000 --> 20:53.200] I think it's nothing fancy: we use the AMD SEV or Intel TDX (and so forth) remote attestation [20:53.200 --> 21:01.800] statement to exchange a key as part of the data that's sent over. [21:01.800 --> 21:07.600] And we bind the TLS session to that attested key. [21:07.600 --> 21:10.960] I guess there are a couple of implementations for attested TLS; they work more or less the [21:10.960 --> 21:11.960] same. [21:11.960 --> 21:12.960] Yeah. [21:12.960 --> 21:26.280] As far as the technologies I'm familiar with go, there is this vulnerability in remote attestation [21:26.280 --> 21:35.120] where an attestation can be faked by a compromised machine, and now I wonder if it is possible to find [21:35.120 --> 21:43.920] out, from the remote attestation of the whole cluster, whether any single machine in the cluster [21:43.920 --> 21:54.480] is faking its attestation and it goes unnoticed, or whether all the others are, for example, still trustworthy. [21:54.480 --> 21:55.880] Okay, so I'll repeat the question. [21:55.880 --> 22:03.360] The question was: there is a known vulnerability for attestation in confidential computing, [22:03.360 --> 22:10.360] and, given this confidential cluster, whether from the whole-cluster attestation I can tell [22:10.360 --> 22:12.840] if one of the nodes is faking its attestation. [22:12.840 --> 22:19.880] I have to say there were several vulnerabilities in several of the CC technologies over time; [22:19.880 --> 22:23.720] I'm not sure which vulnerability you're referring to. [22:23.720 --> 22:50.320] Okay, so the way the cluster attestation works is: the, let's say, first [22:50.320 --> 22:54.240] node has a known configuration, and it will attest all other nodes based on this known [22:54.240 --> 22:57.160] configuration. [22:57.160 --> 23:02.960] If one node were able to perfectly fake that attestation, you would not know from the [23:02.960 --> 23:08.680] outside, from a cluster attestation perspective, which node it would be. [23:08.680 --> 23:25.240] But yeah, I guess that's what you can say. [23:25.240 --> 23:33.480] It is super simple, but it is a big TCB. Do you have any plans to reduce the TCB? [23:33.480 --> 23:41.200] Yeah, we try to; as I said, this is a trade-off. Yes, it's a much larger TCB than SGX, much [23:41.200 --> 23:43.680] larger TCB even than confidential containers. [23:43.680 --> 23:47.840] We of course will try to make it as minimal as possible.
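On the attested-TLS answer above (exchanging a key as part of the attestation statement and binding the TLS session to it), here is a rough, simulated Go sketch of the binding idea only. The report type below is fake; real code would request an SEV-SNP or TDX report through the platform's device, verify its signature chain and measurements, and only then trust the embedded user data.

```go
// Rough sketch of attested-TLS key binding: the attestation report carries
// user data derived from the TLS key, so a verifier can tie the attested
// hardware/firmware state to the TLS session. The report here is simulated;
// signature and measurement verification are deliberately omitted.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"fmt"
)

// simulatedReport stands in for a hardware attestation report whose
// user-data field the requester controls at request time.
type simulatedReport struct {
	UserData [32]byte
}

func main() {
	// The joining node generates the key it will use for the TLS handshake.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	pubDER, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		panic(err)
	}

	// It requests an attestation report with the hash of that public key as
	// user data, binding "this attested machine state" to "this TLS key".
	report := simulatedReport{UserData: sha256.Sum256(pubDER)}

	// The verifier (e.g. the join service) checks the report's measurements
	// and signature (omitted here), then confirms the binding:
	if report.UserData == sha256.Sum256(pubDER) {
		fmt.Println("TLS key is bound to the attestation report")
	}
}
```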
[23:47.840 --> 23:53.120] The biggest leverage is of course the node OS and everything we can do inside there; yeah, [23:53.120 --> 23:55.120] we'll definitely try to improve there. [23:55.120 --> 23:56.120] Yes? [23:56.120 --> 24:05.120] So you mentioned that there's some firmware at the beginning of the boot process; is that [24:05.120 --> 24:06.120] firmware provided by you or by the provider? [24:06.120 --> 24:07.120] Very good question. [24:07.120 --> 24:08.120] Oh, sorry. [24:08.120 --> 24:15.360] Yeah, the question is: in the confidential VMs, the first component that's booted [24:15.360 --> 24:16.360] is the firmware. [24:16.360 --> 24:18.520] Do we have control over the firmware? [24:18.520 --> 24:23.760] Ideally we would, but as for what the cloud providers offer right now: Azure has [24:23.760 --> 24:25.840] something in preview that allows you to do that. [24:25.840 --> 24:30.280] It's not generally available, and GCP does not allow it. [24:30.280 --> 24:36.360] So the firmware, for at least GCP and Azure, is completely controlled by them. [24:36.360 --> 24:43.000] On OpenStack with QEMU or KVM, you can potentially fully control the firmware, yeah. [24:43.000 --> 24:44.000] Yes, next question. [24:44.000 --> 24:48.880] Doesn't that create a huge trust problem, because you have to trust the firmware to be secure? [24:48.880 --> 24:53.640] I mean, the question is, of course, does this create a trust problem. [24:53.640 --> 24:54.640] Yeah. [24:54.640 --> 24:57.240] I mean, this is controversial, I fully agree with you. [24:57.240 --> 24:58.880] This is not how we would like it. [24:58.880 --> 25:28.280] This is just the best we can have.