[00:00.000 --> 00:26.900] So, we're probably shifting gears a bit from all these deeply technical, intimidating talks [00:26.900 --> 00:32.900] and kind of connecting to the first talk of this track, basically. [00:32.900 --> 00:40.900] I found this really interesting idea of kind of emulating the Let's Encrypt moment for confidential computing, [00:40.900 --> 00:43.900] because it is more or less what we are doing. [00:43.900 --> 00:47.900] So, I'm a software engineer in Microsoft's Kinvolk team. [00:47.900 --> 00:55.900] We do a lot of exciting stuff in terms of open source, Linux, eBPF, and containers in Kubernetes, [00:55.900 --> 01:03.900] and my team is also involved in exploring ways to integrate existing confidential computing technology, [01:03.900 --> 01:14.900] everything we heard in the talks before, with existing ways to deploy containers. [01:14.900 --> 01:24.900] So, this talk is, yeah, we're on a tight schedule, so I really have to cut a few corners [01:24.900 --> 01:27.900] and it will probably be a bit hand-wavy. [01:27.900 --> 01:29.900] So, sorry about that. [01:29.900 --> 01:34.900] Please reach out if you want details about something. [01:34.900 --> 01:40.900] So, obviously, it's also not comprehensive coverage of the whole confidential containers [01:40.900 --> 01:51.900] realm; it's quite wide and covers a lot of areas, so I'm focusing on a few things that we are looking at at the moment, [01:51.900 --> 01:58.900] and also maybe the points in the talk will not age very well, because things are very much evolving, [01:58.900 --> 02:08.900] and whatever is mentioned here could already be at another stage in a few months. [02:08.900 --> 02:15.900] But the idea is basically to provide pointers for people who want to get involved in this, [02:15.900 --> 02:20.900] because from my perspective it's very exciting; it's a very practical problem you can work on, [02:20.900 --> 02:27.900] and there are a lot of open questions that I think are really accessible, also for people [02:27.900 --> 02:33.900] without a very deep technical background in confidential computing. [02:33.900 --> 02:40.900] And eventually we have to define cloud native, so let's establish some terminology, but we cannot go very deep here. [02:40.900 --> 02:51.900] Essentially it's a bit of a buzzword, but you will find that it's more or less an ecosystem of practices, tools, interfaces, [02:51.900 --> 02:59.900] and APIs that aim to ease the deployment and management of applications on cloud platforms. [02:59.900 --> 03:04.900] That can be infrastructure as a service, it can be functions as a service, [03:04.900 --> 03:11.900] but in most cases containers are very prominent in this space, and this is also what we are focusing on. [03:11.900 --> 03:24.900] And in this space Kubernetes, among other contenders, has been adopted as the go-to solution for container orchestration and management. [03:25.900 --> 03:29.900] To quickly introduce Kubernetes in two lines: [03:29.900 --> 03:38.900] Kubernetes is a container orchestrator, a management API, an abstraction layer. [03:38.900 --> 03:43.900] I would say that it's not trivial to host and operate. [03:43.900 --> 03:56.900] So it's a very popular offering by cloud service providers to provide a hosted solution for Kubernetes [03:56.900 --> 04:02.900] and give developers and engineers the API layer.
[04:02.900 --> 04:08.900] And what we have in Kubernetes is maybe a bit unique: you have this notion of pods, [04:08.900 --> 04:13.900] which define a logical environment; they are isolated, resource-constrained, [04:13.900 --> 04:20.900] they share namespaces and cgroups, and they are composed of individual containers. [04:20.900 --> 04:27.900] So a node has pods, and in a pod you can have co-located containers if you want, [04:27.900 --> 04:35.900] and this is an abstraction that is quite useful for confidential computing. [04:35.900 --> 04:42.900] So in general, I think this is quite familiar to people who work with confidential computing: [04:42.900 --> 04:46.900] there's this kind of trade-off scale where on one end you have a small TCB surface. [04:46.900 --> 04:56.900] This means you run enclaves, you have SDKs, and your workloads have to be customized for this. [04:56.900 --> 05:02.900] But it also means you don't have to care about a lot of stuff, because the TCB surface is quite small. [05:02.900 --> 05:09.900] And on the other end you have a bigger TCB surface, like bare-metal VMs or whole Kubernetes clusters for example, [05:09.900 --> 05:18.900] and those have the convenience of running unmodified workloads, this kind of lift-and-shift idea. [05:18.900 --> 05:25.900] And if you want, Kubernetes pods could fit somewhere on that scale, [05:25.900 --> 05:27.900] probably not exactly in the middle, but somewhere. [05:27.900 --> 05:36.900] Because the idea is that we only need minimal adjustments to existing workloads that are already running in Kubernetes. [05:36.900 --> 05:48.900] And as mentioned before, some workloads are simply locked out of cloud native and public clouds due to compliance issues. [05:49.900 --> 05:59.900] So our approach is basically to ease the adoption of confidential computing by enabling confidentiality with minimal upfront investment, [05:59.900 --> 06:10.900] because really only big corporations are able to invest in self-hosted environments that provide confidentiality. [06:11.900 --> 06:15.900] And as I said, Kubernetes has been widely adopted by the industry, [06:15.900 --> 06:18.900] and it provides some abstractions and technologies. [06:18.900 --> 06:27.900] I think you could even argue that the main value of Kubernetes is not really the tech but the API abstractions that are there, [06:27.900 --> 06:34.900] which means that if you move from one place to another you will be able to adapt quite easily, [06:34.900 --> 06:41.900] because there are a lot of shared solutions these days. [06:41.900 --> 06:46.900] And we can use those abstractions. [06:46.900 --> 06:53.900] For example, if we look at SEV and TDX, they leverage VMs for confidentiality, [06:53.900 --> 07:04.900] and we already have an established solution with Kata Containers that runs those pod units we've seen before in virtual machines. [07:04.900 --> 07:11.900] So this is something we don't have to start from scratch; we can say, okay, there's something that's been working so far, [07:11.900 --> 07:15.900] and maybe we can leverage this. [07:15.900 --> 07:24.900] And ideally, you probably can't read this on the slide, but ideally the result is that you have a Kubernetes spec [07:24.900 --> 07:29.900] where you just add a property saying something is confidential, confidential: true.
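A minimal sketch of what such a spec could look like, assuming a Kata-based confidential runtime class is registered on the cluster; the class name and image here are purely illustrative, and the exact mechanism differs between platforms and Confidential Containers releases:

```yaml
# Illustrative only: the runtimeClassName value depends on how the
# confidential runtime (e.g. Kata Containers with SEV or TDX) is registered.
apiVersion: v1
kind: Pod
metadata:
  name: redis-cache
spec:
  runtimeClassName: kata-cc        # selects a TEE-backed confidential pod VM
  containers:
    - name: redis
      image: registry.example.com/redis:7.0.5
      ports:
        - containerPort: 6379
```

Ideally this is the only delta compared to a regular pod spec.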
[07:29.900 --> 07:40.900] So this is the kind of lift-and-shift idea: with very low friction we enable customers to deploy confidential workloads. [07:41.900 --> 07:51.900] And the problems start when we look at Kubernetes privileges; they usually form a pyramid. [07:51.900 --> 08:04.900] The squiggly lines here indicate that it's not clear-cut, but more or less some parts are owned maybe by the platform engineers [08:04.900 --> 08:10.900] or by the so-called orchestrator, and some parts are taken over by cloud service providers. [08:10.900 --> 08:15.900] But the idea is that the hierarchy is pretty clear. [08:15.900 --> 08:25.900] The closer you get to the app users, eventually to the service users, the less privileged you are. [08:25.900 --> 08:41.900] The platform engineers are really an in-between layer; they exist like classical operators, but at the Kubernetes API level. [08:41.900 --> 08:53.900] And then the confidentiality pyramid drops in, and it's a bit messy, because now we basically have a system that's been built over years, [08:53.900 --> 08:58.900] where everything's figured out nicely, and now our privilege model doesn't fit anymore. [08:58.900 --> 09:04.900] From the confidentiality perspective you want to lock out the cloud service provider. [09:04.900 --> 09:10.900] And we also don't want the cluster administrators to mess with the workloads. [09:10.900 --> 09:14.900] Possibly not even the developers. [09:14.900 --> 09:22.900] And it might really be only the app users who have access to the confidential data and compute. [09:22.900 --> 09:26.900] And this is something we have to deal with. [09:26.900 --> 09:34.900] These are basically the challenges I think we have to overcome in this model. [09:34.900 --> 09:43.900] It's definitely not an exhaustive list, but these are three topics I picked where I've been following the discussions in recent months. [09:43.900 --> 09:46.900] But I also don't think they're insurmountable. [09:46.900 --> 09:53.900] There's definitely stuff that we can solve, and these are partially nice engineering problems. [09:53.900 --> 09:57.900] Starting with image management. [09:57.900 --> 10:04.900] Container images are usually managed by the infrastructure. [10:05.900 --> 10:13.900] And this makes a lot of sense, because there are a lot of shared resources. [10:13.900 --> 10:17.900] Container images are organized in layers. [10:17.900 --> 10:24.900] So instead of pulling an image ten times, it can be cached on the node. [10:24.900 --> 10:31.900] It can be shared across replicas of a single service, for example. [10:31.900 --> 10:39.900] And in a trusted execution environment we need verified or encrypted images for our workloads. [10:39.900 --> 10:46.900] And actually there are already OCI facilities for these aspects. [10:46.900 --> 10:49.900] But the problem is they run on the wrong layer, so they're not in the TEE. [10:49.900 --> 11:00.900] So if the infrastructure part is not part of the TEE, then we kind of need to drill a hole or move stuff [11:01.900 --> 11:06.900] into the trusted execution environment. [11:06.900 --> 11:09.900] And there are basically a few approaches. [11:09.900 --> 11:20.900] I think the pragmatic band-aid to start with, which made a lot of sense, is to just pull everything into encrypted memory in a confidential pod VM. [11:20.900 --> 11:30.900] And this has practical merit at first, because you can start with a working solution.
[11:30.900 --> 11:39.900] But it's very clear that there are downsides to this, because you potentially need to provision big chunks of memory for it. [11:39.900 --> 11:43.900] And also each pod needs to, [11:43.900 --> 11:47.900] yeah, needs to pull its image layers individually. [11:47.900 --> 11:57.900] And some of those workloads, I know they run PyTorch images for example, are really gigabytes big. [11:57.900 --> 12:03.900] And that's not something you want to pull individually all the time. [12:03.900 --> 12:07.900] We can of course create encrypted scratch spaces to do this, [12:07.900 --> 12:18.900] so we get rid of the over-provisioning of memory, but the unshared images will still yield pretty bad startup times. [12:18.900 --> 12:30.900] The good news is that there are approaches we can use today: we can stream encrypted image layers or otherwise chunked-up blocks [12:30.900 --> 12:36.900] from the host to the confidential pod, and there are different ideas for this. [12:36.900 --> 12:40.900] And the technology to do this is also not something that we need to invent from scratch. [12:40.900 --> 12:47.900] It's already there in recent versions of containerd. [12:47.900 --> 12:58.900] It's called remote snapshotters, which basically do this; it's not 100 percent there, it doesn't meet our requirements fully, but I think it's pretty close. [12:58.900 --> 13:05.900] So about the whole image topic, I think we can be pretty optimistic. [13:05.900 --> 13:15.900] There's another problem about metadata in Kubernetes, and it's maybe a bit counterintuitive to people who don't know this. [13:15.900 --> 13:20.900] Kubernetes will freely transform your specified workloads in multiple ways. [13:20.900 --> 13:23.900] It will inject mount points and environment variables. [13:23.900 --> 13:27.900] It will change your image definitions, and this is all by design. [13:27.900 --> 13:42.900] Because the cluster operators I mentioned before basically make sure that all the workloads are compliant: if some engineer deploys some Redis image, usually redis:latest or something, [13:42.900 --> 13:49.900] they would basically have some admission controller that rewrites this stuff on the fly. [13:49.900 --> 13:55.900] And from the confidentiality perspective, this is not what we want. [13:55.900 --> 14:07.900] We want to verify what we run, and we cannot have the orchestrator or the platform engineers rewriting our specs. [14:07.900 --> 14:16.900] We want to run exactly what we specified, also in terms of, for example, environment variable injection. [14:16.900 --> 14:18.900] Kubernetes does this for very good reasons, [14:18.900 --> 14:27.900] for example for service discovery; the pod receives all kinds of information from the orchestrator. [14:27.900 --> 14:37.900] But it's very problematic: think of the pod image from before, a small batch job that does some caching in Redis locally. [14:37.900 --> 14:46.900] If the control plane would just inject the Redis host environment variable and point it to whatever Redis instance, [14:46.900 --> 14:49.900] this would obviously topple confidentiality. [14:49.900 --> 14:52.900] So this is something we need to deal with. [14:52.900 --> 14:58.900] And from my perspective this is a bit messier than the image part.
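To make the injection concrete, here is a rough sketch of the delta between what a user writes and what actually gets provisioned; the exact set of injected items varies per cluster, and the Redis service and IP below are made up for illustration:

```yaml
# What the user specified:
containers:
  - name: batch-job
    image: registry.example.com/batch-job:1.2.3

# Roughly what ends up being provisioned (illustrative, cluster-dependent):
containers:
  - name: batch-job
    image: registry.example.com/batch-job:1.2.3
    env:
      # service-discovery variables injected for services in the namespace
      - name: REDIS_SERVICE_HOST
        value: "10.0.17.42"
      - name: REDIS_SERVICE_PORT
        value: "6379"
    volumeMounts:
      # service account token projected into the pod by default
      - name: kube-api-access-xxxxx
        mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        readOnly: true
```

On top of that, a mutating admission webhook could rewrite the image reference itself, which is exactly the kind of change a confidential workload has to be able to detect.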
[14:58.900 --> 15:16.900] It's messier because we basically have to, or at least I don't see another way than saying, okay, we have to review the delta between what the user specified and what is eventually being provisioned on the containerd side. [15:16.900 --> 15:27.900] And I think we can then validate, against user-specified policies, whether we're fine with the applied changes. [15:27.900 --> 15:28.900] But it's very hard. [15:28.900 --> 15:32.900] In some cases those policies are very complex to write. [15:32.900 --> 15:45.900] So the whole idea that we have a nice UX that just says confidential: true is not working anymore, because the users have to write those policies. [15:45.900 --> 15:51.900] And there are some dynamics in Kubernetes that are very hard to model. [15:51.900 --> 15:59.900] So we either have to find a way to make this really convenient, or we find some other way to tackle this. [15:59.900 --> 16:10.900] There's another idea I've recently read about, basically a variation of this, which is to more or less log the changes that have been performed [16:10.900 --> 16:22.900] and then approach it from the perspective that we don't model what could happen, [16:22.900 --> 16:29.900] but we more or less look at the log of the changes that were made and see whether we are fine with those changes. [16:29.900 --> 16:38.900] But that's the same idea, I think, only a variation. [16:38.900 --> 16:44.900] Eventually, I think, and this is also something that is a bit more challenging, [16:44.900 --> 16:50.900] we have to address the problem of the control plane APIs and the host components. [16:50.900 --> 17:00.900] They interact with the Kubernetes pods, and this is really a crucial part of all the tooling, of how developers interact with Kubernetes. [17:01.900 --> 17:11.900] You know, they exec into their containers for debugging, they fetch logs, and all of this currently goes through the control plane. [17:11.900 --> 17:20.900] And basically our task is to cut out this middleman between the user and the workload. [17:20.900 --> 17:24.900] And there are a lot of aspects like observability. [17:24.900 --> 17:32.900] This is a key concept of cloud native workloads, and confidentiality and observability are obviously almost a contradiction. [17:32.900 --> 17:40.900] It's very hard to reconcile, but it's something I think we have to address at some point. [17:40.900 --> 17:51.900] We kind of need to hide the logs, traces, and metrics from non-trusted parties, and those metrics are for example sometimes also used by the orchestrator directly. [17:51.900 --> 17:57.900] It would use our garbage collection metrics to perform autoscaling of pods. [17:57.900 --> 18:03.900] So it's really a tricky question how to deal with this. [18:03.900 --> 18:18.900] We basically need to deprivilege the non-trusted parties and prevent them from retrieving those metrics, but also from executing commands in the scope of a confidential pod. [18:18.900 --> 18:26.900] And, as we saw with images, I think the pragmatic band-aid at the moment is probably locking down those problematic parts. [18:26.900 --> 18:34.900] It's like: you cannot exec at the moment, and maybe retrieving logs is not as easy as it was before.
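As a sketch of what such a lock-down can look like: recent Kata Containers / Confidential Containers tooling lets you attach a guest-enforced policy to a pod via an annotation, roughly as below. The annotation key and workflow reflect my reading of the current tooling and may differ between versions, and the values are placeholders:

```yaml
# Hedged sketch: the annotation carries a base64-encoded policy (commonly
# generated from the pod spec by the CoCo tooling) that the in-guest agent
# enforces, e.g. pinning the exact images and rejecting exec and log-read
# requests coming from the host side.
apiVersion: v1
kind: Pod
metadata:
  name: redis-cache
  annotations:
    io.katacontainers.config.agent.policy: "<base64-encoded policy document>"
spec:
  runtimeClassName: kata-cc        # illustrative, as in the earlier sketch
  containers:
    - name: redis
      image: registry.example.com/redis:7.0.5
```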
[18:34.900 --> 18:40.900] And obviously that's not something that's practical in the long run for real workloads. [18:40.900 --> 18:48.900] And there are two solutions to this problem I've stumbled upon recently. [18:48.900 --> 19:00.900] One is, more or less, to split the container management APIs into infrastructure and trusted parts and operate a kind of shadow trusted control plane that users and tools interact with. [19:00.900 --> 19:13.900] This is also part of a TEE, and it would kind of mirror the Kubernetes API. The downside, I think, from what I have seen, [19:13.900 --> 19:24.900] is that it's a large effort, and I'm also not sure whether it's maintainable in the long run, because you basically have to evolve with the Kubernetes APIs all the time. [19:24.900 --> 19:33.900] And I think the other, alternative solution I've seen is more viable from my perspective. [19:33.900 --> 19:45.900] It means we have a kind of encrypted transport between the privileged user and the container management agent on the confidential VM, for example. [19:45.900 --> 19:51.900] And the downside to this approach is basically that it's also quite an invasive change, [19:51.900 --> 20:04.900] because you need to touch many parts of the stack; having this kind of confidential transport through all these layers means you have to [20:05.900 --> 20:23.900] change components in Kubernetes, in containerd, and even the clients and tools that sit on top of Kubernetes, so it's also not an easy thing to do. [20:23.900 --> 20:27.900] Yeah, I think those are the three points I wanted to mention. [20:27.900 --> 20:35.900] In summary, I think confidential computing and cloud native containers are a good match. [20:35.900 --> 20:40.900] From my perspective, it could really boost the adoption of confidential computing. [20:40.900 --> 20:49.900] There are definitely some hairy questions we need to figure out to make this work in practice. [20:49.900 --> 20:55.900] But from what I'm seeing, there's a very engaged community, and yeah, it's very exciting. [20:55.900 --> 21:02.900] So if you want to chime in, it's confidential-containers on GitHub. [21:02.900 --> 21:04.900] There are meetings. [21:04.900 --> 21:06.900] There's Slack. [21:06.900 --> 21:10.900] And I think that's it. [21:17.900 --> 21:20.900] We have around 10 minutes for questions. [21:21.900 --> 21:34.900] You mentioned the control plane issue, and that the API between the control plane and the TEE is not practical in the long run. [21:34.900 --> 21:37.900] What do you mean by that? [21:37.900 --> 21:39.900] No, no, locking it down is what's not practical. [21:39.900 --> 21:46.900] So if you basically say you're not able to use the full API in a confidential context. [21:47.900 --> 21:53.900] So basically, right now you can lock down the controversial parts of the API, [21:53.900 --> 21:55.900] because it's very hard to untangle those things. [21:55.900 --> 22:01.900] Some things are like, you know, when the APIs were conceived, you didn't think about exec being a problem [22:01.900 --> 22:03.900] while create wouldn't be a problem. [22:03.900 --> 22:11.900] So at the moment, I think we can do this at the containerd level or the Kata agent level, where you can basically say, [22:11.900 --> 22:13.900] okay, there are a few things we just don't support. [22:13.900 --> 22:16.900] But I think this is not sustainable in the long run.
[22:16.900 --> 22:21.900] So you're saying, you're not saying the architecture is not practical. [22:21.900 --> 22:26.900] You're saying that shutting off the API is not a long-term solution. [22:26.900 --> 22:27.900] Exactly. [22:27.900 --> 22:31.900] But I mean, if you want to basically start with something, that is definitely fine, [22:31.900 --> 22:39.900] like the image pulling in the confidential VM; this makes sense, I think, if you want to start with this. [22:44.900 --> 22:49.900] What do you think about the metadata problem and then the last problem? [22:49.900 --> 22:54.900] Having this trusted control plane also as a solution for the metadata problem, [22:54.900 --> 22:59.900] where you have a trusted controller, a trusted part of the API server, that can take your descriptions [22:59.900 --> 23:01.900] and enforce them on the container? [23:01.900 --> 23:04.900] I mean, that's obviously, pardon me. [23:04.900 --> 23:15.900] So the question is basically whether, if you move parts of the control plane into the TEE, [23:15.900 --> 23:18.900] you get around a lot of those problems. [23:18.900 --> 23:23.900] And that's absolutely true, because these are basically self-made problems to a huge degree. [23:23.900 --> 23:26.900] So if you just take the whole thing and put it in a TEE, [23:26.900 --> 23:32.900] then most of the things that were there, maybe even all of them, [23:32.900 --> 23:34.900] suddenly aren't a problem anymore. [23:34.900 --> 23:42.900] But this is, as I said, starting from the notion of existing users [23:42.900 --> 23:48.900] who basically use existing hosted Kubernetes offerings. [23:48.900 --> 23:52.900] How do you migrate those users to confidential computing? [23:52.900 --> 23:54.900] You would have to do those things. [23:54.900 --> 23:59.900] And I'm not even arguing that this is the best solution in the long run. [23:59.900 --> 24:05.900] But if you want to do this, then you have to overcome those issues. [24:05.900 --> 24:07.900] We have two online questions. [24:07.900 --> 24:10.900] I'm going to give you this so we can read them. [24:10.900 --> 24:15.900] Okay. [24:15.900 --> 24:17.900] Okay, there's a question. [24:17.900 --> 24:22.900] Are there any new challenges related to attestation protocols in containers [24:22.900 --> 24:28.900] compared to already existing attestation mechanisms in TDX, SEV-SNP, etc.? [24:28.900 --> 24:34.900] From my perspective, I think this is not something that is necessarily conflicting [24:34.900 --> 24:37.900] with confidential containers. [24:37.900 --> 24:43.900] So it's more or less, we also have to follow the same attestation principles [24:43.900 --> 24:47.900] as non-containerized workloads. [24:47.900 --> 24:49.900] We're just moving on top. [24:49.900 --> 24:54.900] Yeah, you're just moving it to a different layer. [24:54.900 --> 25:03.900] But I think there's no fundamental difference or problem that is specific to containers. [25:03.900 --> 25:10.900] There's another question about whether the use of a proxy is being considered. [25:10.900 --> 25:19.900] I think it's a bit too broad for me to understand where the proxy would sit. [25:19.900 --> 25:26.900] Yeah. [25:26.900 --> 25:30.900] Yeah, I mean, this is what I meant by having this kind of transport. [25:30.900 --> 25:37.900] But yeah, with a proxy you pretty much need to start tweaking things at the client as well.
[25:37.900 --> 25:42.900] Maybe you have a distinct kubeconfig for confidential pods. [25:42.900 --> 25:47.900] I didn't really look deeply into this, but I think there are definitely changes [25:47.900 --> 25:51.900] you need to make when you employ a proxy as well. [25:51.900 --> 26:00.900] And one question or comment from my side is, well, there are a lot of challenges to get this done [26:00.900 --> 26:01.900] on the infrastructure side. [26:01.900 --> 26:06.900] You mentioned how we pull the pods and whether we can share those or not. [26:06.900 --> 26:08.900] The images, sorry. [26:08.900 --> 26:14.900] That's a problem that affects mostly the people providing the service. [26:14.900 --> 26:21.900] But in my mind, the biggest problem we have is maintaining the availability, [26:21.900 --> 26:24.900] the unobserved availability, for the end users. [26:24.900 --> 26:29.900] And that's something that, well, we will have to think about together, how to solve it. [26:29.900 --> 26:30.900] Yeah. [26:30.900 --> 26:32.900] I mean, I think I agree; the priorities, [26:32.900 --> 26:38.900] I think, are sometimes different, put it this way, because sometimes things are just also KPI-driven. [26:38.900 --> 26:42.900] So you say, we have this solution and it starts in three seconds. [26:42.900 --> 26:44.900] And if we do this, then that doesn't work anymore, [26:44.900 --> 26:47.900] because then we regressed on some metric. [26:47.900 --> 26:54.900] But this is, I think, a product question. From my personal view, I also don't think that the image pulling time, [26:54.900 --> 26:56.900] the startup time, is a big issue. [26:56.900 --> 27:03.900] But also, many of those workloads are machine learning workloads, and then it's like this: [27:03.900 --> 27:07.900] a PyTorch image is like 21 gigabytes or something. [27:07.900 --> 27:09.900] It's really crazy. [27:09.900 --> 27:12.900] And I understand that there's a concern. [27:12.900 --> 27:16.900] And startup is not really a problem for Kubernetes itself. [27:16.900 --> 27:18.900] We have to wait a lot with images anyway. [27:18.900 --> 27:20.900] But for functions as a service, it is. [27:20.900 --> 27:21.900] Yeah. [27:21.900 --> 27:24.900] And this is one of the main use cases that people are looking at. [27:24.900 --> 27:25.900] So, yeah. [27:25.900 --> 27:30.900] It's understandable to try to minimize that side as well. [27:30.900 --> 27:31.900] Thank you, Mades. [27:31.900 --> 27:32.900] Thanks a lot.