Hello everybody, it is my honor to introduce my recent work on enabling the Nydus image service for confidential containers.

Let me introduce myself first. I am Gary Niel from Alibaba Cloud. Currently I am working in the OS team to enable the Linux operating system for cloud workloads. I am a long-standing Linux kernel hacker and have contributed much to the Linux kernel. In the last few years I have also become interested in cloud-related technologies such as microVMs, container runtimes, and container image management, and I have joined several open source projects, such as the Kata Containers project, the Confidential Containers project, the Nydus image service project, and the masterman project.

I will go over three topics. First, I will explain the special requirements of image management for confidential containers, the currently available technologies, and the challenges we are still facing. Then I will give a brief introduction to the Nydus image service project: its design, its features, and its current status. Last, I will present my ideas for enhancing the Nydus image service to improve the image loading process for confidential containers.

Project CoCo, the Confidential Containers project, aims to protect the confidentiality and integrity of a container workload by using hardware TEEs. One way to protect a containerized application is to adopt the Kata Containers architecture, which runs a dedicated virtual machine for each pod.
We can then enhance the confidentiality and integrity of Kata virtual machines with hardware TEEs. To protect an application, we need to protect all the resources accessed by the application, such as CPU, memory, network, storage, and external devices such as GPUs. For a container, in addition to those resources, we also need to protect the container image of the workload. So how can we protect container images for confidential containers?

Why do we care about image management for confidential containers? What are the special requirements? Before talking about them, let's go through the current way of managing container images for normal containers, taking containerd as an example. To run a container, containerd first downloads the raw image blobs from the registry and stores those blobs on the local filesystem. Once all blobs are ready, containerd calls the snapshotter to convert those blobs into filesystems and prepare the rootfs for the container. Once the rootfs is ready, containerd starts the container, and the container can access all the files and data inside the container image. Here we can see that both the raw image blobs and the mounted filesystems are available on the host side, which poses a special challenge for confidential containers, because we need to protect the confidentiality and integrity of container images. Let's move on to image management for confidential containers.
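One concrete detail behind the normal pull flow just described: OCI blobs are content-addressed, so containerd can verify each downloaded layer against the digest recorded in the manifest. A minimal sketch of that check (the layer bytes here are a stand-in, not a real tar stream):

```python
import hashlib

def verify_blob(blob: bytes, descriptor_digest: str) -> bool:
    # OCI blobs are content-addressed: the manifest references each
    # layer by digest, so the client can verify what it pulled.
    algo, _, expected = descriptor_digest.partition(":")
    if algo != "sha256":
        raise ValueError("unsupported digest algorithm: " + algo)
    return hashlib.sha256(blob).hexdigest() == expected

layer = b"fake gzipped tar stream"  # stand-in for a real layer blob
digest = "sha256:" + hashlib.sha256(layer).hexdigest()
assert verify_blob(layer, digest)
assert not verify_blob(layer + b"!", digest)  # any tampering is detected
```

Note that this check only runs at download time, which is exactly the limitation discussed later for runtime integrity.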
To summarize, confidential image management faces three special requirements: confidentiality, integrity, and efficiency. To ensure confidentiality, container images should be encrypted, both on the registry and on the local host, so the image content can be kept private. Second, image management must be moved from the host into the guest, because if we store image content and mounted filesystems on the host, the content is visible to the host, which breaks confidentiality. Even worse, the host can make changes to those images and break the integrity of the container images. By encrypting images and moving image management into the guest, we can ensure the confidentiality and integrity of images.

But then we face new challenges. With traditional image management, blobs and mounted filesystems live on the host, where they can be reused across different container instances and restarts. By moving image management into the guest, we need to download and prepare images for each container instance. In other words, container images can no longer be reused between container instances, which brings higher costs: high pressure on the registry, slow container startup times, and heavy I/O load on local devices.

So how can we achieve confidentiality, integrity, and efficiency for confidential containers, all at once? Some technologies already exist: the ocicrypt project provides a way to encrypt whole images, and the cosign project provides a way to ensure the integrity of container images.
The Confidential Containers community has also developed new technologies to move image management from the host into the guest. We modified containerd and Kata Containers and introduced a new component named image-rs. These components help us manage container images inside the guest. So we have technologies to ensure container image confidentiality and integrity, but we are still facing the challenge of efficiency. How can we improve the efficiency of image management? The Nydus image service project provides an interesting way to achieve efficiency for confidential containers.

What is the Nydus image service? The Nydus project provides a framework for image management services for containers. The following picture shows the overall architecture of the Nydus project, split into build, ship, and run stages. The project has several aspects. First, it defines a read-only filesystem format with many features, such as lazy loading, data deduplication, and compatibility with OCI v1 images; we are also adding encryption to the image format. Second, it is a read-only filesystem for container images, AI models, and software packages. Access to Nydus images is very flexible: we provide different interfaces, such as FUSE on Linux and macOS, virtio-fs for virtual machines, and EROFS page cache sharing on Linux. We are also developing a user-space library for applications to directly access files from a Nydus image. Third, we have also developed a storage subsystem for Nydus.
This storage subsystem supports P2P distribution, caching, and data deduplication, and we have built a content-addressable storage subsystem to deduplicate data across different images. Last, the Nydus project has put much effort into integrating with the ecosystem. There is one more feature worth mentioning here: the latest Nydus release provides an OCI v1 compatible mode, which I will cover in more detail later.

The core of the Nydus project is the Nydus image format, so let's go through a more detailed explanation of it, by way of converting an existing OCI v1 image into a Nydus image. As we know, an OCI v1 image contains one manifest and one or more data layers. Each data layer is a binary blob, and that blob is actually a tar stream containing tar headers and file data. To convert an OCI v1 image layer into a Nydus data blob, we first group all the tar headers together and translate them into filesystem metadata. This filesystem metadata can be mounted directly by a FUSE server or by the in-kernel EROFS filesystem, so Nydus can provide a full filesystem view to the workload. Then we chunk the file data into fixed-size pieces and compress the chunks. Finally, we need some information to decompress the compressed chunk data. So a Nydus data blob includes three parts: filesystem metadata, a chunk info array, and the chunk data. There is one Nydus data blob for every OCI v1 image layer.
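The chunking and deduplication steps above can be sketched in a few lines. This is a toy model, not the real Nydus (RAFS) on-disk layout: the chunk size, the dict-based content-addressed store, and the path-to-digests metadata are all simplifications for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024 * 1024  # fixed-size chunks; 1 MiB is illustrative

def build_blob(files):
    """Toy conversion: file data is split into fixed-size chunks,
    deduplicated by digest in a content-addressed store, compressed,
    and indexed by a chunk-info array."""
    cas = {}          # digest -> compressed chunk (content-addressed store)
    chunk_info = {}   # digest -> (compressed size, uncompressed size)
    metadata = {}     # path -> ordered chunk digests (filesystem metadata)
    for path, data in files.items():
        digests = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            d = hashlib.sha256(chunk).hexdigest()
            if d not in cas:  # dedup across files, layers, and images
                cas[d] = zlib.compress(chunk)
                chunk_info[d] = (len(cas[d]), len(chunk))
            digests.append(d)
        metadata[path] = digests
    return metadata, chunk_info, cas

# Two files with identical content share all chunks in the store.
meta, info, cas = build_blob({"/bin/a": b"x" * 3_000_000,
                              "/bin/b": b"x" * 3_000_000})
assert len(cas) == 2  # two distinct chunks stored, not six
```

Separating metadata and chunk info from the chunk data is also what makes the OCI v1 compatible mode described below possible: the index can exist without the data.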
In addition, Nydus has an extra layer that we call the Nydus metadata blob. The metadata blob is generated by merging the filesystem metadata from all the data blobs. In other words, the Nydus metadata blob is a filesystem view that is overlaid at build time. With the metadata blob, we do not need to mount each data blob individually; instead, we directly mount the metadata blob, so we don't need to overlay image layers at runtime.

Note that if we care about backward compatibility, we need to generate both an OCI v1 image and a Nydus image for the same container, which saves the container image data twice and wastes storage space. To solve this problem, the latest Nydus provides a new mode called the Nydus OCI v1 compatible mode. In this mode, the Nydus data blob only contains filesystem metadata and chunk information; it doesn't store the chunk data. The OCI image spec version 1.1 provides OCI reference types, and by using reference types we can get the data from the original OCI v1 image. That means for existing OCI v1 images, we can build an extra Nydus image to provide lazy loading and other features. The OCI compatible mode generates very small Nydus images, typically about 3 to 5% of the size of the original OCI v1 images, so it is very useful for backward compatibility.

So Nydus has two modes: the Nydus native mode and the OCI v1 compatible mode. Each Nydus image contains two types of blobs: data blobs and a metadata blob.
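The build-time merge that produces the metadata blob can be sketched as a simple overlay. This is only a model of the semantics: the string entries and the None-as-whiteout convention stand in for real inode metadata and OCI whiteout files, not the actual Nydus format.

```python
def merge_metadata(layers):
    """Build-time merge: later layers override earlier ones, and a None
    entry stands in for an OCI whiteout that deletes a path from the
    merged filesystem view."""
    view = {}
    for layer in layers:              # lowest layer first, as in OCI
        for path, entry in layer.items():
            if entry is None:
                view.pop(path, None)  # whiteout: drop from the view
            else:
                view[path] = entry
    return view

base = {"/etc/os-release": "from layer 0", "/bin/sh": "from layer 0"}
app = {"/bin/sh": "from layer 1", "/etc/os-release": None, "/app": "from layer 1"}
merged = merge_metadata([base, app])
assert merged == {"/bin/sh": "from layer 1", "/app": "from layer 1"}
```

Because this merge happens once at build time, the guest can mount the result directly instead of stacking layers with overlayfs at every container start.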
The metadata blob contains the filesystem metadata and provides a full filesystem view, while the data blobs contain the file chunks for each layer. The Nydus project also provides flexible interfaces to access Nydus images: they can be accessed through FUSE, EROFS, virtio-fs, and even through a user-space library.

For example, the Nydus image format is EROFS compatible. Let's look at the way EROFS makes use of Nydus images. EROFS directly mounts a Nydus metadata blob and provides a full filesystem view, and the application can walk the filesystem tree. When the application tries to read data from a file and the file data is not ready, EROFS notifies fscache, fscache sends a request to nydusd, and nydusd fetches the data from the remote registry. When the data is ready, nydusd notifies fscache, which notifies EROFS, and eventually the data is returned to the application.

So the Nydus image service can help improve the efficiency of confidential containers, but several enhancements are needed for Nydus images to support them. First, we need to add data encryption to the Nydus image format. We use a hybrid mode to protect a Nydus image: we use ocicrypt to protect the Nydus metadata blob, and the metadata blob contains the keys to decrypt the data from the data blobs, so the data blobs are protected by Nydus itself. That way, we can support both data encryption and lazy loading at the same time.
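At its core, the EROFS/fscache flow described above is demand paging with a cache. A minimal sketch of that read path (the LazyBlob class, chunk ids, and dict-backed "registry" are all made up for illustration; in reality the miss path goes through fscache and nydusd):

```python
class LazyBlob:
    """Reads hit a local cache first and fall back to a fetch callback
    (standing in for nydusd pulling a chunk from the registry) only on
    a cache miss."""
    def __init__(self, chunk_ids, fetch):
        self.chunk_ids = chunk_ids  # ordered chunk ids for one file
        self.fetch = fetch          # chunk id -> bytes (remote source)
        self.cache = {}
        self.misses = 0

    def read_chunk(self, idx):
        cid = self.chunk_ids[idx]
        if cid not in self.cache:
            self.misses += 1        # only cold chunks hit the registry
            self.cache[cid] = self.fetch(cid)
        return self.cache[cid]

registry = {"c0": b"hello ", "c1": b"world"}  # pretend remote store
blob = LazyBlob(["c0", "c1", "c0"], registry.__getitem__)
data = b"".join(blob.read_chunk(i) for i in range(3))
assert data == b"hello worldhello "
assert blob.misses == 2  # three reads, only two registry round trips
```

This is what lets a container start before its whole image has been downloaded: only the chunks the application actually touches are fetched.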
For data integrity: traditionally, the integrity of data blobs or images is verified at download time, and there is no mechanism to ensure data integrity at runtime. Nydus adds per-chunk digest validation to the image metadata to verify the integrity of data chunks at runtime. So, as with encryption, we combine cosign and Nydus to protect the integrity of the whole image: we use cosign to protect the integrity of the manifest and the Nydus metadata blob, the metadata blob contains the digests of the data blobs, and a runtime mechanism verifies the integrity of each data chunk, so the data blobs are again protected by Nydus itself.

With these enhancements for encryption and integrity verification, we can support lazy loading and image caching for confidential containers. We can fetch image data from the remote registry, from remote nodes through P2P, or from a data cache on the host through a virtio-fs or virtio-blk interface. We also support different modes to access encrypted images: through nydusd and FUSE, or through nydusd and EROFS. We are also researching how to enable EROFS to directly access Nydus images, but that is still at an early stage; we are still working in that direction.

Here is our development plan. The first stage is to integrate the Nydus image service with the image-rs crate. After this first stage, we will only provide the lazy loading capability, without data caching.
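The runtime integrity idea above can be shown in a few lines: because each chunk's digest is recorded in the metadata blob (whose own integrity is protected by cosign), the guest can re-check every chunk on every read, not just at download time. A minimal sketch:

```python
import hashlib

def read_verified(chunk, expected_digest):
    """Runtime integrity check: the chunk digest comes from the signed
    metadata blob and is re-verified on each read, so a host-modified
    chunk is detected even after download."""
    if hashlib.sha256(chunk).hexdigest() != expected_digest:
        raise IOError("chunk digest mismatch: possible tampering")
    return chunk

chunk = b"trusted chunk data"
digest = hashlib.sha256(chunk).hexdigest()  # stored in the metadata blob
assert read_verified(chunk, digest) == chunk

try:
    read_verified(b"tampered chunk!", digest)  # e.g. host-modified cache
    tampering_detected = False
except IOError:
    tampering_detected = True
assert tampering_detected
```

This matters for lazy loading: since chunks arrive on demand from an untrusted host or network, each one must carry its own verifiable digest rather than relying on a whole-blob check at pull time.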
The next step is to add data caching to nydusd. With that, we can prefetch image data and cache it inside the trusted domain, which will greatly improve performance and reliability. And, as I mentioned just now, we are still investigating enhancing EROFS to directly access Nydus images through virtio-blk. If we achieve that, it will be very flexible: no userspace daemon will be needed to serve the image, which would be great. How to provide an image cache on the host is out of scope, so we won't discuss it here. There are also other ways to provide an image caching service, such as block-based image caching: for example, we could use the qcow2 image format to provide encrypted images, and then use dm-integrity and dm-crypt to ensure confidentiality and integrity. That would be simple, but not very flexible, so we will enable the Nydus image service for confidential containers first.

We are targeting integrating the Nydus image service into Confidential Containers by the end of Q2. If you are interested in the technology or the project, please join us. Thank you for listening.