Hello everybody, it is my honor to introduce my recent work on enabling the Nydus image service for confidential containers.

Let me introduce myself first. I am Gary Niel from Alibaba Cloud. Currently I am working in the OS team to enable the Linux operating system for cloud workloads. I am a long-standing Linux kernel hacker and have contributed much to the Linux kernel. In the last few years I have also become interested in cloud-related technologies such as microVMs, container runtimes, and container image management, and I have joined several open source projects, such as the Kata Containers project, the Confidential Containers project, the Nydus image service project, and the masterman project.

I will go over three topics. First, I will explain the special requirements of image management for confidential containers, the currently available technologies, and the challenges we are still facing. Then I will give a brief introduction to the Nydus image service project: its design, its features, and its current status. Last, I will present my ideas for enhancing the Nydus image service to improve the image loading process for confidential containers.

Project CoCo, the Confidential Containers project, aims to protect the confidentiality and integrity of a container workload by using hardware TEEs. One way to protect a containerized application is to adopt the Kata Containers architecture, which runs a dedicated virtual machine for each pod.
We can then enhance the confidentiality and integrity of Kata virtual machines with hardware TEEs. To protect an application, we need to protect all the resources accessed by the application, such as CPU, memory, network, storage, and external devices such as GPUs. For a container, in addition to those resources, we also need to protect the container image of the workload. So how can we protect container images for confidential containers?

Why do we care about image management for confidential containers? What are the special requirements? Before talking about them, let's go through the current way of managing container images for normal containers, taking containerd as an example. To run a container, containerd first downloads the raw image blobs from the registry and stores those blobs on the local filesystem. Once all blobs are ready, containerd calls the snapshotter to convert those blobs into filesystems and prepare the rootfs for the container. Once the rootfs is ready, containerd starts the container, and the container can access all the files and data inside the container image. Here we can see that both the raw image blobs and the mounted filesystems are available on the host side, which poses a special challenge for confidential containers, because we need to protect the confidentiality and integrity of container images. Let's move on to image management for confidential containers.
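One concrete detail behind the normal pull flow just described: OCI blobs are content-addressed, so containerd can verify each downloaded layer against the digest recorded in the manifest. A minimal sketch of that check (the layer bytes here are a stand-in, not a real tar stream):

```python
import hashlib

def verify_blob(blob: bytes, descriptor_digest: str) -> bool:
    # OCI blobs are content-addressed: the manifest references each
    # layer by digest, so the client can verify what it pulled.
    algo, _, expected = descriptor_digest.partition(":")
    if algo != "sha256":
        raise ValueError("unsupported digest algorithm: " + algo)
    return hashlib.sha256(blob).hexdigest() == expected

layer = b"fake gzipped tar stream"  # stand-in for a real layer blob
digest = "sha256:" + hashlib.sha256(layer).hexdigest()
assert verify_blob(layer, digest)
assert not verify_blob(layer + b"!", digest)  # any tampering is detected
```

Note that this check only runs at download time, which is exactly the limitation discussed later for runtime integrity.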
To summarize, confidential image management faces three special requirements: confidentiality, integrity, and efficiency. To ensure confidentiality, container images should be encrypted, both on the registry and on the local host, so the image content can be kept private. Second, image management must be moved from the host into the guest, because if we store image content and mounted filesystems on the host, the content is visible to the host, which breaks confidentiality. Even worse, the host can make changes to those images and break the integrity of the container images. By encrypting images and moving image management into the guest, we can ensure the confidentiality and integrity of images.

But then we face new challenges. With traditional image management, blobs and mounted filesystems live on the host, where they can be reused across different container instances and restarts. By moving image management into the guest, we need to download and prepare images for each container instance. In other words, container images can no longer be reused between container instances, which brings higher costs: high pressure on the registry, slow container startup times, and heavy I/O load on local devices.

So how can we achieve confidentiality, integrity, and efficiency for confidential containers, all at once? Some technologies already exist: the ocicrypt project provides a way to encrypt whole images, and the cosign project provides a way to ensure the integrity of container images.
The Confidential Containers community has also developed new technologies to move image management from the host into the guest. We modified containerd and Kata Containers and introduced a new component named image-rs. These components help us manage container images inside the guest. So we have technologies to ensure container image confidentiality and integrity, but we are still facing the challenge of efficiency. How can we improve the efficiency of image management? The Nydus image service project provides an interesting way to achieve efficiency for confidential containers.

What is the Nydus image service? The Nydus project provides a framework for image management services for containers. The following picture shows the overall architecture of the Nydus project, split into build, ship, and run stages. The project has several aspects. First, it defines a read-only filesystem format with many features, such as lazy loading, data deduplication, and compatibility with OCI v1 images; we are also adding encryption to the image format. Second, it is a read-only filesystem for container images, AI models, and software packages. Access to Nydus images is very flexible: we provide different interfaces, such as FUSE on Linux and macOS, virtio-fs for virtual machines, and EROFS page cache sharing on Linux. We are also developing a user-space library for applications to directly access files from a Nydus image. Third, we have also developed a storage subsystem for Nydus.
This storage subsystem supports P2P distribution, caching, and data deduplication, and we have built a content-addressable storage subsystem to deduplicate data across different images. Last, the Nydus project has put much effort into integrating with the ecosystem. There is one more feature worth mentioning here: the latest Nydus release provides an OCI v1 compatible mode, which I will cover in more detail later.

The core of the Nydus project is the Nydus image format, so let's go through a more detailed explanation of it, by way of converting an existing OCI v1 image into a Nydus image. As we know, an OCI v1 image contains one manifest and one or more data layers. Each data layer is a binary blob, and that blob is actually a tar stream containing tar headers and file data. To convert an OCI v1 image layer into a Nydus data blob, we first group all the tar headers together and translate them into filesystem metadata. This filesystem metadata can be mounted directly by a FUSE server or by the in-kernel EROFS filesystem, so Nydus can provide a full filesystem view to the workload. Then we chunk the file data into fixed-size pieces and compress the chunks. Finally, we need some information to decompress the compressed chunk data. So a Nydus data blob includes three parts: filesystem metadata, a chunk info array, and the chunk data. There is one Nydus data blob for every OCI v1 image layer.
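The chunking and deduplication steps above can be sketched in a few lines. This is a toy model, not the real Nydus (RAFS) on-disk layout: the chunk size, the dict-based content-addressed store, and the path-to-digests metadata are all simplifications for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024 * 1024  # fixed-size chunks; 1 MiB is illustrative

def build_blob(files):
    """Toy conversion: file data is split into fixed-size chunks,
    deduplicated by digest in a content-addressed store, compressed,
    and indexed by a chunk-info array."""
    cas = {}          # digest -> compressed chunk (content-addressed store)
    chunk_info = {}   # digest -> (compressed size, uncompressed size)
    metadata = {}     # path -> ordered chunk digests (filesystem metadata)
    for path, data in files.items():
        digests = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            d = hashlib.sha256(chunk).hexdigest()
            if d not in cas:  # dedup across files, layers, and images
                cas[d] = zlib.compress(chunk)
                chunk_info[d] = (len(cas[d]), len(chunk))
            digests.append(d)
        metadata[path] = digests
    return metadata, chunk_info, cas

# Two files with identical content share all chunks in the store.
meta, info, cas = build_blob({"/bin/a": b"x" * 3_000_000,
                              "/bin/b": b"x" * 3_000_000})
assert len(cas) == 2  # two distinct chunks stored, not six
```

Separating metadata and chunk info from the chunk data is also what makes the OCI v1 compatible mode described below possible: the index can exist without the data.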
In addition, Nydus has an extra layer that we call the Nydus metadata blob. The metadata blob is generated by merging the filesystem metadata from all the data blobs. In other words, the Nydus metadata blob is a filesystem view that is overlaid at build time. With the metadata blob, we do not need to mount each data blob individually; instead, we directly mount the metadata blob, so we don't need to overlay image layers at runtime.

Note that if we care about backward compatibility, we need to generate both an OCI v1 image and a Nydus image for the same container, which saves the container image data twice and wastes storage space. To solve this problem, the latest Nydus provides a new mode called the Nydus OCI v1 compatible mode. In this mode, the Nydus data blob only contains filesystem metadata and chunk information; it doesn't store the chunk data. The OCI image spec version 1.1 provides OCI reference types, and by using reference types we can get the data from the original OCI v1 image. That means for existing OCI v1 images, we can build an extra Nydus image to provide lazy loading and other features. The OCI compatible mode generates very small Nydus images, typically about 3 to 5% of the size of the original OCI v1 images, so it is very useful for backward compatibility.

So Nydus has two modes: the Nydus native mode and the OCI v1 compatible mode. Each Nydus image contains two types of blobs: data blobs and a metadata blob.
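The build-time merge that produces the metadata blob can be sketched as a simple overlay. This is only a model of the semantics: the string entries and the None-as-whiteout convention stand in for real inode metadata and OCI whiteout files, not the actual Nydus format.

```python
def merge_metadata(layers):
    """Build-time merge: later layers override earlier ones, and a None
    entry stands in for an OCI whiteout that deletes a path from the
    merged filesystem view."""
    view = {}
    for layer in layers:              # lowest layer first, as in OCI
        for path, entry in layer.items():
            if entry is None:
                view.pop(path, None)  # whiteout: drop from the view
            else:
                view[path] = entry
    return view

base = {"/etc/os-release": "from layer 0", "/bin/sh": "from layer 0"}
app = {"/bin/sh": "from layer 1", "/etc/os-release": None, "/app": "from layer 1"}
merged = merge_metadata([base, app])
assert merged == {"/bin/sh": "from layer 1", "/app": "from layer 1"}
```

Because this merge happens once at build time, the guest can mount the result directly instead of stacking layers with overlayfs at every container start.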
The metadata blob contains the filesystem metadata and provides a full filesystem view, while the data blobs contain the file chunks for each layer. The Nydus project also provides flexible interfaces to access Nydus images: they can be accessed through FUSE, EROFS, virtio-fs, and even through a user-space library.

For example, the Nydus image format is EROFS compatible. Let's look at the way EROFS makes use of Nydus images. EROFS directly mounts a Nydus metadata blob and provides a full filesystem view, and the application can walk the filesystem tree. When the application tries to read data from a file and the file data is not ready, EROFS notifies fscache, fscache sends a request to nydusd, and nydusd fetches the data from the remote registry. When the data is ready, nydusd notifies fscache, which notifies EROFS, and eventually the data is returned to the application.

So the Nydus image service can help improve the efficiency of confidential containers, but several enhancements are needed for Nydus images to support them. First, we need to add data encryption to the Nydus image format. We use a hybrid mode to protect a Nydus image: we use ocicrypt to protect the Nydus metadata blob, and the metadata blob contains the keys to decrypt the data from the data blobs, so the data blobs are protected by Nydus itself. That way, we can support both data encryption and lazy loading at the same time.
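At its core, the EROFS/fscache flow described above is demand paging with a cache. A minimal sketch of that read path (the LazyBlob class, chunk ids, and dict-backed "registry" are all made up for illustration; in reality the miss path goes through fscache and nydusd):

```python
class LazyBlob:
    """Reads hit a local cache first and fall back to a fetch callback
    (standing in for nydusd pulling a chunk from the registry) only on
    a cache miss."""
    def __init__(self, chunk_ids, fetch):
        self.chunk_ids = chunk_ids  # ordered chunk ids for one file
        self.fetch = fetch          # chunk id -> bytes (remote source)
        self.cache = {}
        self.misses = 0

    def read_chunk(self, idx):
        cid = self.chunk_ids[idx]
        if cid not in self.cache:
            self.misses += 1        # only cold chunks hit the registry
            self.cache[cid] = self.fetch(cid)
        return self.cache[cid]

registry = {"c0": b"hello ", "c1": b"world"}  # pretend remote store
blob = LazyBlob(["c0", "c1", "c0"], registry.__getitem__)
data = b"".join(blob.read_chunk(i) for i in range(3))
assert data == b"hello worldhello "
assert blob.misses == 2  # three reads, only two registry round trips
```

This is what lets a container start before its whole image has been downloaded: only the chunks the application actually touches are fetched.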
For data integrity: traditionally, the integrity of data blobs or images is verified at download time, and there is no mechanism to ensure data integrity at runtime. Nydus adds per-chunk digest validation to the image metadata to verify the integrity of data chunks at runtime. So, as with encryption, we combine cosign and Nydus to protect the integrity of the whole image: we use cosign to protect the integrity of the manifest and the Nydus metadata blob, the metadata blob contains the digests of the data blobs, and a runtime mechanism verifies the integrity of each data chunk, so the data blobs are again protected by Nydus itself.

With these enhancements for encryption and integrity verification, we can support lazy loading and image caching for confidential containers. We can fetch image data from the remote registry, from remote nodes through P2P, or from a data cache on the host through a virtio-fs or virtio-blk interface. We also support different modes to access encrypted images: through nydusd and FUSE, or through nydusd and EROFS. We are also researching how to enable EROFS to directly access Nydus images, but that is still at an early stage; we are still working in that direction.

Here is our development plan. The first stage is to integrate the Nydus image service with the image-rs crate. After this first stage, we will only provide the lazy loading capability, without data caching.
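The runtime integrity idea above can be shown in a few lines: because each chunk's digest is recorded in the metadata blob (whose own integrity is protected by cosign), the guest can re-check every chunk on every read, not just at download time. A minimal sketch:

```python
import hashlib

def read_verified(chunk, expected_digest):
    """Runtime integrity check: the chunk digest comes from the signed
    metadata blob and is re-verified on each read, so a host-modified
    chunk is detected even after download."""
    if hashlib.sha256(chunk).hexdigest() != expected_digest:
        raise IOError("chunk digest mismatch: possible tampering")
    return chunk

chunk = b"trusted chunk data"
digest = hashlib.sha256(chunk).hexdigest()  # stored in the metadata blob
assert read_verified(chunk, digest) == chunk

try:
    read_verified(b"tampered chunk!", digest)  # e.g. host-modified cache
    tampering_detected = False
except IOError:
    tampering_detected = True
assert tampering_detected
```

This matters for lazy loading: since chunks arrive on demand from an untrusted host or network, each one must carry its own verifiable digest rather than relying on a whole-blob check at pull time.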
The next step is to add data caching to nydusd. With that, we can prefetch image data and cache it inside the trusted domain, which will greatly improve performance and reliability. And, as I mentioned just now, we are still investigating enhancing EROFS to directly access Nydus images through virtio-blk. If we achieve that, it will be very flexible: no userspace daemon will be needed to serve the image, which would be great. How to provide an image cache on the host is out of scope, so we won't discuss it here. There are also other ways to provide an image caching service, such as block-based image caching: for example, we could use the qcow2 image format to provide encrypted images, and then use dm-integrity and dm-crypt to ensure confidentiality and integrity. That would be simple, but not very flexible, so we will enable the Nydus image service for confidential containers first.

We are targeting integrating the Nydus image service into Confidential Containers by the end of Q2. If you are interested in the technology or the project, please join us. Thank you for listening.