So, hi everyone. I am Merlin, or Merlijn if you say it in Dutch, and we're going to talk about lightweight Kubernetes operators with WebAssembly. Basically, it's an attempt to lower the memory and CPU footprint of the Kubernetes control plane. I am a researcher at imec, I teach at Ghent University, and I'm also part of the Ubuntu Community Council. But right now I'm here to talk about my research, which is service orchestration in the cloud and at the edge, and this talk is specifically about the edge part of that research.

Edge computing is becoming more and more popular. More and more people want to run their applications closer to end users, on devices inside users' homes, for example. As a result, a lot of people coming from a background of developing cloud applications now suddenly want to develop applications that run on very low-powered devices. And they really like the development experience of the cloud: they like the tools, they like the cloud-native experience with things like Kubernetes. But as most of you probably know, Kubernetes isn't really a great fit for the edge. Kubernetes is incredibly resource-hungry. It really likes to gobble up RAM, and it really likes to hog your CPUs. A lot of components inside the Kubernetes control plane contribute to this: part of it is the kubelet that runs on every worker machine, part of it is the container runtimes themselves, or the API server. But what I'm going to talk about in this session is operators specifically, because operators tend to eat up a lot of resources from your Kubernetes cluster.

First of all, operators are basically plugins to the Kubernetes control plane which add additional functionality to the Kubernetes API. For example, an operator could add a resource to deploy and manage a MySQL cluster, or a resource to deploy and manage a Ceph cluster. These operators are also really resource-hungry, and part of the reason is that they are long-running processes. Such a process sees something change in your Kubernetes cluster, does something with it, and then writes those changes back to the API server in order to manage the applications. But after that write is done, the process keeps running, because it keeps listening for events from the Kubernetes API, or sometimes even manually polls to see whether some resource has changed. So even when they're doing nothing, they're still running. A lot of them are written in Go, and Go really likes memory.
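To make that concrete, a typical operator built on kube-rs looks roughly like the sketch below. This is not code from our project: it watches ConfigMaps as a stand-in for a real custom resource, and exact signatures differ between kube-rs versions. The point is the shape of it: one process that opens a watch and then never exits.

    use std::sync::Arc;
    use std::time::Duration;

    use futures::StreamExt;
    use k8s_openapi::api::core::v1::ConfigMap;
    use kube::runtime::controller::{Action, Controller};
    use kube::runtime::watcher;
    use kube::{Api, Client};

    #[derive(thiserror::Error, Debug)]
    enum Error {
        #[error("kube error: {0}")]
        Kube(#[from] kube::Error),
    }

    // Called every time the watched resource changes.
    async fn reconcile(obj: Arc<ConfigMap>, _ctx: Arc<()>) -> Result<Action, Error> {
        println!("reconciling {:?}", obj.metadata.name);
        // Ask to be woken up again later even if nothing changes.
        Ok(Action::requeue(Duration::from_secs(300)))
    }

    fn error_policy(_obj: Arc<ConfigMap>, _err: &Error, _ctx: Arc<()>) -> Action {
        Action::requeue(Duration::from_secs(60))
    }

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        let client = Client::try_default().await?;
        let cms: Api<ConfigMap> = Api::default_namespaced(client);

        // The controller keeps this watch open forever, so the process sits in
        // memory even when nothing happens for hours.
        Controller::new(cms, watcher::Config::default())
            .run(reconcile, error_policy, Arc::new(()))
            .for_each(|res| async move {
                if let Err(e) = res {
                    eprintln!("reconcile failed: {e:?}");
                }
            })
            .await;
        Ok(())
    }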
And they run inside of containers, most of them in their own separate container, basically sitting in RAM doing nothing but eating it up. This is an issue if you want to run Kubernetes at the edge, on devices which have something like 512 megabytes of RAM. These operators are basically unusable in situations like that.

So how could we solve this? One of the ways we think we can solve this is by using WebAssembly and the WebAssembly System Interface. And yes, really: we're trying to lower the footprint of Kubernetes by taking a web technology and putting it inside of Kubernetes. If you don't believe me, there's a tweet from one of the co-founders of Docker who basically said that if WebAssembly and the WebAssembly System Interface had existed in 2008, they wouldn't have needed to create Docker. It's a very interesting technology, and we think it's a very good fit to solve this issue in Kubernetes.

So what is WebAssembly? It was originally created for the browser, and it's basically a binary code format: you compile your applications to WebAssembly instead of compiling them to x86 or to ARM, and then this code runs inside a runtime, which you could think of as a very lightweight virtual machine. It runs in your browser, it runs in the Node.js runtime, but there's also a whole bunch of new purpose-built, very lightweight runtimes such as Wasmtime, the one that we're using right now. The WebAssembly System Interface (WASI) is basically a syscall interface. WebAssembly is your binary, but on its own it doesn't have access to anything; WASI is the interface it uses to open files, open sockets, start new threads, and so on. If you combine these two, you basically have a very lightweight, super fast sandbox.

The result of running these operators inside of WebAssembly containers is that they use a lot less RAM. On this slide, at the top, you see 100 operators running as Docker containers, then 100 operators running as WebAssembly containers, and then 100 running just on bare metal. We're not reaching the performance of bare metal, there's still some overhead, but compared to the Docker containers we're getting a lot closer to it. As an advantage that we didn't see coming initially, they also have a lot less latency: they run a lot quicker. This slide also shows the difference between Go operators and Rust operators. Obviously, Rust has lower latency and a tighter latency distribution, because it's not a garbage-collected language. But we were surprised to see that running them inside of WebAssembly gave them even better, even more consistent latency.
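To give an idea of the workflow: nothing in the application code itself has to change, you switch the compilation target and run the same program under Wasmtime. A minimal sketch, assuming the Rust wasm32-wasi target and the Wasmtime CLI are installed (the crate name is a placeholder):

    // main.rs -- an ordinary Rust program; nothing WebAssembly-specific in the code.
    fn main() {
        println!("hello from inside the WebAssembly sandbox");
    }

    // Build for WASI instead of x86/ARM, then run the result in Wasmtime:
    //
    //   rustup target add wasm32-wasi
    //   cargo build --release --target wasm32-wasi
    //   wasmtime run target/wasm32-wasi/release/my_operator.wasm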
So how did we do this? We basically work with a client-server model, or a parent operator and child operators. The parent operator is a WebAssembly runtime with a bunch of additions to it in order to support running operators inside of it. It watches the Kubernetes resources on behalf of the operators running inside of it, so the operators don't have to keep running to watch them: they can just shut down when there's nothing to do, and the parent operator will call them once there is a change to process. The child operators are where the actual operators run. The interesting part is that these are just regular operators compiled to WebAssembly using a patched version of the Kubernetes SDK. In the future, this will probably make it possible to take a regular Kubernetes operator, compile it to WebAssembly, and use it in this system. Right now we only support Rust, because Rust support for WebAssembly is very good while Go support for WebAssembly is iffy. So we have a patched version of kube-rs, the Rust Kubernetes SDK, which contacts the parent operator instead of contacting the Kubernetes API itself.

So how does this loading and unloading work? There's the WebAssembly engine, which is basically just Wasmtime, the WebAssembly runtime, and your child operator runs inside it. Once the child operator wants to contact the Kubernetes API server, it does a syscall; we extended the WebAssembly System Interface with a few extra syscalls to support this scenario. This syscall goes through to the parent operator, and the parent operator is the one that actually contacts the Kubernetes API. Once these calls are finished, the parent operator contacts the child operator again to give it the result of the calls. And if the child operator isn't doing anything, the parent operator shuts it down; once there are changes to process, it starts it up again.

The results I showed you on the first slides did not use unloading at all; that was just running Kubernetes operators inside of WebAssembly. These next results show what you get in a worst-case scenario for unloading operators when they're not doing anything. We see that even in that worst case they still use 50% less RAM, because they're constantly being unloaded and then reloaded again once there are changes to process. However, this obviously comes at the cost of latency. Even though WebAssembly starts incredibly fast, with startup latency that just can't be compared to starting a Docker container, there is still some latency to start a WebAssembly application. And in the worst-case scenario of something like 100 operators chained together, this compounds to up to 12 seconds, which is an issue.
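To illustrate the mechanism, here is very roughly what the parent side looks like as a Wasmtime embedding. This is a sketch, not our actual code: it follows the older synchronous wasmtime-wasi API (function names differ between Wasmtime versions), the module path is a placeholder, and the "kubernetes_proxy"/"api_request" host call is a hypothetical name standing in for our WASI extension.

    use anyhow::Result;
    use wasmtime::{Caller, Engine, Linker, Module, Store};
    use wasmtime_wasi::sync::WasiCtxBuilder;
    use wasmtime_wasi::WasiCtx;

    struct ChildState {
        wasi: WasiCtx,
    }

    fn run_child_once(engine: &Engine, module: &Module) -> Result<()> {
        let mut linker: Linker<ChildState> = Linker::new(engine);

        // Standard WASI: stdio, clocks, files, ...
        wasmtime_wasi::add_to_linker(&mut linker, |s: &mut ChildState| &mut s.wasi)?;

        // Hypothetical extra host call: the child hands the parent a serialized
        // Kubernetes API request instead of opening a socket itself.
        linker.func_wrap(
            "kubernetes_proxy", // hypothetical module name, not the real interface
            "api_request",      // hypothetical function name
            |_caller: Caller<'_, ChildState>, req_ptr: i32, req_len: i32| -> i32 {
                // A real parent would read the request out of the guest's linear
                // memory, forward it to the API server, and write back the reply.
                println!("child asked for an API call ({req_len} bytes at {req_ptr})");
                0
            },
        )?;

        let state = ChildState {
            wasi: WasiCtxBuilder::new().inherit_stdio().build(),
        };
        let mut store = Store::new(engine, state);

        // Instantiate the child and run it to completion. When it returns, the
        // store and instance are dropped, which is the "unload": its memory is
        // freed until the next event arrives.
        let instance = linker.instantiate(&mut store, module)?;
        let start = instance.get_typed_func::<(), ()>(&mut store, "_start")?;
        start.call(&mut store, ())?;
        Ok(())
    }

    fn main() -> Result<()> {
        let engine = Engine::default();
        let module = Module::from_file(&engine, "child_operator.wasm")?;
        // In the real parent this is driven by watch events from the Kubernetes
        // API; here we just run the child a single time.
        run_child_once(&engine, &module)
    }

The important part is that the child gets standard WASI plus a handful of extra host calls, and that dropping the store and instance is all it takes to unload it.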
So what are we doing now? We have this basic proof of concept showing that this seems to be a very good approach to lower the footprint of the Kubernetes control plane, and we want to do more with it. Currently, we're improving the build tools and creating more realistic tests. All the tests we did so far were a worst-case scenario of operators constantly doing stuff; in the real world, most operators don't do anything most of the time. So we're creating more realistic tests to see what the performance benefits are for real workloads. We're also working on predictive unloading, so that if we know an operator is going to have to run again in a few milliseconds, we don't unload it, because it's better to just keep it running.

In the future, we want to work on better support for controllers that wake periodically. Right now we see that a lot of production controllers actually wake up every five seconds or every twenty seconds to manually check resources in the Kubernetes API, because some of those resources can't be watched with callbacks. We're trying to figure out a way to put that functionality into the host operator itself, so that even when you're watching resources that don't support event-based APIs, the operator keeps sleeping as long as there's nothing to process. We're also really interested in upstreaming and standardizing this. We have patches for kube-rs and an extension for the WebAssembly System Interface, and it would be very interesting to see if there are people in the ecosystem who are interested in this. As for Go support, that will probably not be work we're doing ourselves; we'll just wait until Go is better supported in WebAssembly.

I have to thank the developers. Francesco is somewhere here in the audience: we started from a prototype created by Francesco and Marcus which runs Kubernetes controllers inside of WebAssembly. We refactored it to use Wasmtime and added the unloading mechanism; this was done by Tim as part of his master's thesis. And right now, a student, Kevin, is working on it, also as part of his master's thesis, to improve the build system so that it's much easier to get started with, and to add predictive unloading and more realistic benchmarks, so that we have a better idea of the performance for actual production controllers.

The main reason I'm here today is to say: hey, we have a really cool proof of concept which solves an issue that we have been having. Is this solving an issue for other people in the community? And are you interested in working together on this? If you're interested in working together on this, please get in touch.
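To make the predictive-unloading idea a bit more concrete, the decision the parent has to make looks roughly like this. Everything here is hypothetical: the bookkeeping and the thresholds are made up for illustration, this is not our implementation.

    use std::time::{Duration, Instant};

    /// What the parent might track about one child operator (hypothetical).
    struct ChildStats {
        last_event: Instant,
        /// Rolling average of the time between the events this child handles.
        mean_inter_event: Duration,
    }

    /// Decide whether to unload an idle child right now or keep it warm.
    /// The 50 ms threshold is made up for illustration.
    fn should_unload(stats: &ChildStats, now: Instant) -> bool {
        let idle_for = now.duration_since(stats.last_event);
        let expected_next = stats.mean_inter_event.saturating_sub(idle_for);

        // If the next event is probably only a few milliseconds away, unloading
        // and reloading would cost more than just keeping the instance around.
        expected_next > Duration::from_millis(50)
    }

    fn main() {
        let stats = ChildStats {
            last_event: Instant::now() - Duration::from_secs(2),
            mean_inter_event: Duration::from_secs(30),
        };
        println!("unload now? {}", should_unload(&stats, Instant::now()));
    }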
If you're a student yourself and you want to do an internship or a master's thesis working on this, we have a lot of opportunities, and the same goes for a PhD. So please contact us, send me an email, and we'll see what we can do for you and how we could collaborate. This is the end of my presentation, and there's now room for questions. I also put the link to part of our code here; I think this GitHub repo also links to the other repositories that you need.

Okay, we can take a couple of questions. So the question is why this is so fast, and why you couldn't do something similar with the JVM. Definitely, the JVM and WebAssembly are very similar in that regard, and a lot of people position WebAssembly as a more cross-platform and more cross-language version of the JVM. But if you're only interested in Java and Java-based languages, then the Java runtime itself is a very good alternative to this. Okay, there was another one over here, right? Yeah.

So if I understood correctly, you are deploying your operators outside containers, and that makes them much more efficient. But besides the security aspects, when you deploy in containers on Kubernetes, there are many other things you can set, like resource limits, but also things like pod topology spread constraints and annotations to make sure that some processes run on specific nodes, and so on. How can you address that with WebAssembly, since you can't package your operator like any other workload that you deploy in Kubernetes?

Yeah, so that's a very good question. One of the benchmarks was just running the operators on bare metal, but that's not actually what I'm proposing; it was just to see what the absolute maximum amount of performance is that we could get out of this. Our plan is to run each operator inside of its own container, it's just a WebAssembly plus WASI container instead of a Docker container. So most of the security profile and so on that you have with Docker containers is very similar with WebAssembly. Some would even argue that it's more secure in WebAssembly, because it has a much smaller API surface and it has some of the best teams working on it to make sure it's secure for the browser. Moreover, the code running in these WebAssembly containers in my proof of concept is control plane code. This is code that the system administrator explicitly selected: okay, I want this specific system administration code to manage my applications. So in that sense there's also a higher level of trust put into that code, which means there's less of a risk of attacks and things like that. But even then, it's still running inside of containers.
So, one of the most important scalability aspects of Kubernetes controllers is the watch-based cache, right? Without it, the API server wouldn't be able to handle all the long polling and so on. And it's also one of the most memory-intensive aspects of Kubernetes controllers. I was wondering, in your memory benchmarks, whether you were cutting down on this watch-based aspect, or whether it is still included in the parent operator. So, for example, is the parent operator caching as a proxy for the child operators? Is that the case?

Yeah, that's what's happening, basically. The parent operator is where the caches are, yeah.
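In kube-rs terms, the parent keeping that cache looks roughly like a shared reflector store: one long-lived watch owned by the parent, with the children reading from the in-memory store instead of each opening their own watch against the API server. A minimal sketch, again with ConfigMaps as a stand-in and not our actual implementation:

    use futures::StreamExt;
    use k8s_openapi::api::core::v1::ConfigMap;
    use kube::runtime::{reflector, watcher, WatchStreamExt};
    use kube::{Api, Client};

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        let client = Client::try_default().await?;
        let api: Api<ConfigMap> = Api::default_namespaced(client);

        // One in-memory cache (`reader`) fed by a single long-lived watch. In
        // the parent operator, this is what the child operators would read from
        // instead of each keeping their own watch against the API server.
        let (reader, writer) = reflector::store::<ConfigMap>();

        tokio::spawn(async move {
            let stream = reflector(writer, watcher(api, watcher::Config::default()))
                .applied_objects();
            futures::pin_mut!(stream);
            while let Some(obj) = stream.next().await {
                if let Ok(cm) = obj {
                    println!("cache updated: {:?}", cm.metadata.name);
                }
            }
        });

        // A "child" consulting the shared cache instead of the API server.
        tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        println!("objects currently cached: {}", reader.state().len());
        Ok(())
    }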