[00:00.000 --> 00:10.680] You're going to talk about Cluster API: operating Kubernetes with Kubernetes. [00:10.680 --> 00:11.680] How? [00:11.680 --> 00:12.680] Hello. [00:12.680 --> 00:14.160] Thank you for coming. [00:14.160 --> 00:15.160] My name is Alex. [00:15.160 --> 00:16.600] I'm a software engineer. [00:16.600 --> 00:18.920] I work at SUSE on Rancher. [00:18.920 --> 00:23.200] I do a lot of stuff related to cluster lifecycle. [00:23.200 --> 00:30.240] And today I'm going to talk about Cluster API and operating Kubernetes with Kubernetes. [00:30.240 --> 00:31.760] Hope it will be fun. [00:31.760 --> 00:36.360] Here is a short summary of what we are going to talk about today. [00:36.360 --> 00:43.160] I'll try to explain the problem of managing the Kubernetes cluster lifecycle. [00:43.160 --> 00:49.800] I'll try to explain what Cluster API is, how it approaches this problem, and we'll take [00:49.800 --> 00:53.000] a look at some building blocks of Cluster API. [00:53.000 --> 00:58.120] And also I'll be doing a demo, and because I don't have enough time, the demo will be [00:58.120 --> 01:01.120] done simultaneously with the talk. [01:01.120 --> 01:05.760] So it's a live demo, nothing is recorded, hopefully everything will be fine. [01:05.760 --> 01:11.600] I already had some problems with networking today, but let's see. [01:11.600 --> 01:14.560] So let's move on to the next slide. [01:14.560 --> 01:19.120] So cluster lifecycle is complicated, and why is that? [01:19.120 --> 01:23.920] If you have to manage more than one cluster, say you have 10 Kubernetes clusters or maybe [01:23.920 --> 01:31.240] 100 Kubernetes clusters, then the problem becomes similar to managing containers and [01:31.240 --> 01:34.120] why we invented Kubernetes.
[01:34.120 --> 01:45.440] And Cluster API tries to solve this problem of managing multiple clusters, and also sometimes [01:45.440 --> 01:52.040] you have to manage the underlying infrastructure, and that also somehow needs to be done in [01:52.040 --> 01:54.400] a nice and consistent way. [01:54.400 --> 01:58.560] Then you also have to upgrade clusters, sometimes you have to upgrade multiple clusters, and [01:58.560 --> 02:04.160] upgrading clusters is not always easy, especially when it comes to control planes. [02:04.160 --> 02:08.760] And you want to deploy your clusters on different infrastructure, let's say you have something [02:08.760 --> 02:15.240] running on AWS, then you have some bare metal things running, and you also need to somehow [02:15.240 --> 02:16.560] manage that. [02:16.560 --> 02:20.760] And you don't want to use different tools that depend on your infrastructure, you want [02:20.760 --> 02:25.880] to use something that is a single point of management and is consistent, that provides [02:25.880 --> 02:31.960] a nice experience, and is easy to use and automate. [02:31.960 --> 02:33.680] So what is Cluster API? [02:33.680 --> 02:42.560] Cluster API takes this approach where we install it, it's an extension to the Kubernetes API that [02:42.560 --> 02:48.960] allows you to provision, upgrade, and operate your clusters, and you install it on your Kubernetes, [02:48.960 --> 02:54.520] then you use what we call a management cluster to manage workload clusters. [02:54.520 --> 02:59.960] Yes, you can do this on different infrastructure providers, you can have one management cluster [02:59.960 --> 03:04.920] managing stuff running on AWS, and you can have the same cluster managing your clusters [03:04.920 --> 03:07.600] on Azure. [03:07.600 --> 03:16.480] So this is the basic idea of Cluster API, and next we are going to take a look at the [03:16.480 --> 03:22.160] building blocks of CAPI, and I will start my demo.
[03:22.160 --> 03:29.480] But before this, let me switch to the terminal and show you what I have prepared in advance. [03:29.480 --> 03:38.000] So I deployed a management cluster where I already installed CAPI so we don't lose time, [03:38.000 --> 03:44.560] everything should be up and running, and yeah, let's move on. [03:44.560 --> 03:50.320] The main entity in Cluster API is called Cluster, and it represents a Kubernetes cluster, [03:50.320 --> 03:55.920] it's not tied to some kind of infrastructure, so it's just a generic Kubernetes cluster. [03:55.920 --> 04:02.960] And to make it more clear, I will show you what it looks like. [04:02.960 --> 04:08.480] As you can see, it's a normal Kubernetes object that has a kind, metadata, but what's [04:08.480 --> 04:20.040] interesting for us is the spec here, you can see the spec references two things, yeah, [04:20.040 --> 04:25.480] the first reference is a reference to infrastructure, and for this demo I'm going to use Docker [04:25.480 --> 04:32.400] as the infrastructure provider because I don't want to make any requests to some cloud because [04:32.400 --> 04:37.480] of the network, I wasn't sure if it's going to work properly, so I decided to use Docker [04:37.480 --> 04:43.720] as our infrastructure provider, it's an infrastructure provider we use for development and testing, [04:43.720 --> 04:49.840] and the second interesting reference is a reference to what we call control plane providers, [04:49.840 --> 04:55.080] and because control planes are harder to manage than worker machines, we require a [04:55.080 --> 05:01.080] specific resource for that, and this control plane provider is based on a tool called kubeadm, [05:01.080 --> 05:09.400] which is a default that you can use with CAPI, so let me create this cluster, and we can [05:09.400 --> 05:17.280] take a look at the objects that are referenced inside.
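For readers without the terminal view, here is a rough sketch of the Cluster object described above, built as a Python dictionary and printed as JSON. The cluster name is hypothetical, and the API versions (v1beta1) and field names are written from memory of the Cluster API types rather than taken from the demo, so treat them as assumptions to check against the release you run.

```python
import json

# Hypothetical cluster name; API groups/versions are assumed to be v1beta1.
CLUSTER_NAME = "demo-cluster"


def make_cluster(name: str) -> dict:
    """Build a Cluster manifest that points at its two provider objects."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # First reference: the infrastructure cluster (Docker in the demo).
            "infrastructureRef": {
                "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
                "kind": "DockerCluster",
                "name": name,
            },
            # Second reference: the kubeadm-based control plane provider.
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1beta1",
                "kind": "KubeadmControlPlane",
                "name": f"{name}-control-plane",
            },
        },
    }


cluster = make_cluster(CLUSTER_NAME)
print(json.dumps(cluster, indent=2))
```

Since kubectl accepts JSON as well as YAML, output like this could in principle be piped to `kubectl apply -f -` on a management cluster, although the referenced DockerCluster and KubeadmControlPlane objects would also need to exist.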
[05:17.280 --> 05:21.360] The first reference you saw is a reference to Docker cluster, it's also what we call [05:21.360 --> 05:29.360] an infrastructure cluster, and it's responsible for all prerequisites that are required to [05:29.360 --> 05:34.280] run your cluster on any infrastructure, so for example, if you're running it on public [05:34.280 --> 05:39.920] cloud, it will provision all networks, load balancers, security groups, VPCs, and whatever [05:40.000 --> 05:49.500] else you need, and this reference is actually what makes Cluster API pluggable, so if you [05:49.500 --> 05:55.080] want to add your own provider, you just have to follow the documentation, implement the API with [05:55.080 --> 06:01.760] some rules and then you can reference it, and that's how you plug in your own provider. [06:01.760 --> 06:06.640] Let me show you how Docker cluster looks in our case, it's pretty simple, there is no [06:06.720 --> 06:19.560] real infrastructure to run, so I'm going to create it too, okay, it's done, then the next [06:19.560 --> 06:25.360] reference we saw in the Cluster object was a reference to what we call a control plane provider, [06:25.360 --> 06:32.160] what it does, it creates a control plane machine, generates cloud config, and also is responsible [06:32.200 --> 06:37.840] for any other actions related to control plane management, stuff like, you know, etcd, [06:37.840 --> 06:44.640] CoreDNS, or whatever you implement or want to enable. 
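As an illustration of the kubeadm-based control plane object mentioned here, the following is a minimal sketch assuming the v1beta1 KubeadmControlPlane layout. The names, the Kubernetes version string, and the certSANs values are hypothetical placeholders, not values taken from the demo.

```python
import json


def make_control_plane(cluster: str, replicas: int, version: str) -> dict:
    """Sketch of a KubeadmControlPlane; field names assume the v1beta1 API."""
    return {
        "apiVersion": "controlplane.cluster.x-k8s.io/v1beta1",
        "kind": "KubeadmControlPlane",
        "metadata": {"name": f"{cluster}-control-plane"},
        "spec": {
            "replicas": replicas,   # number of control plane machines
            "version": version,     # Kubernetes version to run
            "machineTemplate": {
                # Which infrastructure template to stamp machines from.
                "infrastructureRef": {
                    "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
                    "kind": "DockerMachineTemplate",
                    "name": f"{cluster}-control-plane",
                },
            },
            # kubeadm settings used to customize Kubernetes components.
            "kubeadmConfigSpec": {
                "clusterConfiguration": {
                    "apiServer": {"certSANs": ["localhost", "127.0.0.1"]},
                },
            },
        },
    }


control_plane = make_control_plane("demo-cluster", replicas=3, version="v1.25.3")
print(json.dumps(control_plane, indent=2))
```

Keeping the replica count and version at the top of the spec is what lets a single field change drive scaling or an upgrade of the whole control plane.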
Let me show you what it looks like, [06:44.640 --> 06:50.240] this will be so far the biggest object we have there, because it contains some configurations [06:50.240 --> 06:58.600] we require for our control plane, but as you can see, you can customize some Kubernetes [06:58.680 --> 07:04.440] components there using Kubernetes API, so if you would like, you can just specify anything [07:04.440 --> 07:11.440] you need here to provision control planes, you can also specify the replica count, and you also [07:11.440 --> 07:15.440] need the Kubernetes version there. Now, maybe I forgot to create it. [07:29.560 --> 07:40.080] Yeah. Okay, so let's talk about worker machines and how CAPI approaches managing machines. It's [07:40.080 --> 07:49.000] important first to note that a machine is just a host for your Kubernetes nodes, so it can be a [07:49.000 --> 07:55.560] virtual machine, can be bare metal, can be anything your infrastructure provider provides, and I'd like [07:55.600 --> 08:01.880] to show an example with pods, you don't manage pods manually, right? You don't use them as a [08:01.880 --> 08:07.360] standalone resource, you use something else. If you want to manage the replica count for your [08:07.360 --> 08:12.800] pods, you use something called a ReplicaSet that has just one purpose, create a certain count [08:12.800 --> 08:18.280] of pods, and then if you want to do more complex stuff like rolling upgrades, you use a Deployment [08:18.280 --> 08:24.520] on top of this that manages the ReplicaSet, so CAPI followed the same pattern and created machines, [08:24.560 --> 08:29.240] then there is a MachineSet that manages the replica count, and there is a MachineDeployment on [08:29.240 --> 08:42.480] top of that, that does more complicated things. Let's go back to the terminal. 
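The layering described above (a set that only keeps a replica count, and a deployment on top that handles rollouts) can be modelled with a small toy simulation. This is not Cluster API code: the class names mirror the CAPI kinds, but the fields, the one-machine-at-a-time surge policy, and the `reconcile_step` method are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MachineSet:
    """Keeps exactly `replicas` machines of a single version alive."""
    name: str
    version: str
    replicas: int = 0
    machines: list = field(default_factory=list)
    created: int = 0  # counter used only to generate unique machine names

    def reconcile(self) -> None:
        while len(self.machines) < self.replicas:
            self.machines.append(f"{self.name}-{self.created}")
            self.created += 1
        while len(self.machines) > self.replicas:
            self.machines.pop()


@dataclass
class MachineDeployment:
    """Owns MachineSets and rolls from one version to the next."""
    name: str
    version: str
    replicas: int
    sets: dict = field(default_factory=dict)  # version -> MachineSet

    def reconcile_step(self) -> bool:
        """Take one step towards the desired state; return True if it changed."""
        new = self.sets.setdefault(
            self.version, MachineSet(f"{self.name}-{self.version}", self.version))
        old = [ms for ms in self.sets.values() if ms is not new and ms.replicas > 0]
        total = sum(ms.replicas for ms in self.sets.values())
        if new.replicas < self.replicas and total <= self.replicas:
            new.replicas += 1      # surge: add one machine of the desired version
        elif old:
            old[0].replicas -= 1   # then retire one machine of an old version
        elif new.replicas > self.replicas:
            new.replicas -= 1      # plain scale-down
        else:
            return False           # already converged
        for ms in self.sets.values():
            ms.reconcile()
        return True


md = MachineDeployment("workers", "v1.25", replicas=3)
while md.reconcile_step():
    pass  # initial scale-up to three machines

md.version = "v1.26"  # changing the template triggers a rolling replacement
while md.reconcile_step():
    print({version: len(ms.machines) for version, ms in md.sets.items()})
```

The split of responsibilities is the point of the sketch: the set never needs to know about versions changing, and the deployment never creates machines directly.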
I will show you a [08:42.480 --> 08:49.280] MachineDeployment, you can see that, similar to a normal Deployment, it has a replica count, then it has a [08:49.320 --> 08:56.040] selector, has a template, and inside the spec it is similar to what we saw with the Cluster object, it has [08:56.040 --> 09:03.880] two references, one is for our infrastructure template, which is Docker for this demo, and the [09:03.880 --> 09:08.480] second one is a bootstrap provider, which is based on kubeadm. [09:20.280 --> 09:29.680] So the infrastructure template or Docker template that we saw there in the reference is just a set of [09:29.680 --> 09:34.760] specifications for your host depending on your cloud provider, it can be an instance type, [09:34.760 --> 09:42.880] storage size, anything you put there, and the second reference to the bootstrap provider is just [09:43.280 --> 09:49.840] a reference to an API that generates user data with the proper cloud config, so you can configure [09:49.840 --> 09:56.360] your Kubernetes components as you want. Let me show you what it looks like. For the Docker machine, [09:56.360 --> 10:04.960] it's just an image in this case and some extra mounts, and for the bootstrap provider, we just have [10:04.960 --> 10:09.160] some arguments for our Kubernetes components, and this is it. [10:12.880 --> 10:33.680] Okay, so this was it. Let me now check if everything works fine. Yeah, everything works fine. As you [10:33.680 --> 10:40.000] can see, we have three control plane machines that are running inside Docker containers that we [10:40.000 --> 10:45.680] created before and after some time, we should get a worker machine that we just created. [10:45.680 --> 10:57.880] Let's take a look at how it all works together. 
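To make the worker-side objects concrete, here is a sketch of a MachineDeployment with its two references, plus the two templates it points at. Object names, the image tag, the mount paths, and the kubelet argument are hypothetical; the field names (`customImage`, `extraMounts`, `joinConfiguration`, and so on) assume the v1beta1 Docker and kubeadm bootstrap provider types and should be checked against the provider version in use.

```python
import json

INFRA = "infrastructure.cluster.x-k8s.io/v1beta1"
BOOTSTRAP = "bootstrap.cluster.x-k8s.io/v1beta1"


def ref(api_version: str, kind: str, name: str) -> dict:
    """Build an object reference as used inside Cluster API specs."""
    return {"apiVersion": api_version, "kind": kind, "name": name}


def make_worker_objects(cluster: str, replicas: int, version: str, image: str) -> list:
    name = f"{cluster}-md-0"  # hypothetical naming scheme
    labels = {"cluster.x-k8s.io/cluster-name": cluster}
    machine_deployment = {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "MachineDeployment",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "clusterName": cluster,
                    "version": version,
                    # Reference 1: what the host looks like.
                    "infrastructureRef": ref(INFRA, "DockerMachineTemplate", name),
                    # Reference 2: how the node is bootstrapped with kubeadm.
                    "bootstrap": {
                        "configRef": ref(BOOTSTRAP, "KubeadmConfigTemplate", name),
                    },
                },
            },
        },
    }
    docker_template = {
        "apiVersion": INFRA,
        "kind": "DockerMachineTemplate",
        "metadata": {"name": name},
        "spec": {"template": {"spec": {
            "customImage": image,
            "extraMounts": [{"containerPath": "/var/run/docker.sock",
                             "hostPath": "/var/run/docker.sock"}],
        }}},
    }
    kubeadm_template = {
        "apiVersion": BOOTSTRAP,
        "kind": "KubeadmConfigTemplate",
        "metadata": {"name": name},
        "spec": {"template": {"spec": {"joinConfiguration": {
            "nodeRegistration": {"kubeletExtraArgs": {"node-labels": "demo=true"}},
        }}}},
    }
    return [machine_deployment, docker_template, kubeadm_template]


objects = make_worker_objects("demo-cluster", 1, "v1.25.3", "kindest/node:v1.25.3")
print(json.dumps(objects, indent=2))
```

On a cloud provider the infrastructure template would carry things like instance type and storage size instead of a container image.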
We have a Cluster object that represents the cluster, [10:57.880 --> 11:05.080] then it has to reference an infrastructure provider, which is Docker in this case, and it also has to [11:05.080 --> 11:13.160] reference a control plane provider, which is based on kubeadm, and once these two are done with their job, [11:13.160 --> 11:20.160] you can connect your machine deployments that have to reference a machine template, so CAPI knows [11:20.160 --> 11:28.440] what specifications you want, and also a kubeadm config template where you can configure your [11:28.480 --> 11:43.120] Kubernetes components, and this is all you need to create a basic CAPI cluster. Unfortunately, I don't have enough [11:43.120 --> 11:51.200] time to talk about other things that exist in CAPI like machine health checks that help you track and [11:51.200 --> 11:57.360] remediate unhealthy machines, then there are cluster classes, which are powerful templates for creating [11:57.480 --> 12:03.320] clusters. You can also connect Cluster Autoscaler if you want, and there are day two operations coming, [12:03.320 --> 12:08.920] so you can think of CAPI as a Swiss knife for everything related to cluster lifecycle. [12:12.800 --> 12:22.960] And we still have time. I'm going to show you how we can upgrade the cluster. Let's check its state again. [12:23.920 --> 12:25.920] Yeah, so if you... [12:29.880 --> 12:39.880] Now you can see that we have three control planes, and they all are running Kubernetes v1.25, and let me [12:39.880 --> 12:48.880] upgrade them to Kubernetes v1.26, so how do I do this? In order to do this, we have to change the version [12:49.800 --> 12:55.800] in the control plane provider object, and we also have to change the image reference in the machine [12:55.800 --> 13:01.800] template. So just by doing so, I will start upgrading the cluster. 
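The two changes just described (a new version on the control plane object and a new image in the machine template) can be sketched as a small pure function over the manifests. The field paths `spec.version` and `spec.template.spec.customImage` assume the v1beta1 KubeadmControlPlane and DockerMachineTemplate types, and the version strings and image tags are hypothetical examples.

```python
import copy


def plan_upgrade(control_plane: dict, machine_template: dict,
                 version: str, image: str) -> tuple:
    """Return updated copies of both manifests; the inputs are left untouched."""
    new_cp = copy.deepcopy(control_plane)
    new_tpl = copy.deepcopy(machine_template)
    new_cp["spec"]["version"] = version                           # desired Kubernetes version
    new_tpl["spec"]["template"]["spec"]["customImage"] = image    # image that ships that version
    return new_cp, new_tpl


# Minimal stand-ins for the objects already present in the management cluster.
current_cp = {"kind": "KubeadmControlPlane",
              "spec": {"replicas": 3, "version": "v1.25.3"}}
current_tpl = {"kind": "DockerMachineTemplate",
               "spec": {"template": {"spec": {"customImage": "kindest/node:v1.25.3"}}}}

upgraded_cp, upgraded_tpl = plan_upgrade(
    current_cp, current_tpl, version="v1.26.0", image="kindest/node:v1.26.0")
print(upgraded_cp["spec"]["version"])
print(upgraded_tpl["spec"]["template"]["spec"]["customImage"])
```

Nothing here talks to a cluster; applying the updated objects to the management cluster is what would prompt the controllers to start replacing machines.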
[13:10.800 --> 13:17.800] As you can see, Cluster API started to spin up a new control plane machine with v1.26 that is going to [13:18.720 --> 13:24.720] replace old ones, and it's going to take care of things for us, like ensuring etcd quorum and all sorts of things, [13:24.720 --> 13:26.720] so we don't have to take care of this. [13:35.720 --> 13:45.720] I'm going to go back to the summary, and let's go once again over what we saw today. So I tried to explain [13:46.640 --> 13:53.640] the problem of managing Kubernetes clusters, and the main idea, we wanted to have a tool that provides [13:53.640 --> 13:59.640] a declarative and consistent API, and will allow you to provision and manage your clusters on different [13:59.640 --> 14:07.640] infrastructure in some nice way so you can have a single point of managing your clusters for all [14:08.560 --> 14:15.560] the possible infrastructures you're running, and the approach is to use Kubernetes because [14:15.560 --> 14:21.560] Kubernetes already provides a lot of tools for building a powerful API. [14:25.560 --> 14:31.560] I think that's it, maybe I was a bit quick, but I don't have anything else. I'm ready to answer [14:31.560 --> 14:33.560] questions if someone has any. [14:38.560 --> 14:43.560] Okay, we have ample time for questions. [14:47.560 --> 14:53.560] Hi, thanks for the nice demo. This allows you to manage the workload clusters. Can it also [14:53.560 --> 14:57.560] manage the life cycle of the management cluster, or how do you do that? [14:57.560 --> 14:58.560] Yes, you can. [14:58.560 --> 15:00.560] So what if it destroys itself, so what happens then? [15:00.560 --> 15:03.560] It shouldn't. Depends on how you use it, but yeah. [15:03.560 --> 15:05.560] Works on local clusters, thank you. [15:06.480 --> 15:11.480] Thank you. [15:11.480 --> 15:16.480] A question about updates: is it possible to update components like kubelets without [15:16.480 --> 15:18.480] recreating virtual machines? [15:18.480 --> 15:19.480] Yes.
[15:19.480 --> 15:20.480] And how is it working? [15:20.480 --> 15:25.480] It's done through your bootstrap or control plane provider. [15:25.480 --> 15:32.480] Yeah, and you also have to provide an image that will be used for your new instances. [15:33.400 --> 15:38.400] No, no, I mean if you need to update kubelets and you don't want to reorder new... [15:38.400 --> 15:43.400] Yeah, okay. Cluster API doesn't support in-place upgrades. It will be creating a new machine [15:43.400 --> 15:46.400] with a new image, new everything, and then replacing the old one. [15:46.400 --> 15:50.400] Okay, got it. And can you tell a little more about control plane updates? [15:50.400 --> 15:51.400] Sorry? [15:51.400 --> 15:56.400] Control plane updates, updates of control plane nodes and components. [15:56.400 --> 16:02.400] So I just showed one, like when you change the version it will start replacing old machines [16:02.400 --> 16:05.400] with newer ones. You just have to provide all the specifications. You have to provide [16:05.400 --> 16:10.400] the new Kubernetes version you want and also a new image. So we try to bake everything [16:10.400 --> 16:16.400] inside the machine image so you don't have to download new things and it will just replace [16:16.400 --> 16:23.400] the old machine with a new one, with new versions. So it's a replace upgrade. It's not in place. [16:24.400 --> 16:29.400] The same as Pods, if you change, for example, the reference to the image, it will destroy the old one [16:29.400 --> 16:33.400] and create a newer one. So it's the same concept. [16:33.400 --> 16:38.400] There's an online question in the chat. Are there any latency requirements between [16:38.400 --> 16:42.400] the management cluster and the workload cluster? [16:42.400 --> 16:52.400] It depends on your use case, but yeah, ideally you should take care that your management cluster [16:52.400 --> 17:02.400] is somewhere near the workload clusters or is able to reach them within some limits.
[17:06.400 --> 17:12.400] And one more. Does the management cluster need to run all the time or can it be shut off [17:12.400 --> 17:15.400] when not doing life cycle work? [17:15.400 --> 17:21.400] So here is the thing. If you disable it nothing will manage your Kubernetes clusters [17:21.400 --> 17:25.400] so they will be basically unmanaged. Yeah, your workload clusters will continue running [17:25.400 --> 17:30.400] but there is nothing that will keep track of them. For example, if you use Cluster Autoscaler [17:30.400 --> 17:34.400] or machine health checks you need your management cluster to be running all the time [17:34.400 --> 17:38.400] because it constantly looks at the state of your workload clusters. [17:42.400 --> 17:47.400] Okay. If there are no more questions, we can end a few minutes early. [17:47.400 --> 17:50.400] Thank you for the talk. Thank you all for attending.