[00:00.000 --> 00:16.560] Okay, thank you for showing up in person and so many people, so it's actually my first [00:16.560 --> 00:24.120] time at FOSSTEM and I think it's super excited to see so many people and coming back to conferences [00:24.120 --> 00:32.200] but also such a crowded conference so I mean the talks previously were super full now it's [00:32.200 --> 00:39.520] almost half full or half empty depending on how you look at it and even for such supposedly [00:39.520 --> 00:47.960] boring topics like a sovereign cloud I mean that's immediately sparked associations with [00:47.960 --> 00:55.120] state and GDPR I mean all the cookies that you have to click away so sounds boring at first [00:55.120 --> 01:01.560] but I think there's also some value in it and I think there's a journey where open source can [01:01.560 --> 01:09.320] help to make a sovereign clouds come to life or like look at some aspects of it this is me I go [01:09.320 --> 01:16.840] by the name due random on GitHub and on social media three days ago I changed roads so now I'm [01:16.840 --> 01:23.680] in sales again yikes for as a managed open shift black belt so I'm still looking at a little bit [01:23.680 --> 01:34.120] at the cloud topic open shift is like a cloud on a cloud before that I worked quite some time on [01:34.120 --> 01:42.960] AI on AI ops in the office of the CTO and the last thing that I did for two years now is imagining [01:42.960 --> 01:50.720] or revisiting open source now in the age of clouds and seeing how open source principles [01:50.720 --> 02:00.000] can also apply to operations so we're going to look at the operate first community clouds why [02:00.000 --> 02:09.240] do we need a community of practice around operations what does the this community cloud [02:09.240 --> 02:17.400] look like and also where is it so I think this will be a it's not really a hands-on talk but you [02:17.400 --> 02:25.520] can take things away you can if Wi-Fi works for you you can log into the cloud right now and I hope [02:25.520 --> 02:30.720] to see you in some of the meetups or in the community after that because it's it's really open to [02:30.720 --> 02:36.720] anybody who wants to learn something about operations or wants to teach something about [02:36.720 --> 02:47.000] operations so when I first heard the term sovereign cloud like I said it sparked the sovereign the [02:47.000 --> 02:55.880] the king who has now also occupied the cloud and I put it into my favorite search engine and it [02:55.880 --> 03:00.480] immediately came up with a lot of definitions on sovereign cloud there was one from VMware some [03:00.480 --> 03:06.680] from telecom and they all looked at different aspects and these days in case you're not living [03:06.680 --> 03:11.320] under a rock everybody's talking about jet GPT so I thought maybe let's talk to this AI who [03:11.320 --> 03:17.120] already read all the definitions for me and ask it about them sovereign cloud and this is just [03:17.120 --> 03:22.240] the end of my conversation so I wanted to highlight the differences between the noun and [03:22.240 --> 03:30.960] the adjective sovereign and the noun sovereign refers to a personal identity that holds supreme [03:30.960 --> 03:38.080] power and authority while the adjective sovereign describes something that is supreme or superior [03:38.080 --> 03:45.600] in rank that still sounds not really friendly to me like I do I really want something that is in [03:45.600 --> 03:51.400] supreme power over me and why should I care about this then but is there also a notion of [03:51.400 --> 03:56.760] independence in that in that adjective because that's what I always thought about when thinking [03:56.760 --> 04:05.040] about it but a little bit and jet GPT came up with this that there's a notion of independence in [04:05.040 --> 04:12.120] the adjective to be described sovereign means to be independent not subject to control by any other [04:12.120 --> 04:18.000] person or entity which if you think about it that also implements if you have supreme power then [04:18.000 --> 04:24.880] you can also move away and having the highest degree of power and authority the term emphasizes [04:24.880 --> 04:31.560] the idea of self-governance and supreme power within a given context and the context seems to be [04:31.560 --> 04:43.960] cloud so when I look at sovereign cloud at least in my small world view it means I have the power [04:43.960 --> 04:52.240] to move away I have the power to control stuff and I have that that largest amount of independence [04:52.240 --> 04:58.960] and that seems at first contradictory to that business model that we saw in previous talks [04:58.960 --> 05:06.520] right so somebody a nice definition of cloud is I'm running stuff on somebody else's computers so [05:06.520 --> 05:13.880] that doesn't seem to be like a lot of freedom because I have some some lock-in but actually open [05:13.880 --> 05:19.960] source let a path away from lock-in so I think it's important that we apply these open source [05:19.960 --> 05:28.640] principle also to operations and if you these days look at a cloud is it is it really open I mean [05:28.640 --> 05:37.840] it's built on open source software you get your RDS or some some other product and underneath yes [05:37.840 --> 05:43.080] it's running MySQL that's running elastic it's running all that open source stuff but you're [05:43.080 --> 05:51.840] still tied to that to that experience that the cloud provider imposes on you if you want to [05:51.840 --> 06:00.720] rebuild that with open source solutions you can do it it's well it's looking pretty complicated so [06:00.720 --> 06:04.680] you need to master a lot of these technologies you need to stick them together and there's a [06:04.680 --> 06:09.800] reason for why people defer to the cloud because they are interested in the workload they just [06:09.800 --> 06:17.480] want to swap their credit card and consume and build away the applications so but I think the [06:17.480 --> 06:25.400] the last speaker put it really or the previous speaker put it really nice that login to be defined [06:25.400 --> 06:37.520] as a as a product of of cost and the likelihood that something is going away so you have to deal [06:37.520 --> 06:46.480] with that stuff but open source somehow if you go to this slide here so open source actually showed [06:46.480 --> 06:54.920] away out of this and the last the left side of that some funnel here is the traditional open source [06:54.920 --> 07:02.360] as in as in software contributions funnel which we all know for decades and which we all love so [07:02.360 --> 07:13.880] you find a project you use it there might be 100 users of it and at some point something breaks so [07:13.880 --> 07:20.240] you might file an issue great you already contributed because you file that issue and then [07:20.240 --> 07:25.520] maybe at the last time even somebody fixes or maybe you fix that project so there's really a [07:25.520 --> 07:34.800] funnel of 100 users then 10% reporting issues and making up that community and 1% actively [07:34.800 --> 07:42.120] contributing to that project if I'm using something as a service I'm essentially drowning this [07:42.120 --> 07:53.560] funnel so I'm I'm stopping at the at the API layer I might contribute to the underlying open [07:53.560 --> 08:00.840] source software that might run this service but in terms of contribution I'm usually I'm stuck [08:00.840 --> 08:06.320] with maybe filing in the support case and maybe the provider comes back to me but I have no [08:06.320 --> 08:16.040] possibility to actively contribute to that and maybe fix that API outage but maybe I'm the only [08:16.040 --> 08:22.840] person having that problem and so the cloud provider doesn't even care about this and this was the [08:22.840 --> 08:31.600] the notion that our team thought about when thinking about open source in the age of cloud where [08:31.600 --> 08:39.880] it's where there's more value apparently in running and providing the software then the [08:39.880 --> 08:49.520] software code itself or at least that's an unequal scale and as we see with many enterprise [08:49.520 --> 08:58.520] distributions or business models you can get the source code of that database or that service but [08:58.520 --> 09:06.560] you don't get the sometimes you don't even get the built tools you don't get the tools that [09:06.560 --> 09:13.480] actually operate that service the SLIs the metrics that you need to run there the runbooks etc so [09:13.480 --> 09:21.560] every deployment is either behind a paywall because that's the differentiating factor for [09:21.560 --> 09:27.280] that company or you have to learn it yourself and it's it's actually quite hard to open up [09:27.280 --> 09:33.880] something also with legal constraints right here so you may have might have proprietor [09:33.880 --> 09:43.520] PII personally identifiable information in in there you have logs so you need to make sure that [09:43.520 --> 09:49.160] you don't expose any of these secrets so that there's a tight balance and that's why most [09:49.160 --> 09:57.640] companies or most projects default to closed and even for even for communities that run their [09:57.640 --> 10:04.000] infrastructure like the Fedora infrastructure that's somewhat open but you still need to be [10:04.000 --> 10:12.320] going through a lot of hoops to to contribute and to do something so it's not really open by [10:12.320 --> 10:17.520] default and it's also not meant to be as a blueprint for something only 10 minutes left but I [10:17.520 --> 10:25.240] think I can go to the next part of this presentation so this is the this is the concept [10:25.240 --> 10:31.960] right so we need to shift left we need to open up operations and the practice something so that [10:31.960 --> 10:38.280] we build up a community so that we don't have to build our operational deployments from scratch [10:38.280 --> 10:47.160] and what that was is is the concept of this operate first idea we also thought you need to [10:47.160 --> 10:51.240] some have something physical something hands-on where people can actually contribute because [10:51.240 --> 10:55.720] otherwise it would be just a talk show so somebody needs to lead the way and implement that stuff [10:55.720 --> 11:03.520] and we tried to build a hybrid cloud with full visibility into the operation center and hybrid [11:03.520 --> 11:14.720] cloud these days is for a lot of people Kubernetes and so we have two bare metal Kubernetes [11:14.720 --> 11:22.640] clusters running at the Boston University with 34 nodes and 1200 cores so it's not a small [11:22.640 --> 11:31.560] setup then there's one larger cluster running in AWS from the OS climate project which is [11:31.560 --> 11:38.120] also managed with these operate first community cloud ideas and we also work with a German [11:38.120 --> 11:47.200] super scaler that's what the the layer between below a hyperscaler means Jonas they donated [11:47.200 --> 11:53.160] some hardware and we deployed also some some classes there so my vision actually is to to [11:53.160 --> 12:01.400] have a really resilient distributed clouds set up operated under these principles at as many [12:01.400 --> 12:13.240] hardware or cloud providers as possible 626 individuals locked into these into these clusters [12:13.240 --> 12:22.880] about 200 namespaces are there that's the so we do a lot of stuff on most of the stuff is [12:22.880 --> 12:29.800] happening on GitHub we have 150 people on the operate first in the operate first community [12:29.800 --> 12:39.200] there are like 1000 issues being filed the in terms of diversity it's since it's a red [12:39.200 --> 12:45.560] hats bootstrap project it's like one third or half of the people are red hat employees but [12:45.560 --> 12:52.040] there's also a lot of university contributions from American universities and also a lot of [12:52.040 --> 12:59.880] open source projects already contributing them it's just a highlight of some some of the [12:59.880 --> 13:09.560] more noteworthy projects using this infrastructure like OKD the upstream of OpenShift or OS climate [13:09.560 --> 13:21.040] or Janus IDP which is a project for for some backstage plugins so backstage if it's currently [13:21.040 --> 13:31.560] one of the one of the more hyped tools for a server developer portal by Spotify these [13:31.560 --> 13:38.320] are some of the services that are running there like the usual stuff that you would [13:38.320 --> 13:44.720] expect from a cloud setup we have Argo CD for doing GitOps we have Grafana for monitoring [13:44.720 --> 13:51.120] stuff we have tecton pipelines for building things there's a brow instance running for [13:51.120 --> 13:57.360] doing CI CD and a lot of other things so every and that's all deployed by the community [13:57.360 --> 14:05.000] in a GitHub repository where you can integrate into into these into these other services [14:05.000 --> 14:12.400] and I think that's where the actual value comes from so let's get real we love hands [14:12.400 --> 14:22.360] on keyboard and as I said it's all done via a really a GitOps SRE no a GitOps Git first [14:22.360 --> 14:32.800] approach the current entry point for you would be going to operate-first.cloud or and that [14:32.800 --> 14:38.240] click through some hoops and you end up at the service-catalog.operate-first.cloud which [14:38.240 --> 14:48.840] is an backstage instance where we for one showcase the services so you go to the catalog [14:48.840 --> 14:54.960] you see all these services with all their dependencies and you see all the managed clusters there [14:54.960 --> 15:00.160] so you click on one of these clusters and you are presented with a single single sign [15:00.160 --> 15:05.960] on logging screen and if you choose the second option operate-first you can lock in with [15:05.960 --> 15:12.680] your GitHub account so it's authenticated against GitHub and without even signing up [15:12.680 --> 15:20.240] for an account you get a read-only view of the cluster which is pretty awesome so you [15:20.240 --> 15:24.760] see how these these services are being deployed and how other community services are being [15:24.760 --> 15:34.000] deployed so you get a a a hello world example a live hello world example of a of of a fully [15:34.000 --> 15:41.240] production cloud environment which you would see at your at at your site either at your [15:41.240 --> 15:52.040] for your project or for your customer whatever and we documented the way and the why we came [15:52.040 --> 15:57.880] up to certain decisions so in this case it's application monitoring or there's also how [15:57.880 --> 16:03.240] to store credentials in a cloud and these are the the questions that you will face if [16:03.240 --> 16:10.320] you're setting up a your own local cloud and we documented these for con to bootstrap [16:10.320 --> 16:15.560] either other deployments or to contribute back so that you don't have to really read [16:15.560 --> 16:23.680] through so many blog posts and documentations and make your own choice there are some dashboards [16:23.680 --> 16:28.440] here and these are the dashboards that we use for troubleshooting or the community uses [16:28.440 --> 16:36.800] for troubleshooting so you would see Kafka or open data hub and Prometheus live dashboards [16:36.800 --> 16:44.800] and here's one dashboard for our for our clusters and that's the that's the github org where [16:44.800 --> 16:54.760] you would start talking to us or talking to the community the main entry repository is [16:54.760 --> 17:00.520] the support repository you can ask questions or you start with a one of these templates [17:00.520 --> 17:10.520] and one of the coolest templates here or processes is onboarding to a cluster so you get a form [17:10.520 --> 17:15.440] it looks like a form it's a github template you choose which cluster you want to be on [17:15.440 --> 17:22.240] boarded the team name and then we have some automation in place that would automatically [17:22.240 --> 17:29.840] create a pull request to our github repository and we only have to say looks good to me to [17:29.840 --> 17:35.960] it if you're for the person that's part of the operating team and they are onboarded [17:35.960 --> 17:44.360] so that's also giving giving you some sense of how would I automate my local clouds deployment [17:44.360 --> 17:51.400] you don't need to do that but it's it's it's a way to bootstrap you and there's a lot of [17:51.400 --> 18:01.600] other issues going on and as said it's a community so things will eventually also break right [18:01.600 --> 18:08.280] now we have problems with our object storage it's broken if you are an expert in nuba or [18:08.280 --> 18:14.160] in object storage and you want to get your hands dirty in rebuilding some of that stuff [18:14.160 --> 18:20.720] this is the issue so to me though here which will give another awesome talk at the end [18:20.720 --> 18:28.880] of this track here left some comments how to get started nobody worked on it yet so [18:28.880 --> 18:30.520] it's up for grabs thank you [18:30.520 --> 18:50.960] thanks myself question oh hey you provided a good definition for sovereign cloud who [18:50.960 --> 18:59.640] are the customers for sovereign cloud I don't know to be honest so I'm looking at it really [18:59.640 --> 19:07.000] from a technical perspective and I think my key takeaway is everybody who wants to build [19:07.000 --> 19:12.480] their own clouds probably wants to be sovereign in running their cloud and then you have to [19:12.480 --> 19:17.720] focus on stuff like minimizing vendor lock in and being able to move to another cloud [19:17.720 --> 19:28.880] provider or move your data across clouds to jump just on the question is one of the customer [19:28.880 --> 19:33.800] that would like a sovereign cloud because of the resiliency that we need to have because [19:33.800 --> 19:40.880] we have critical infrastructure I think that was a statement not a question right [19:40.880 --> 19:54.880] okay alright take out your smartphone snap that QR codes and there's a bi-weekly community [19:54.880 --> 19:59.140] meetup where you can meet all the wonderful people that are involved in this community [19:59.140 --> 20:26.560] where with