Hi, everyone. I hope you're doing well and not feeling too sleepy after the lunch hour. We are here to talk about an introduction to Ceph on Kubernetes using Rook. Here's Alexander; he will introduce himself in a moment. I'm Gaurav, Cloud Storage Engineer at Koor Technologies. I'm also a community ambassador for the Ceph project for the Indian region, I've been working with the Ceph and Rook projects for a long time, and I'm now a contributor to the Rook project.

I'm Alexander Trost, founding engineer at Koor Technologies, Inc. I'm a maintainer of the Rook project as well, and we wanted to talk about Rook for everyone who doesn't know it. I want to get you started with storage: who doesn't need fast, reliable storage nowadays with cloud-native applications? We're obviously talking about somewhat more performant storage, I guess, depending on who you ask.

The point of Rook, in the end, is that Kubernetes is kind of like the container ship here: you have your Kubernetes that abstracts everything and tries to provide you this one API for most to all things, depending on how far you want to go with it. For most people running Kubernetes it looks like that: you have your big, giant ship running your production applications, and you have your automation and CI/CD processes that try to keep it running. That's where the question of storage comes into the frame for more and more people, especially since local storage became better supported in Kubernetes in a native way a year or two ago, instead of just having things around Kubernetes trying to make that an easier endeavor.
So we have the question: how can I, for example, get my Ceph storage talking with Kubernetes so that I have storage for my applications? The nice thing is that nowadays there is mainly one interface for that, CSI, the Container Storage Interface. For storage vendors it basically means they only need to implement one interface, and for Kubernetes, and you as a user, there is one interface, one way to get storage. For example, if you want storage on Kubernetes, you use PersistentVolumeClaims: from the application's perspective you claim storage, and Kubernetes takes care of, for example, talking to the Ceph storage and provisioning the volume; subsequently the CSI driver from Ceph takes care of mapping and mounting the volume, so the whole thing is completely transparent to your application. The point of the CSI interface is that it is this one way for any storage vendor to get their storage running. There is obviously more than Ceph out there, but with Rook Ceph we're going to focus on Ceph here, and Ceph CSI is exactly the connector that sits in between: it is the connecting bit between Kubernetes, the application and container side, and your storage.

And that's already the point where we start talking about Rook: you can run your Ceph storage cluster on almost any hardware. I don't know, could we run it on a Raspberry Pi as well? Yes, easily. I think I've even heard of people running it on Android phones, but, you know, just because you can doesn't necessarily mean you should; that's a whole other discussion. The point being, you can technically have your Ceph storage anywhere. It doesn't really matter if it's on bare metal in your own data center or just a few laptops thrown together. That's the thing with Ceph in general: you don't need the best hardware, you don't need to buy that big box from the one storage hardware vendor, to have storage. And that's where the combination of Kubernetes and Ceph comes into play: for having storage for your applications, but also as a way of running Ceph. That's what Rook is about: it is about running Ceph, and obviously also about the connecting part, setting up that connection between Ceph and Kubernetes.
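To make that claim flow concrete, a PersistentVolumeClaim against an RBD-backed StorageClass can look roughly like the sketch below; the StorageClass name rook-ceph-block is the one used in the usual Rook examples and is an assumption here, not something shown in the talk.

```yaml
# Minimal sketch: the application claims block storage; the Ceph CSI driver
# then provisions, maps and mounts the RBD volume transparently.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce                    # RBD block volumes are typically single-writer
  storageClassName: rook-ceph-block    # assumed example StorageClass backed by Ceph RBD
  resources:
    requests:
      storage: 10Gi
```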
The idea is that Rook runs Ceph in Kubernetes, in containers. I think I mainly saw that with cephadm the last time we deployed a cluster directly on bare metal; cephadm is one other way, to put it like that, to install, deploy, configure, and easily manage a cluster. It's one way to just install and run it, and it's kind of the same point for Rook, where Rook is basically a Ceph operator for Kubernetes. I'm going to go into a little more detail about what an operator does, because that's one of the vital points in general for running certain applications on Kubernetes. Again, as we said about running Ceph on Kubernetes: with the operator pattern that we have in Kubernetes, we can easily handle most of the things that cause quite some pain depending on how big you scale your storage cluster: deployment, bootstrap and configuration, upgrades, and everything like that. Those are all processes for which there are probably five million Ansible playbooks to install Ceph; there's obviously cephadm; there was ceph-deploy earlier, which cephadm has since replaced as the more advanced, latest tool that everyone is using these days; and there are more, I can think of five more tools for how to deploy Ceph. Ironically, for the people who have looked into Kubernetes a bit more already, it's kind of the same story for deploying Kubernetes.

But because Kubernetes is this abstraction layer on top of hardware, to some degree abstracting everything away (I'll skip over this very quickly), it allows the Rook operator, and that's exactly where this image comes in, to orchestrate a cluster. It's not just a deployment of Ceph; it's about using the Kubernetes APIs to easily take care of everything, so to say. You want to add a new node to your storage cluster: what do you do? Technically speaking, you just add it to Kubernetes, and if everything goes well, ten seconds later the operator will go "oh, a new node, I've got to do my job", run the prepare job and everything to get the node ready, and a few seconds later the new Ceph components, the OSDs on the disks (depending on what disks there are, obviously), are taken care of. To make the full circle with the Kubernetes side: that is what the operator pattern, as it's mostly called, is about. The main point is that it is about observing: the operator observes a status or, in the Kubernetes case, custom resources.
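The add-a-node flow just described is driven by the storage section of the CephCluster resource. A minimal excerpt, assuming the common settings from the Rook examples rather than anything shown in the talk, might look like this:

```yaml
# With useAllNodes/useAllDevices set, the operator notices a newly added
# Kubernetes node, runs an OSD prepare job on it, and brings up OSDs on its
# eligible disks without further manual steps.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    useAllNodes: true
    useAllDevices: true
```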
Think of a custom resource as just an object of YAML in Kubernetes, which the operator can watch. I as a user make a change to it, or my automated CI/CD process makes a change, like "a new node has been added" or "I want to tweak something in the configuration of the cluster". The operator is observing that, and when there's a change, or even a change in the Kubernetes cluster itself, like a node missing or no longer ready in Kubernetes terms (say a network outage for two of your nodes), the operator observes it, analyzes that change, and starts acting upon it. For example, in Kubernetes terms, it would take care of setting certain so-called PodDisruptionBudgets (just to have that term out there), which try to prevent the Ceph storage cluster components on other nodes from being stopped as well. The main point is really that it is this observe, analyze, act kind of loop, because in the end it just repeats itself all over again; that's the whole deal with Kubernetes operators.

For the people who are already more into Ceph: if you want to scale up to some more Ceph monitors, Ceph mons, you just edit the object in the Kubernetes API and crank the mon count from three to five or something, and again this change is detected by the operator, which analyzes it and acts upon it. That makes it quite convenient. Perfect, here's the YAML. Sorry, I don't have it mirrored on my screen, so it's a bit hard to point at, but that's exactly the YAML we talked about. As an example, I have my cluster running and let's say there's a new Ceph release. What I would need to do to upgrade my cluster is basically just go ahead and change the image to be, well, not v17.2.3 but, let's say, v17.2.5 as an example, and again the operator would detect that, analyze whether every component is up to date or not, and then start the, I don't want to say complicated, upgrade process. Especially with something like Ceph there's more to it than just "let me restart it": there are checks before every component is restarted, through Ceph-native ways, basically commands that ask whether a daemon is "ok-to-stop" (they're basically called like that). That's the whole idea: the operator helps you with that, and in the end it just fully takes care of it.
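For reference, the relevant pieces of that CephCluster YAML might look like the excerpt below. The quay.io image path and field layout follow the usual Rook examples and are assumptions; the version numbers are the ones mentioned in the talk.

```yaml
# Bumping the image from v17.2.3 to v17.2.5 is what triggers the rolling
# upgrade; raising mon.count is the monitor scale-up example from the talk.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17.2.3   # change to ...:v17.2.5 to upgrade
  mon:
    count: 3                           # e.g. crank this up to 5
```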
So for the main part of your work you can just sit back: change it in the YAML, and in a few minutes, or it can even be hours depending on how big the cluster is, the operator will take care of it. As I mentioned before with the monitor count example: we want to change it, we change it, and a few seconds later the operator picks that up and starts making the changes necessary. Even if you want to scale it down, from five to three, or three to one (which is not recommended, we want high availability there), the operator again takes care of doing it. Or if you want to specifically say "on this one node, please use this one device", or even "for this disk or NVMe, use more than one OSD", these things are possible, and quite easily, just by writing some lines of YAML. According to your workload you can easily customize your YAMLs, and that will make your life easier.

We've mainly talked about having the cluster running, or setting up the cluster, with the YAML definition of a CephCluster object. But if you would, for example, want to run Prometheus in your Kubernetes cluster and it needs storage, then to be able to use storage in Ceph you need a storage pool, for example an RBD pool, block storage basically. We again just go ahead and create a CephBlockPool object, which simply contains the information: the failure domain, where you basically tell Ceph to only store data on different hosts, to keep it simple for now; the replicated size, meaning there will be a total of three replicas, three copies of your data, in the cluster; the requireSafeReplicaSize field, let's just skip that for now, it relates to the Ceph replica size; and you could even set the compression mode for this pool. The point is, again, we can just write this in YAML, apply it against the Kubernetes API, and a few seconds later it's there. It's the same way for the other objects: if you need a Ceph filesystem or Ceph object storage, same way, and the operator takes care of creating all the components, for example the MDS for a filesystem, the standard components like the manager, the monitors, the OSDs, and for the object store, for example, the RGW components. The operator simply takes care of that, and again, if you change the Ceph version, a few seconds to maybe a minute or two later, depending on the state of your cluster, the operator will take care of doing the update. So far we have mainly talked about deploying a Rook Ceph cluster.
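A CephBlockPool manifest along the lines just described might look like this sketch; the pool name and the compression value are placeholders, not values from the talk.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host          # tell Ceph to keep the copies on different hosts
  replicated:
    size: 3                    # three total copies of the data in the cluster
    requireSafeReplicaSize: true
  parameters:
    compression_mode: none     # compression mode for this pool, e.g. "aggressive"
```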
At this point we also want to highlight the krew plugin that Rook Ceph is building and providing. It allows you to have certain processes automated, and even certain disaster recovery cases are easier to handle with it. Gaurav will talk a bit about that.

So, what is krew? Krew is basically a package manager for kubectl plugins; it makes the management of Kubernetes easier. That's how the Rook developers and maintainers came together and thought: we can definitely write a plugin to make the lives of our developers and administrators easier. Krew was the way to go, since it's the de facto package manager for kubectl plugins. So you can just do a kubectl krew install rook-ceph, and that's how the plugin gets installed. As you can see, we just ran the help command and it shows a bunch of things that you can do: you can run a whole bunch of Ceph commands and RBD commands, check the health of your cluster, and do a bunch of other things, even remove an OSD. The need actually arose because, for example, if you want to use underlying tools like the ceph-objectstore-tool or something like that to debug and troubleshoot issues at the OSD level, the krew plugin is definitely a great way to go, as it provides common management and troubleshooting tools for Ceph.

A lot of things already work, and we'll show you. Like I mentioned, you just need to run kubectl krew install rook-ceph and it goes ahead and quickly installs the plugin. It's way easier: earlier you had to go inside the toolbox pod to debug and troubleshoot every issue; with krew, it provides such ease of access that it makes lives easier, and troubleshooting is definitely easier. You can also override the cluster configuration at runtime, and some disaster recovery scenarios are addressed as well. One of the troubleshooting scenarios that was addressed is mon recovery: suppose you have the default three mons in the cluster and the majority of them lose quorum. Recovering mons from monmaps means doing a bunch of tasks that, if not done carefully, could lead to more disasters; but with more automation in place, this is also made easier with the krew plugin. Even if you want to troubleshoot CSI issues, it makes that easier for sure. The same goes if you want to restore the mons from the OSDs, and even the Rook Ceph cluster itself can be restored after an accidental deletion of its custom resources.
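As a rough idea of what that looks like on the command line (subcommand names are recalled from the plugin's help output and may differ between plugin versions):

```sh
kubectl krew install rook-ceph               # install the Rook Ceph plugin via krew

kubectl rook-ceph help                       # list the available subcommands
kubectl rook-ceph health                     # basic health checks of the Rook/Ceph cluster
kubectl rook-ceph ceph status                # pass-through for arbitrary "ceph ..." commands
kubectl rook-ceph rbd ls replicapool         # pass-through for "rbd ..." commands
kubectl rook-ceph rook purge-osd 0 --force   # example: remove OSD 0 (destructive!)
```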
One of the common goals on the roadmap is also automating core dump collection: let's say an issue happens with a Ceph daemon and we want to collect a core dump of the process for further investigation, to share it with the developers and the community to understand what issues we are facing. And if you want to do performance profiling or debugging of a process with gdb, that could be made easier as well. These are some of the goals. The current plugin is written in Bash, but there is work going on to rewrite the whole plugin in Golang, so that it is more scalable and much easier to manage, even for contributors.

So I guess the point we're more or less trying to make is: if you have Kubernetes, or run a distribution, whether it's Rancher or obviously OpenShift, on your hardware, and, I would even put it to some degree as, you're confident enough with Kubernetes to run it, you can quite easily run a Ceph cluster on top of that as well. Obviously, to some degree you need some Ceph knowledge, but that's the case with everything you want to run in production. It's just that with this abstraction layer of Kubernetes, it gets easier for you: you start to think more along the lines of "I have some nodes, and they're simply there to take care of the components that I need to run for the Ceph cluster". Especially with the Rook Ceph operator it makes the process easier, for example through a GitOps approach, where you can just throw your YAMLs into Git most of the time and have that automatic mechanism take care of the deployment process, so that again the operator just takes the YAML, takes care of it, and makes the changes necessary.

And with the Rook Ceph krew plugin, just to summarize it real quick again: it's a way for us to put certain automated processes in the hands of admins when they need them, and not just as a "hey, here's a 100-line bash script, please run it one command at a time". It works because we have this access to Kubernetes, where we can just ask Kubernetes "hey, where is the monitor running?", "oh, it's on node A", and all that, because we have an API that can tell us most of this information. And also for recovery scenarios, we can just ask Kubernetes to run a new pod, or to bring up a new monitor running with the old information from the other monitors, to get the quorum recovered that is required there.
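Two small examples of what "asking Kubernetes" looks like in practice; the rook-ceph namespace and the app=rook-ceph-mon label are the Rook defaults, and the restore-quorum subcommand name is recalled from the plugin documentation, so treat all of these as assumptions:

```sh
# Where are the monitors running right now?
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide

# Restore mon quorum from a surviving monitor after losing the majority
# (disaster recovery; exact subcommand may differ between plugin versions):
kubectl rook-ceph mons restore-quorum <good-mon-id>
```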
Regarding Rook Ceph, here is a general outlook for the future. Some of the major points we're currently looking at: we want to improve cluster manageability even more than we already have and make it easier. Using the Rook Ceph plugin right now, you still need to do quite a lot of manual YAML editing of the objects that we have in the API, but we would like to have some more krew plugin commands there to extend that functionality and simply make it easier. We also want to improve security by having the operator and the other components running in the cluster use separate access credentials to the Ceph cluster, just to have a bit more, I guess to some degree, transparency there if you look at, say, audit logging of the Ceph cluster. And there is also encryption support for CephFS and for OSDs on partitions. As with everything, there's more; feel free to check out the ROADMAP.md file on GitHub, at github.com/rook; the link will be shown as well.

If you want to get involved, if you want to contribute, if you have questions or anything: we obviously have GitHub, and there are even GitHub Discussions open if you have any questions that might not fit on Slack. We have a Twitter account, obviously, and we also have community meetings if you have any more pressing concerns to talk about. And from that side as well: as Gaurav and I mentioned, we're from Koor Technologies, Inc.; we're building a company that wants to create a product around Rook Ceph and in general try to help the community out there, so do talk with us or contact us as well. For now, thank you for listening; we'll gladly take some questions and can simply use the last, I think, 15 minutes for questions, or even just talk a bit about certain scenarios here with everyone.

One more last thing before we go. Yes, I would just like to add one last thing: if you want to see a demo and more troubleshooting scenarios, we did a talk at the Ceph Virtual Summit 2022, which is already on YouTube, where we demoed a couple of troubleshooting scenarios and the krew plugin. I'll definitely add a reference to it here, but that will be good to check out as well if you want to see a live demo. Thanks.

Any questions? I was wondering a bit about the downsides of using Rook with Ceph, because Ceph is known to be really hard to configure and to get the right performance; is there some kind of granularity there?
So, if I've summarized it correctly, the question is what the downsides are; I would maybe put it as the advantages and disadvantages of using Rook to run Ceph on Kubernetes, especially with Ceph being quite complex. Whether there's a loss of control on the Ceph side? Oh, I see, okay, and in addition whether there's anything you lose when you use Rook Ceph.

I guess the major downside that most people see is that you have an additional layer, with Kubernetes being that layer. Maybe to address that a bit more from what I know: cephadm, for example, I think uses Docker to run containers as well, right? So cephadm at least also introduces that kind of layer, so to say, with Docker slash Podman, or whatever runs containers, insert it here. In regards to installing Ceph, in my eyes (but I'm obviously very biased towards containers), it has this aspect of: here's the Ceph image and it should work, unless you have something weird going on with the host OS. The downside is, again, that if Kubernetes just goes completely crazy, the Ceph cluster is probably also going to have a bad time. But that's then the trade-off you're weighing: are you confident enough to run Kubernetes, and to run a Kubernetes cluster for the long term? Especially with Kubernetes there's a lot of this talk about, again, pets versus cattle: instead of just having a cluster for every application and, oh, we're done, throwing it away, for something as persistent and important as a Ceph cluster you can't just throw it away.
From experience so far, I can tell you that it is possible to run a Rook Ceph cluster over multiple years. When did I start mine? I think I had it running for two years, and the only reason I shut it down was that I had gotten new hardware in another location, and I asked myself, do I migrate it or not, and decided, okay, let's just start from scratch. But that's also because the cluster I'm talking about had like 50 other applications running on it, where it was just, okay, let's start from scratch anyway.

In regards to losing control: it's not that you really lose much. You don't really have a manual "use this disk" way besides putting it in the YAML and, fingers crossed, the operator takes care of preparing and then deploying an OSD to that disk or even partition. But again, I think that's like most tools out there that take away certain aspects, at least in regards to installation or configuration; those points are taken away, but in regards to configuring Ceph, and most other aspects, you can do everything as normal. And at least from experience, Ceph has, I guess to put it like that, gotten a lot better with the, what's it called, the config store: the centralized config store in the monitors, where you can easily set options for certain components, instead of always having to manually make changes to config files on the servers, on your storage nodes themselves. It has gotten better.

That's awesome. I would just like to say that in a lot of places it gives you control as well, right? The operator is responsible for reconciliation and for taking charge in automated scenarios where we want recovery to happen, and the goal is to improve recovery. In production you don't want any unexpected loss of control either; we want to give admins a certain level of control, and we don't want them to have to go and play around with all of these things by hand. So I think in many production scenarios you need a certain set of controls as well, which Rook actually provides, so at that point I would certainly recommend it and consider that an advantage as well.

The question is whether there's going to be a performance hit in regards to running Ceph in Kubernetes. It depends on how you run it. I personally prefer running my Rook Ceph clusters with host networking, but depending on how far along your company is with containers, traffic going over the host network is something some like and some don't.
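The host-networking preference mentioned here is a small switch in the CephCluster spec; a sketch, with field names as in the Rook examples:

```yaml
# Run the Ceph daemons on the node's network instead of the CNI/overlay
# network; leave this section out to stay on the overlay network.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    provider: host
```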
I personally do it more or less just because I don't want the traffic to go over the overlay network, since you have some plugin, some CNI, Container Network Interface, for whoever wants to look into that, that takes care of the network between your nodes. So it more or less depends; there are a lot of people just having Rook Ceph talk over the overlay network as well, and it works fine too. I would really put it down to preference, and to what your network looks like: if you have 10G or something and your overlay network, in an iperf test at least, maybe brings that down to nine point something, you know, is it worth exposing that traffic to the host network versus just having it go over the overlay network? Again, it's just another layer to consider, whether you want that or not. And if you don't want that, there are also options like Multus that allow you more fine-grained network connections, or configuration in regards to the interfaces you want to pass in, like different VLANs or something. But again, it depends.

Can you still manage your Ceph cluster via the Ceph dashboard, or is it another dashboard, or do you need two dashboards?

The question was whether you can still use the Ceph dashboard, maybe just to expand on that, the Ceph Manager dashboard, to manage your Ceph cluster to some degree. There is currently no functionality to add new OSDs through it, if I remember correctly; that's one thing that ties into the future roadmap part about more manageability, where I also looked at the dashboard and thought, wait, why don't we have that? But then it's the typical story: there are some roadblocks we just need to get out of the way, to make sure that, especially with the operator and even cephadm and others out there, we're all aligned on the same approach, or on whether there's a manager interface for it, because there even is one. And, if I understood correctly or heard correctly from the meetings, they're even looking into improving that interface further, so it will hopefully be easier, and thankfully also faster, to have the dashboard as that point of contact as well. There is a lot of work currently going on; I'll just say that there's a lot of work going on in the usability space, from the recent discussions we've had upstream, to improve the dashboard as well, from both the Kubernetes and the standalone Ceph perspective.
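For completeness, the Manager dashboard the question refers to is switched on in the CephCluster spec and exposed through a Kubernetes service; the excerpt below is a sketch using the default names from the Rook examples, not something shown in the talk.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dashboard:
    enabled: true   # the Ceph Manager dashboard keeps working under Rook
    ssl: true       # served through the rook-ceph-mgr-dashboard service
```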
The aim is to make sure that you can easily manage and monitor Ceph, even in the CNCF world. There have been recent discussions to improve it from the Rook side as well, so a lot of work is going on in the usability space, but if you have any ideas, they will be most welcome. Usability is one thing that really matters a lot, and user experience is something we would certainly want to cater to and improve in Rook. I think we have time for one more question.

So the question is whether... could I maybe modify it a bit more into the direction of how you can run Rook? I think that plays into it as well. You can run Rook Ceph in a way where you connect it to an existing Ceph cluster; it doesn't even matter if it's a Rook Ceph cluster, just any Ceph cluster works as well. It then mainly takes care of just setting up the CSI driver. I know people use that to some degree as well, if they have an existing cluster, or even an existing Rook Ceph cluster, that they want to share with others. In this external mode there's also the possibility for the Rook Ceph operator to manage certain components, so that, for example, if you want a filesystem, you could run the MDS daemons that you need for the filesystem in the cluster that your Kubernetes is running on. That works as well. So those are kind of the two main external modes, and then obviously the case of running it in the same cluster: either you just share what you have, or share and enable something like the Ceph filesystem or Ceph object storage, or you just run the daemons in the same cluster; both work for the operator. Does that answer it?

Any other questions? There are no questions, so: there are a bunch of stickers here for everyone. Yes, stickers, and if you asked a question just now, come see me, you get a t-shirt too, and maybe there are some left over after that.