Hi, everyone. I hope you're doing well and not feeling too sleepy after the lunch hour. We are here to talk about an introduction to Ceph on Kubernetes using Rook. Here's Alexander; he will introduce himself in a moment. I'm Gaurav, Cloud Storage Engineer at Koor Technologies. I'm also a community ambassador for the Ceph project for the Indian region, I've been working with the Ceph and Rook projects for a long time, and I'm now a contributor to the Rook project.

I'm Alexander Trost, founding engineer at Koor Technologies, Inc. I'm a maintainer of the Rook project as well, and we wanted to talk about Rook for everyone who doesn't know it. I want to get you started with storage: who doesn't need fast, reliable storage nowadays with cloud-native applications? We're obviously talking about somewhat more performant storage, I guess, depending on who you ask.

The point of Rook, in the end, is that Kubernetes is kind of like the container ship here: you have your Kubernetes that abstracts everything and tries to provide you this one API for most to all things, depending on how far you want to go with it. For most people running Kubernetes it looks like that: you have your big, giant ship running your production applications, and you have your automation and CI/CD processes that try to keep it running. That's where the question of storage comes into the frame for more and more people, especially since local storage became better supported in Kubernetes in a native way a year or two ago, instead of just having things around Kubernetes trying to make that an easier endeavor.
So we have the question: how can I, for example, get my Ceph storage talking with Kubernetes so that I have storage for my applications? The nice thing is that nowadays there is mainly one interface for that, CSI, the Container Storage Interface. For storage vendors it basically means they only need to implement one interface, and for Kubernetes, and you as a user, there is one interface, one way to get storage. For example, if you want storage on Kubernetes, you use PersistentVolumeClaims: from the application's perspective you claim storage, and Kubernetes takes care of, for example, talking to the Ceph storage and provisioning the volume; subsequently the CSI driver from Ceph takes care of mapping and mounting the volume, so the whole thing is completely transparent to your application. The point of the CSI interface is that it is this one way for any storage vendor to get their storage running. There is obviously more than Ceph out there, but with Rook Ceph we're going to focus on Ceph here, and Ceph CSI is exactly the connector that sits in between: it is the connecting bit between Kubernetes, the application and container side, and your storage.

And that's already the point where we start talking about Rook: you can run your Ceph storage cluster on almost any hardware. I don't know, could we run it on a Raspberry Pi as well? Yes, easily. I think I've even heard of people running it on Android phones, but, you know, just because you can doesn't necessarily mean you should; that's a whole other discussion. The point being, you can technically have your Ceph storage anywhere. It doesn't really matter if it's on bare metal in your own data center or just a few laptops thrown together. That's the thing with Ceph in general: you don't need the best hardware, you don't need to buy that big box from the one storage hardware vendor, to have storage. And that's where the combination of Kubernetes and Ceph comes into play: for having storage for your applications, but also as a way of running Ceph. That's what Rook is about: it is about running Ceph, and obviously also about the connecting part, setting up that connection between Ceph and Kubernetes.
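To make that claim flow concrete, a PersistentVolumeClaim against an RBD-backed StorageClass can look roughly like the sketch below; the StorageClass name rook-ceph-block is the one used in the usual Rook examples and is an assumption here, not something shown in the talk.

```yaml
# Minimal sketch: the application claims block storage; the Ceph CSI driver
# then provisions, maps and mounts the RBD volume transparently.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce                    # RBD block volumes are typically single-writer
  storageClassName: rook-ceph-block    # assumed example StorageClass backed by Ceph RBD
  resources:
    requests:
      storage: 10Gi
```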
The idea is that Rook runs Ceph in Kubernetes, in containers. I think I mainly saw that with cephadm the last time we deployed a cluster directly on bare metal; cephadm is one other way, to put it like that, to install, deploy, configure, and easily manage a cluster. It's one way to just install and run it, and it's kind of the same point for Rook, where Rook is basically a Ceph operator for Kubernetes. I'm going to go into a little more detail about what an operator does, because that's one of the vital points in general for running certain applications on Kubernetes. Again, as we said about running Ceph on Kubernetes: with the operator pattern that we have in Kubernetes, we can easily handle most of the things that cause quite some pain depending on how big you scale your storage cluster: deployment, bootstrap and configuration, upgrades, and everything like that. Those are all processes for which there are probably five million Ansible playbooks to install Ceph; there's obviously cephadm; there was ceph-deploy earlier, which cephadm has since replaced as the more advanced, latest tool that everyone is using these days; and there are more, I can think of five more tools for how to deploy Ceph. Ironically, for the people who have looked into Kubernetes a bit more already, it's kind of the same story for deploying Kubernetes.

But because Kubernetes is this abstraction layer on top of hardware, to some degree abstracting everything away (I'll skip over this very quickly), it allows the Rook operator, and that's exactly where this image comes in, to orchestrate a cluster. It's not just a deployment of Ceph; it's about using the Kubernetes APIs to easily take care of everything, so to say. You want to add a new node to your storage cluster: what do you do? Technically speaking, you just add it to Kubernetes, and if everything goes well, ten seconds later the operator will go "oh, a new node, I've got to do my job", run the prepare job and everything to get the node ready, and a few seconds later the new Ceph components, the OSDs on the disks (depending on what disks there are, obviously), are taken care of. To make the full circle with the Kubernetes side: that is what the operator pattern, as it's mostly called, is about. The main point is that it is about observing: the operator observes a status or, in the Kubernetes case, custom resources.
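The add-a-node flow just described is driven by the storage section of the CephCluster resource. A minimal excerpt, assuming the common settings from the Rook examples rather than anything shown in the talk, might look like this:

```yaml
# With useAllNodes/useAllDevices set, the operator notices a newly added
# Kubernetes node, runs an OSD prepare job on it, and brings up OSDs on its
# eligible disks without further manual steps.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    useAllNodes: true
    useAllDevices: true
```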
Think of a custom resource as just an object of YAML in Kubernetes, which the operator can watch. I as a user make a change to it, or my automated CI/CD process makes a change, like "a new node has been added" or "I want to tweak something in the configuration of the cluster". The operator is observing that, and when there's a change, or even a change in the Kubernetes cluster itself, like a node missing or no longer ready in Kubernetes terms (say a network outage for two of your nodes), the operator observes it, analyzes that change, and starts acting upon it. For example, in Kubernetes terms, it would take care of setting certain so-called PodDisruptionBudgets (just to have that term out there), which try to prevent the Ceph storage cluster components on other nodes from being stopped as well. The main point is really that it is this observe, analyze, act kind of loop, because in the end it just repeats itself all over again; that's the whole deal with Kubernetes operators.

For the people who are already more into Ceph: if you want to scale up to some more Ceph monitors, Ceph mons, you just edit the object in the Kubernetes API and crank the mon count from three to five or something, and again this change is detected by the operator, which analyzes it and acts upon it. That makes it quite convenient. Perfect, here's the YAML. Sorry, I don't have it mirrored on my screen, so it's a bit hard to point at, but that's exactly the YAML we talked about. As an example, I have my cluster running and let's say there's a new Ceph release. What I would need to do to upgrade my cluster is basically just go ahead and change the image to be, well, not v17.2.3 but, let's say, v17.2.5 as an example, and again the operator would detect that, analyze whether every component is up to date or not, and then start the, I don't want to say complicated, upgrade process. Especially with something like Ceph there's more to it than just "let me restart it": there are checks before every component is restarted, through Ceph-native ways, basically commands that ask whether a daemon is "ok-to-stop" (they're basically called like that). That's the whole idea: the operator helps you with that, and in the end it just fully takes care of it.
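For reference, the relevant pieces of that CephCluster YAML might look like the excerpt below. The quay.io image path and field layout follow the usual Rook examples and are assumptions; the version numbers are the ones mentioned in the talk.

```yaml
# Bumping the image from v17.2.3 to v17.2.5 is what triggers the rolling
# upgrade; raising mon.count is the monitor scale-up example from the talk.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17.2.3   # change to ...:v17.2.5 to upgrade
  mon:
    count: 3                           # e.g. crank this up to 5
```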
So for the main part of your work you can just sit back: change it in the YAML, and in a few minutes, or it can even be hours depending on how big the cluster is, the operator will take care of it. As I mentioned before with the monitor count example: we want to change it, we change it, and a few seconds later the operator picks that up and starts making the changes necessary. Even if you want to scale it down, from five to three, or three to one (which is not recommended, we want high availability there), the operator again takes care of doing it. Or if you want to specifically say "on this one node, please use this one device", or even "for this disk or NVMe, use more than one OSD", these things are possible, and quite easily, just by writing some lines of YAML. According to your workload you can easily customize your YAMLs, and that will make your life easier.

We've mainly talked about having the cluster running, or setting up the cluster, with the YAML definition of a CephCluster object. But if you would, for example, want to run Prometheus in your Kubernetes cluster and it needs storage, then to be able to use storage in Ceph you need a storage pool, for example an RBD pool, block storage basically. We again just go ahead and create a CephBlockPool object, which simply contains the information: the failure domain, where you basically tell Ceph to only store data on different hosts, to keep it simple for now; the replicated size, meaning there will be a total of three replicas, three copies of your data, in the cluster; the requireSafeReplicaSize field, let's just skip that for now, it relates to the Ceph replica size; and you could even set the compression mode for this pool. The point is, again, we can just write this in YAML, apply it against the Kubernetes API, and a few seconds later it's there. It's the same way for the other objects: if you need a Ceph filesystem or Ceph object storage, same way, and the operator takes care of creating all the components, for example the MDS for a filesystem, the standard components like the manager, the monitors, the OSDs, and for the object store, for example, the RGW components. The operator simply takes care of that, and again, if you change the Ceph version, a few seconds to maybe a minute or two later, depending on the state of your cluster, the operator will take care of doing the update. So far we have mainly talked about deploying a Rook Ceph cluster.
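A CephBlockPool manifest along the lines just described might look like this sketch; the pool name and the compression value are placeholders, not values from the talk.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host          # tell Ceph to keep the copies on different hosts
  replicated:
    size: 3                    # three total copies of the data in the cluster
    requireSafeReplicaSize: true
  parameters:
    compression_mode: none     # compression mode for this pool, e.g. "aggressive"
```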
At this point we also want to highlight the krew plugin that Rook Ceph is building and providing. It allows you to have certain processes automated, and even certain disaster recovery cases are easier to handle with it. Gaurav will talk a bit about that.

So, what is krew? Krew is basically a package manager for kubectl plugins; it makes the management of Kubernetes easier. That's how the Rook developers and maintainers came together and thought: we can definitely write a plugin to make the lives of our developers and administrators easier. Krew was the way to go, since it's the de facto package manager for kubectl plugins. So you can just do a kubectl krew install rook-ceph, and that's how the plugin gets installed. As you can see, we just ran the help command and it shows a bunch of things that you can do: you can run a whole bunch of Ceph commands and RBD commands, check the health of your cluster, and do a bunch of other things, even remove an OSD. The need actually arose because, for example, if you want to use underlying tools like the ceph-objectstore-tool or something like that to debug and troubleshoot issues at the OSD level, the krew plugin is definitely a great way to go, as it provides common management and troubleshooting tools for Ceph.

A lot of things already work, and we'll show you. Like I mentioned, you just need to run kubectl krew install rook-ceph and it goes ahead and quickly installs the plugin. It's way easier: earlier you had to go inside the toolbox pod to debug and troubleshoot every issue; with krew, it provides such ease of access that it makes lives easier, and troubleshooting is definitely easier. You can also override the cluster configuration at runtime, and some disaster recovery scenarios are addressed as well. One of the troubleshooting scenarios that was addressed is mon recovery: suppose you have the default three mons in the cluster and the majority of them lose quorum. Recovering mons from monmaps means doing a bunch of tasks that, if not done carefully, could lead to more disasters; but with more automation in place, this is also made easier with the krew plugin. Even if you want to troubleshoot CSI issues, it makes that easier for sure. The same goes if you want to restore the mons from the OSDs, and even the Rook Ceph cluster itself can be restored after an accidental deletion of its custom resources.
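As a rough idea of what that looks like on the command line (subcommand names are recalled from the plugin's help output and may differ between plugin versions):

```sh
kubectl krew install rook-ceph               # install the Rook Ceph plugin via krew

kubectl rook-ceph help                       # list the available subcommands
kubectl rook-ceph health                     # basic health checks of the Rook/Ceph cluster
kubectl rook-ceph ceph status                # pass-through for arbitrary "ceph ..." commands
kubectl rook-ceph rbd ls replicapool         # pass-through for "rbd ..." commands
kubectl rook-ceph rook purge-osd 0 --force   # example: remove OSD 0 (destructive!)
```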
One of the common goals on the roadmap is also automating core dump collection: let's say an issue happens with a Ceph daemon and we want to collect a core dump of the process for further investigation, to share it with the developers and the community to understand what issues we are facing. And if you want to do performance profiling or debugging of a process with gdb, that could be made easier as well. These are some of the goals. The current plugin is written in Bash, but there is work going on to rewrite the whole plugin in Golang, so that it is more scalable and much easier to manage, even for contributors.

So I guess the point we're more or less trying to make is: if you have Kubernetes, or run a distribution, whether it's Rancher or obviously OpenShift, on your hardware, and, I would even put it to some degree as, you're confident enough with Kubernetes to run it, you can quite easily run a Ceph cluster on top of that as well. Obviously, to some degree you need some Ceph knowledge, but that's the case with everything you want to run in production. It's just that with this abstraction layer of Kubernetes, it gets easier for you: you start to think more along the lines of "I have some nodes, and they're simply there to take care of the components that I need to run for the Ceph cluster". Especially with the Rook Ceph operator it makes the process easier, for example through a GitOps approach, where you can just throw your YAMLs into Git most of the time and have that automatic mechanism take care of the deployment process, so that again the operator just takes the YAML, takes care of it, and makes the changes necessary.

And with the Rook Ceph krew plugin, just to summarize it real quick again: it's a way for us to put certain automated processes in the hands of admins when they need them, and not just as a "hey, here's a 100-line bash script, please run it one command at a time". It works because we have this access to Kubernetes, where we can just ask Kubernetes "hey, where is the monitor running?", "oh, it's on node A", and all that, because we have an API that can tell us most of this information. And also for recovery scenarios, we can just ask Kubernetes to run a new pod, or to bring up a new monitor running with the old information from the other monitors, to get the quorum recovered that is required there.
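Two small examples of what "asking Kubernetes" looks like in practice; the rook-ceph namespace and the app=rook-ceph-mon label are the Rook defaults, and the restore-quorum subcommand name is recalled from the plugin documentation, so treat all of these as assumptions:

```sh
# Where are the monitors running right now?
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide

# Restore mon quorum from a surviving monitor after losing the majority
# (disaster recovery; exact subcommand may differ between plugin versions):
kubectl rook-ceph mons restore-quorum <good-mon-id>
```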
Regarding Rook Ceph, here is a general outlook for the future. Some of the major points we're currently looking at: we want to improve cluster manageability even more than we already have and make it easier. Using the Rook Ceph plugin right now, you still need to do quite a lot of manual YAML editing of the objects that we have in the API, but we would like to have some more krew plugin commands there to extend that functionality and simply make it easier. We also want to improve security by having the operator and the other components running in the cluster use separate access credentials to the Ceph cluster, just to have a bit more, I guess to some degree, transparency there if you look at, say, audit logging of the Ceph cluster. And there is also encryption support for CephFS and for OSDs on partitions. As with everything, there's more; feel free to check out the ROADMAP.md file on GitHub, at github.com/rook; the link will be shown as well.

If you want to get involved, if you want to contribute, if you have questions or anything: we obviously have GitHub, and there are even GitHub Discussions open if you have any questions that might not fit on Slack. We have a Twitter account, obviously, and we also have community meetings if you have any more pressing concerns to talk about. And from that side as well: as Gaurav and I mentioned, we're from Koor Technologies, Inc.; we're building a company that wants to create a product around Rook Ceph and in general try to help the community out there, so do talk with us or contact us as well. For now, thank you for listening; we'll gladly take some questions and can simply use the last, I think, 15 minutes for questions, or even just talk a bit about certain scenarios here with everyone.

One more last thing before we go. Yes, I would just like to add one last thing: if you want to see a demo and more troubleshooting scenarios, we did a talk at the Ceph Virtual Summit 2022, which is already on YouTube, where we demoed a couple of troubleshooting scenarios and the krew plugin. I'll definitely add a reference to it here, but that will be good to check out as well if you want to see a live demo. Thanks.

Any questions? I was wondering a bit about the downsides of using Rook with Ceph, because Ceph is known to be really hard to configure and to get the right performance; is there some kind of granularity there?
So, if I've summarized it correctly, the question is what the downsides are; I would maybe put it as the advantages and disadvantages of using Rook to run Ceph on Kubernetes, especially with Ceph being quite complex. Whether there's a loss of control on the Ceph side? Oh, I see, okay, and in addition whether there's anything you lose when you use Rook Ceph.

I guess the major downside that most people see is that you have an additional layer, with Kubernetes being that layer. Maybe to address that a bit more from what I know: cephadm, for example, I think uses Docker to run containers as well, right? So cephadm at least also introduces that kind of layer, so to say, with Docker slash Podman, or whatever runs containers, insert it here. In regards to installing Ceph, in my eyes (but I'm obviously very biased towards containers), it has this aspect of: here's the Ceph image and it should work, unless you have something weird going on with the host OS. The downside is, again, that if Kubernetes just goes completely crazy, the Ceph cluster is probably also going to have a bad time. But that's then the trade-off you're weighing: are you confident enough to run Kubernetes, and to run a Kubernetes cluster for the long term? Especially with Kubernetes there's a lot of this talk about, again, pets versus cattle: instead of just having a cluster for every application and, oh, we're done, throwing it away, for something as persistent and important as a Ceph cluster you can't just throw it away.
From experience so far, I can tell you that it is possible to run a Rook Ceph cluster over multiple years. When did I start mine? I think I had it running for two years, and the only reason I shut it down was that I had gotten new hardware in another location, and I asked myself, do I migrate it or not, and decided, okay, let's just start from scratch. But that's also because the cluster I'm talking about had like 50 other applications running on it, where it was just, okay, let's start from scratch anyway.

In regards to losing control: it's not that you really lose much. You don't really have a manual "use this disk" way besides putting it in the YAML and, fingers crossed, the operator takes care of preparing and then deploying an OSD to that disk or even partition. But again, I think that's like most tools out there that take away certain aspects, at least in regards to installation or configuration; those points are taken away, but in regards to configuring Ceph, and most other aspects, you can do everything as normal. And at least from experience, Ceph has, I guess to put it like that, gotten a lot better with the, what's it called, the config store: the centralized config store in the monitors, where you can easily set options for certain components, instead of always having to manually make changes to config files on the servers, on your storage nodes themselves. It has gotten better.

That's awesome. I would just like to say that in a lot of places it gives you control as well, right? The operator is responsible for reconciliation and for taking charge in automated scenarios where we want recovery to happen, and the goal is to improve recovery. In production you don't want any unexpected loss of control either; we want to give admins a certain level of control, and we don't want them to have to go and play around with all of these things by hand. So I think in many production scenarios you need a certain set of controls as well, which Rook actually provides, so at that point I would certainly recommend it and consider that an advantage as well.

The question is whether there's going to be a performance hit in regards to running Ceph in Kubernetes. It depends on how you run it. I personally prefer running my Rook Ceph clusters with host networking, but depending on how far along your company is with containers, traffic going over the host network is something some like and some don't.
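The host-networking preference mentioned here is a small switch in the CephCluster spec; a sketch, with field names as in the Rook examples:

```yaml
# Run the Ceph daemons on the node's network instead of the CNI/overlay
# network; leave this section out to stay on the overlay network.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    provider: host
```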
I personally do it more or less just because I don't want the traffic to go over the overlay network, since you have some plugin, some CNI, Container Network Interface, for whoever wants to look into that, that takes care of the network between your nodes. So it more or less depends; there are a lot of people just having Rook Ceph talk over the overlay network as well, and it works fine too. I would really put it down to preference, and to what your network looks like: if you have 10G or something and your overlay network, in an iperf test at least, maybe brings that down to nine point something, you know, is it worth exposing that traffic to the host network versus just having it go over the overlay network? Again, it's just another layer to consider, whether you want that or not. And if you don't want that, there are also options like Multus that allow you more fine-grained network connections, or configuration in regards to the interfaces you want to pass in, like different VLANs or something. But again, it depends.

Can you still manage your Ceph cluster via the Ceph dashboard, or is it another dashboard, or do you need two dashboards?

The question was whether you can still use the Ceph dashboard, maybe just to expand on that, the Ceph Manager dashboard, to manage your Ceph cluster to some degree. There is currently no functionality to add new OSDs through it, if I remember correctly; that's one thing that ties into the future roadmap part about more manageability, where I also looked at the dashboard and thought, wait, why don't we have that? But then it's the typical story: there are some roadblocks we just need to get out of the way, to make sure that, especially with the operator and even cephadm and others out there, we're all aligned on the same approach, or on whether there's a manager interface for it, because there even is one. And, if I understood correctly or heard correctly from the meetings, they're even looking into improving that interface further, so it will hopefully be easier, and thankfully also faster, to have the dashboard as that point of contact as well. There is a lot of work currently going on; I'll just say that there's a lot of work going on in the usability space, from the recent discussions we've had upstream, to improve the dashboard as well, from both the Kubernetes and the standalone Ceph perspective.
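For completeness, the Manager dashboard the question refers to is switched on in the CephCluster spec and exposed through a Kubernetes service; the excerpt below is a sketch using the default names from the Rook examples, not something shown in the talk.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dashboard:
    enabled: true   # the Ceph Manager dashboard keeps working under Rook
    ssl: true       # served through the rook-ceph-mgr-dashboard service
```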
The aim is to make sure that you can easily manage and monitor Ceph, even in the CNCF world. There have been recent discussions to improve it from the Rook side as well, so a lot of work is going on in the usability space, but if you have any ideas, they will be most welcome. Usability is one thing that really matters a lot, and user experience is something we would certainly want to cater to and improve in Rook. I think we have time for one more question.

So the question is whether... could I maybe modify it a bit more into the direction of how you can run Rook? I think that plays into it as well. You can run Rook Ceph in a way where you connect it to an existing Ceph cluster; it doesn't even matter if it's a Rook Ceph cluster, just any Ceph cluster works as well. It then mainly takes care of just setting up the CSI driver. I know people use that to some degree as well, if they have an existing cluster, or even an existing Rook Ceph cluster, that they want to share with others. In this external mode there's also the possibility for the Rook Ceph operator to manage certain components, so that, for example, if you want a filesystem, you could run the MDS daemons that you need for the filesystem in the cluster that your Kubernetes is running on. That works as well. So those are kind of the two main external modes, and then obviously the case of running it in the same cluster: either you just share what you have, or share and enable something like the Ceph filesystem or Ceph object storage, or you just run the daemons in the same cluster; both work for the operator. Does that answer it?

Any other questions? There are no questions, so: there are a bunch of stickers here for everyone. Yes, stickers, and if you asked a question just now, come see me, you get a t-shirt too, and maybe there are some left over after that.