[00:00.000 --> 00:08.800] So, Fabio, the stage is yours. [00:08.800 --> 00:11.760] I'm really looking forward to hearing how on-premise data centers [00:11.760 --> 00:14.160] do not need to be a legacy. [00:14.160 --> 00:17.120] Thank you. [00:17.120 --> 00:19.680] So hello, everyone. [00:19.680 --> 00:25.560] And just to be clear, these are the topics we are going to cover: a little [00:25.560 --> 00:32.400] bit of history, some lessons learned, and then some technology bets that I think make [00:32.400 --> 00:35.040] sense in such a conversation. [00:35.040 --> 00:40.360] So, about me: I have been a Linux user for 20-ish years. [00:40.360 --> 00:46.120] I've been working with Linux for close to 20 years now, and I currently work at Red Hat, [00:46.120 --> 00:51.320] where I have basically this kind of conversation in my day-to-day job. [00:51.320 --> 00:55.520] So let's start with a little bit of history. [00:55.520 --> 00:56.520] Of the cloud. [00:56.520 --> 00:57.520] Let's call it this way. [00:57.520 --> 01:04.360] So Rackspace was founded in 1998 and, I think, was the first company that defined itself [01:04.360 --> 01:07.200] as cloud. [01:07.200 --> 01:10.160] In 2005, SoftLayer was founded. [01:10.160 --> 01:13.840] They defined themselves as a bare-metal cloud. [01:13.840 --> 01:23.080] And then in 2006, we have S3 launched by AWS, which was the first AWS service. [01:23.080 --> 01:28.960] In 2006 again, EC2. [01:28.960 --> 01:37.480] And then Google App Engine arrived, IBM bought SoftLayer, creating what is now called [01:37.480 --> 01:43.360] IBM Cloud, and by 2021, AWS had more than 200 different services. [01:43.360 --> 01:53.720] So what about the history of the non-cloud? Because what we have seen are all cloud environments, [01:53.720 --> 01:56.920] but those are nothing new if you think about it. [01:56.920 --> 02:05.520] So in 1964, which is probably earlier than most of the people in this room, [02:05.520 --> 02:17.400] IBM introduced the CP-40, and this machine had time-sharing technology, which was very different [02:17.400 --> 02:26.320] from what we today call cloud, but still it was probably the starting point of the history [02:26.320 --> 02:35.760] of the cloud. And in the late 60s, IBM released SIMMON, which is a hypervisor. [02:35.760 --> 02:43.400] By 1974, the two kinds of hypervisor got defined: type 1, bare-metal [02:43.400 --> 02:47.440] virtualization, and type 2, hosted virtualization. [02:47.440 --> 02:55.880] And in 1998, VMware got founded, and in the 2000s the majority of companies moved from bare metal [02:55.880 --> 02:58.080] to VMware VMs. [02:58.080 --> 03:06.560] In 2001, ESX got released, which was type 1 virtualization. [03:06.560 --> 03:13.280] In 2003, we have the first open source type 1 virtualization, Xen. [03:13.280 --> 03:21.080] And still in 2003, VMware introduced vMotion, which allows you to basically move a machine [03:21.080 --> 03:25.840] from one host to another without rebooting it. [03:25.840 --> 03:32.080] In 2008, Microsoft arrived with Hyper-V; it previously had some other virtualization [03:32.080 --> 03:35.440] tools, but Hyper-V got launched in 2008. [03:35.440 --> 03:37.360] So what is the cloud? [03:37.360 --> 03:42.720] Why do we distinguish the first group from the second one?
[03:42.720 --> 03:48.120] Wikipedia says that cloud computing is the on-demand availability of computer system [03:48.120 --> 03:53.200] resources, especially data storage and computing power, without direct active management by [03:53.200 --> 03:54.200] the user. [03:54.200 --> 03:58.400] So I think this is a good definition. [03:58.400 --> 04:05.200] I think that a better definition is: a business model where one party rents to a second party [04:05.200 --> 04:11.560] computer system resources, especially data storage (cloud storage) and computing power, [04:11.560 --> 04:15.520] with the smallest granularity possible. [04:15.520 --> 04:21.000] And my point is: the cloud is not technical, it is a business model. [04:21.000 --> 04:29.200] And if you think about it, we moved from renting machines like VPSes on a monthly basis, and [04:29.200 --> 04:36.880] then AWS introduced EC2, which was initially billed on an hourly basis, then per [04:36.880 --> 04:38.480] minute, and then per second. [04:38.480 --> 04:44.800] And now you can buy Lambdas or similar kinds of things for milliseconds (a small worked example of this follows below). [04:44.800 --> 04:48.560] And in a way, CPUs had the same shrinkage. [04:48.560 --> 04:58.840] So we moved from full CPUs or sockets, to vCPUs, which are basically hyper-threading threads, [04:58.840 --> 05:03.680] to fractional vCPUs with Lambdas or similar services. [05:03.680 --> 05:10.240] So my point being: the whole thing about the cloud is not technical, it is only about the business [05:10.240 --> 05:12.120] side of it. [05:12.120 --> 05:18.360] So what can we learn, not only from the last 20 years of what we can define as cloud, but [05:18.360 --> 05:26.560] also from the previous 50 of what we can define as non-cloud? More specifically, [05:26.560 --> 05:30.280] we have seen that the cloud model actually works. [05:30.280 --> 05:37.880] The non-cloud model was not very functional for the business, to the point that very often [05:37.880 --> 05:46.320] those data centers got outsourced or in some other way moved to the cloud, in the sense [05:46.320 --> 05:55.840] that they moved to someone else, and the business then started constantly expanding those machines [05:55.840 --> 06:02.000] thanks to what is basically an OPEX model instead of a CAPEX model. [06:02.000 --> 06:07.800] So there is one big aspect that we need to remember about this, which is the separation [06:07.800 --> 06:08.800] of concerns. [06:08.800 --> 06:13.200] First, standardize the interface between the infrastructure and the workload. [06:13.200 --> 06:20.240] If you go into legacy data centers, very often you have 1,000 different kinds of systems [06:20.240 --> 06:24.920] that the infra people have to provide to the workload people. [06:24.920 --> 06:30.360] And this is because, oh, my system is different, my software is different, whatever; at the [06:30.360 --> 06:37.680] end of the day, that is a huge load for the infrastructure part of the business. [06:37.680 --> 06:44.840] Second, scalability needs to be at the workload level. The infrastructure also needs to [06:44.840 --> 06:53.400] be somewhat reliable and within some SLAs, but if the system has to stay up, if the application [06:53.400 --> 06:57.240] has to stay up, the application will have to take care of this. [06:57.240 --> 07:04.880] And third, workloads should have only an abstract concept of whatever is underneath them, that is, of the physical [07:04.880 --> 07:05.880] architecture. [07:05.880 --> 07:12.000] They don't need to know which data center they are in or in which rack, what the [07:12.000 --> 07:15.640] nearby server is, and so on.
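Going back for a moment to the granularity point above, a small worked example may help. This is a sketch only: the prices are made up for illustration, not real VPS or cloud list prices.

```python
# Made-up prices, for illustration only: the same capacity rented at
# monthly versus per-second granularity.
HOURS_PER_MONTH = 730

vps_monthly = 40.00                                  # flat monthly rent, used or idle
per_second = vps_monthly / (HOURS_PER_MONTH * 3600)  # same capacity, billed per second

# A job that actually runs only 15 minutes per day, 30 days a month:
busy_seconds = 15 * 60 * 30

print(f"monthly VPS:        ${vps_monthly:.2f}")                # $40.00
print(f"per-second billing: ${per_second * busy_seconds:.2f}")  # about $0.41
```

The hardware underneath can be identical; what shrank over the years is only the billing granularity, which is exactly the business-model point.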
[07:15.640 --> 07:26.440] So we also need a functional business model for a well-managed IT system. [07:26.440 --> 07:34.880] And the first part is, as before: standardize the interface between the workload and the infrastructure [07:34.880 --> 07:39.480] so that it's easy to meter and to price. [07:39.480 --> 07:45.320] Second, bill back the infrastructure cost to the workload owners. [07:45.320 --> 07:50.600] We have seen, at least in my definition of cloud, that we still have two parties, one [07:50.600 --> 07:55.920] that delivers a service and the other that consumes it and pays for it. [07:55.920 --> 08:03.240] So it's very important to create this also internally in companies or organizations of [08:03.240 --> 08:10.880] any kind, because this allows the infrastructure side of the business to justify their expenses [08:10.880 --> 08:21.880] through at least some kind of revenue recognition or cost recovery. [08:21.880 --> 08:24.360] And third, keep the cost down. [08:24.360 --> 08:31.280] This is a key point: AWS, Google, those companies will do everything they can to keep the cost [08:31.280 --> 08:36.240] down because they need to be cash flow positive. [08:36.240 --> 08:42.480] Obviously, if you are a department in a company, it's slightly different, but it's very important [08:42.480 --> 08:48.120] to still be cash flow positive, because this will guarantee that you will not have [08:48.120 --> 08:55.120] any issues over time with this part of the financial model. [08:55.120 --> 08:57.520] And fourth, maintain control. [08:57.520 --> 09:03.840] We have seen that the clouds are obsessed with maintaining control and obtaining even more [09:03.840 --> 09:10.160] control over their hardware, their systems, whatever, and this is very important for your own cloud [09:10.160 --> 09:14.800] if you want to be able to maintain it for 10, 20, 50 years. [09:14.800 --> 09:22.480] So the first point is, I would say, do not use, or at least be very cautious about using, third-party [09:22.480 --> 09:28.200] proprietary software: those companies can go away, can change their pricing model, can do whatever; [09:28.200 --> 09:29.840] be aware of this. [09:29.840 --> 09:37.960] Second, evaluate very carefully the buy-versus-build decision, because when you buy, obviously [09:37.960 --> 09:43.120] it's here now, but you don't have the know-how about it. [09:43.120 --> 09:50.600] So probably you will want to build a lot of your systems, not the core parts, but maybe [09:50.600 --> 09:56.040] the dashboard layer or that kind of thing, so that you can effectively manage it however [09:56.040 --> 09:58.560] you think best. [09:58.560 --> 10:03.480] And third, be very aware of lock-ins, because those will bite you over the course of the [10:03.480 --> 10:05.120] years. [10:05.120 --> 10:12.640] So how do I define lock-in? [10:12.640 --> 10:18.320] I define it as the product of the probability that a component will require substitution [10:18.320 --> 10:23.320] during the solution's lifetime and the total cost of the substitution.
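As a minimal sketch of that definition (the components and numbers below are hypothetical, purely to show how the two factors trade off):

```python
def lock_in(p_substitution: float, substitution_cost: float) -> float:
    """Expected lock-in cost: the probability that a component will require
    substitution during the solution's lifetime, times the total cost of
    that substitution."""
    return p_substitution * substitution_cost

# Hypothetical components and numbers:
print(lock_in(0.02, 50_000_000))  # base OS: huge cost to leave, very unlikely to need to
print(lock_in(0.60, 2_000_000))   # niche proprietary tool: far cheaper to replace, far more likely
```

Note that in this made-up example the cheap-to-replace proprietary tool carries the larger expected lock-in (1.2 million versus 1 million), which is the point of the Linux example that follows.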
[10:23.320 --> 10:30.400] So for instance, Linux: if you base all your architecture on Linux, it's going to be very [10:30.400 --> 10:36.480] expensive to move away from Linux, but on the other hand, it's very improbable that you [10:36.480 --> 10:44.720] will need to do it, because very probably in 10, 20 years Linux will still be here. [10:44.720 --> 10:52.280] So, a couple of points on technologies. The first one is: keep the complexity of your system [10:52.280 --> 10:54.920] at the lowest level possible. [10:54.920 --> 11:00.480] Systems will get more complex and more absurd over time, so at least at the beginning, start [11:00.480 --> 11:03.280] with the simplest thing possible. [11:03.280 --> 11:07.680] Second, prefer build-time complexity over run-time complexity. [11:07.680 --> 11:13.840] It's way easier to automate something at build time than to automate something at run time. [11:13.840 --> 11:20.560] And also, when something breaks, it's better if it's simple, because it's easier to fix. [11:20.560 --> 11:26.760] If you have to compile your stuff, compile it, but try to keep the complexity at [11:26.760 --> 11:29.520] run time to the minimum possible. [11:29.520 --> 11:38.680] Third, minimize the number of services that you deliver to your business or your workload [11:38.680 --> 11:43.880] owners, so that effectively you can guarantee that those services are exactly what they [11:43.880 --> 11:48.800] require and you are able to deliver them in a sensible way. [11:48.800 --> 11:53.840] So I think that one big point is containers. [11:53.840 --> 11:59.440] Delivering a container-based solution is probably the best option, I think, [11:59.440 --> 12:08.360] today. Use a Kubernetes distribution; whatever you prefer and choose that makes sense is [12:08.360 --> 12:16.760] fine. And as we'll see later, the Kubernetes APIs are now fairly well known, fairly abstract, [12:16.760 --> 12:23.040] and fairly widely used, so those can be a good interface between the infra and the workload [12:23.040 --> 12:24.040] side. [12:24.040 --> 12:29.640] Also, you can do it yourself with a community distribution, call it whatever you like. [12:29.640 --> 12:34.000] Or you can buy a commercial distribution of Kubernetes. [12:34.000 --> 12:40.280] If you do, first, be sure that what you're buying is fully open source, so that you [12:40.280 --> 12:46.360] decrease your lock-in, because you are decreasing the cost it will take you to move from [12:46.360 --> 12:48.560] this to any other solution. [12:48.560 --> 12:54.200] Second, buy from a trustworthy company: hopefully the company that you buy it from will not [12:54.200 --> 12:59.080] fail tomorrow, because if it does, you will have bigger problems. [12:59.080 --> 13:07.360] And one with a long track record of not screwing their customers. [13:07.360 --> 13:12.440] And if they are heavily involved in the open source community, it's even better, because [13:12.440 --> 13:20.360] that means that they are driving the development and they do have all the knowledge needed [13:20.360 --> 13:25.080] to fix issues as soon as they arise. [13:25.080 --> 13:30.560] So, around automation: use an immutable approach to your infrastructure. [13:30.560 --> 13:35.680] If you start to have diverging things and weird infrastructure going on, it will be [13:35.680 --> 13:37.160] a death sentence. [13:37.160 --> 13:41.360] Second, version your infrastructure; GitOps is one option (a small sketch follows below). [13:41.360 --> 13:43.240] There are many others.
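A minimal sketch of the versioned, immutable idea, assuming a hypothetical git repository of Kubernetes manifests and an already-configured kubectl; real GitOps tools such as Argo CD or Flux do far more, this only illustrates the principle:

```python
import subprocess

REPO = "/srv/infra-manifests"  # hypothetical repository holding the declared state

def apply_version(tag: str) -> None:
    """Converge the cluster to one known, versioned state of the infrastructure."""
    # Check out the desired version of the declared state...
    subprocess.run(["git", "-C", REPO, "checkout", tag], check=True)
    # ...and apply exactly that state, with no human edits in between.
    subprocess.run(["kubectl", "apply", "-R", "-f", REPO], check=True)

apply_version("v42")    # roll forward to the new version
# apply_version("v41")  # or roll back to the version known to work
```

Applying a tagged state instead of editing live objects is what keeps the infrastructure immutable and every change attributable to a version.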
[13:43.240 --> 13:48.720] No matter what you do, try to have versions, so that effectively you can potentially roll [13:48.720 --> 13:54.520] back, or at least see what changed from a version that is known to be working to the current [13:54.520 --> 13:57.720] one, and automate the whole process. [13:57.720 --> 14:01.080] If you have humans involved, you will have issues. [14:01.080 --> 14:08.400] It will cost more and it will be effectively less resilient and reliable. [14:08.400 --> 14:14.480] So, putting together all that we have seen, I would suggest to first create a multi-data [14:14.480 --> 14:20.360] center architecture, so that effectively you have all that redundancy and those kinds of things, [14:20.360 --> 14:23.240] but hide them from your developers. [14:23.240 --> 14:32.840] Maybe they know the region concept or the AZ concept, but don't show the physical [14:32.840 --> 14:37.800] layout to your users, otherwise they will start to do weird stuff. [14:37.800 --> 14:41.440] Second, use a tool to manage the clusters. [14:41.440 --> 14:45.640] Open Cluster Management is an open source project that does it. [14:45.640 --> 14:49.400] There are other projects that do similar things. [14:49.400 --> 14:53.720] It's very, very useful and it will help you over time, because probably you will end up [14:53.720 --> 14:55.520] running many clusters. [14:55.520 --> 15:02.920] Third, I would personally suggest standardizing on the Kubernetes APIs as the only interface [15:02.920 --> 15:11.160] between the workload and the infrastructure, because those are, as we have seen, very well known. [15:11.160 --> 15:17.560] Use a bare-metal container platform, so don't put virtualization or other stuff underneath it. [15:17.560 --> 15:23.440] You will have, hopefully, enough workload to justify tons of servers, physical servers; [15:23.440 --> 15:27.960] don't add complexity with virtualization in between. [15:27.960 --> 15:32.360] Automate all the infrastructure pieces and configurations, obviously, as we have seen. [15:32.360 --> 15:41.720] Start by providing only a few interfaces to your business, and then eventually extend them when [15:41.720 --> 15:42.720] needed. [15:42.720 --> 15:49.280] So an example would be an OCI registry, object storage, and pods, deployments, those kinds [15:49.280 --> 15:50.720] of basic things. [15:50.720 --> 15:55.760] And then, if your business comes to you saying, oh, we really need that, then eventually you [15:55.760 --> 15:57.200] expand. [15:57.200 --> 16:05.360] But the thing is: only provide new services when you are sure that there is a requirement [16:05.360 --> 16:06.360] for them. [16:06.360 --> 16:09.600] So, for instance, let's say that you want to do a database as a service. [16:09.600 --> 16:14.760] You have already onboarded 100 applications, and 80 of those actually use MySQL. [16:14.760 --> 16:20.440] It would make sense to provide MySQL as a service, but it does not make sense to provide [16:20.440 --> 16:25.760] 50 different databases as a service, of which 48 will never be used. [16:25.760 --> 16:28.760] That's only complexity and cost for you. [16:28.760 --> 16:35.320] And then create a simple UX for your users that completely abstracts everything that is [16:35.320 --> 16:36.320] below. [16:36.320 --> 16:42.960] So even: push your Kubernetes configuration here, and we will manage it.
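As a minimal sketch of what that single interface can look like from the workload side, assuming the official kubernetes Python client and hypothetical names (the kubeconfig is the entire contract; the developer never sees data centers, racks, or hosts):

```python
from kubernetes import client, config

config.load_kube_config()  # credentials for the platform: the whole contract
apps = client.AppsV1Api()

# A hypothetical workload, described purely in Kubernetes API terms:
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "demo-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo-app"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="web", image="registry.example.com/demo:1.0"),
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)
```

Where those three replicas land, in which data center or rack, stays entirely on the infrastructure side.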
[16:42.960 --> 16:51.200] And hopefully then you will be ensuring that all this stuff is versioned and so on, so that [16:51.200 --> 16:58.720] even when the workload fails for some reason, you can say: look, version N minus one was [16:58.720 --> 16:59.720] working. [16:59.720 --> 17:02.040] You did something, now it's broken. [17:02.040 --> 17:03.800] It's not the infra. [17:03.800 --> 17:06.720] So this was it. [17:06.720 --> 17:07.720] Thank you. [17:07.720 --> 17:20.600] I don't know if we have a couple of minutes for questions, no? [17:20.600 --> 17:32.800] If there are. [17:32.800 --> 17:34.920] Thank you for your talk. [17:34.920 --> 17:39.640] Could you expand a bit on, I didn't get why, what were the advantages of building [17:39.640 --> 17:43.680] multiple data centers in the first place? [17:43.680 --> 17:50.000] So that is usually a business requirement, because they will say, oh, we want to have everything [17:50.000 --> 17:55.600] HA, or at least this service needs to be HA, and with one data center, that would [17:55.600 --> 17:57.600] be hard. [17:57.600 --> 18:03.480] Obviously it really depends: if you are a small organization, maybe two or three [18:03.480 --> 18:05.400] data centers could be okay. [18:05.400 --> 18:12.320] If you are a big organization, maybe spread throughout five, ten legally different regions, [18:12.320 --> 18:15.240] then you will obviously need 30, 50 data centers. [18:15.240 --> 18:16.680] That's a completely different scale. [18:16.680 --> 18:22.120] Obviously all those are very generic suggestions, and then you have to apply them to your specific [18:22.120 --> 18:24.480] situation. [18:24.480 --> 18:30.720] And just a quick follow-up on that: how do you hide that from the workload developer? [18:30.720 --> 18:36.040] So the line just after that one, where you say they have to not know about the multiple [18:36.040 --> 18:37.640] clusters, how does that work? [18:37.640 --> 18:43.400] Yeah, so if you pick AWS, for instance, they have the concepts of region and AZ. [18:43.400 --> 18:47.200] Some AZs, so, AZs are not data centers. [18:47.200 --> 18:48.720] Some are data centers. [18:48.720 --> 18:53.720] Others are parts of a data center, but different availability zones within the data center, [18:53.720 --> 18:59.880] and others are containers, in the sense of 40-foot containers full of servers. [18:59.880 --> 19:01.960] So the user does not know. [19:01.960 --> 19:06.920] They know that there is region X, AZ 1, 2, 3. [19:06.920 --> 19:09.760] What 1, 2, 3 means, no one knows. [19:09.760 --> 19:10.760] And no one cares. [19:10.760 --> 19:11.760] And that's the thing. [19:11.760 --> 19:26.360] Thank you for the talk. [19:26.360 --> 19:33.000] And in your definition of lock-in, you spoke about the cost of portability multiplied by the probability [19:33.000 --> 19:34.320] of portability. [19:34.320 --> 19:42.280] But if you fail to assess the probability of portability, wouldn't you [19:42.280 --> 19:47.760] fall into a lock-in without being aware of it? [19:47.760 --> 19:50.320] Sorry, what do you mean? [19:50.320 --> 19:55.480] Okay: I will always run my cloud in Amazon Web Services. [19:55.480 --> 19:59.280] Why would I need portability? [19:59.280 --> 20:04.440] And then I start using locked-in products. [20:04.440 --> 20:07.680] So I will never be able to leave. [20:07.680 --> 20:10.280] Yes, well, you will be able to leave. [20:10.280 --> 20:11.680] It's always possible to leave.
[20:11.680 --> 20:15.760] You will simply rewrite your whole application from scratch and you leave. [20:15.760 --> 20:17.000] So what is the cost of that? [20:17.000 --> 20:18.000] A billion? [20:18.000 --> 20:19.000] Okay. [20:19.000 --> 20:22.520] So now it becomes a billion of lock-in. [20:22.520 --> 20:23.880] That is my point. [20:23.880 --> 20:28.000] You can rewrite tomorrow from scratch, from the ground up. [20:28.000 --> 20:29.000] It's possible. [20:29.000 --> 20:30.720] How much will it cost you? [20:30.720 --> 20:31.720] A billion? [20:31.720 --> 20:32.720] Five billion? [20:32.720 --> 20:33.720] A trillion? [20:33.720 --> 20:34.720] Okay. [20:34.720 --> 20:35.880] That is your lock-in value. [20:35.880 --> 20:36.880] And that's the thing. [20:36.880 --> 20:42.320] Obviously, I would suggest you keep the lock-in as low as possible. [20:42.320 --> 20:44.520] So try not to end up [20:44.520 --> 20:47.680] in a situation where you have to rewrite everything. [20:47.680 --> 20:48.680] Thank you. [20:48.680 --> 20:59.120] Hello, one quick question. [20:59.120 --> 21:04.640] So if your organization has a traditional manual approach to operations, which thing [21:04.640 --> 21:07.360] would you automate first? [21:07.360 --> 21:15.040] I would start with very simple processes, just to ensure that it works in the organization. [21:15.040 --> 21:22.400] So that the organization starts to understand it: processes like creating VMs or creating containers, [21:22.400 --> 21:29.160] whatever kind of thing you do, and then some things such as patching and so on. [21:29.160 --> 21:35.240] But if you really want to go the automation way, it's way easier, after you have tested [21:35.240 --> 21:39.640] the thing a little bit, to start to say: okay, now we have version two of the environment, [21:39.640 --> 21:42.000] which is fully automated from day zero. [21:42.000 --> 21:49.200] Otherwise, you will always be in a kind of automated-but-not-completely-automated situation. [21:49.200 --> 22:13.240] Thank you.