[00:00.000 --> 00:08.800] So, Fabio, the stage is yours. [00:08.800 --> 00:11.760] I'm really looking forward to hearing how on-premise data centers [00:11.760 --> 00:14.160] do not need to be a legacy. [00:14.160 --> 00:17.120] Thank you. [00:17.120 --> 00:19.680] So hello, everyone. [00:19.680 --> 00:25.560] And just to be clear, these are the topics we are going to cover: a little [00:25.560 --> 00:32.400] bit of history, some lessons learned, and then some technology bets that I think make [00:32.400 --> 00:35.040] sense in such a conversation. [00:35.040 --> 00:40.360] So, about me: I have been a Linux user for 20-ish years. [00:40.360 --> 00:46.120] I've been working with Linux for close to 20 years now, and I currently work at Red Hat, [00:46.120 --> 00:51.320] where I have basically this kind of conversation in my day-to-day job. [00:51.320 --> 00:55.520] So let's start with a little bit of history. [00:55.520 --> 00:56.520] Of the cloud. [00:56.520 --> 00:57.520] Let's call it this way. [00:57.520 --> 01:04.360] So Rackspace was founded in 1998 and, I think, was the first company that defined itself [01:04.360 --> 01:07.200] as cloud. [01:07.200 --> 01:10.160] In 2005, SoftLayer was founded. [01:10.160 --> 01:13.840] They defined themselves as a bare-metal cloud. [01:13.840 --> 01:23.080] And then in 2006, we have S3 launched by AWS, which was the first AWS service. [01:23.080 --> 01:28.960] In 2006 again, EC2. [01:28.960 --> 01:37.480] And then Google App Engine arrived, IBM bought SoftLayer, creating what is now called [01:37.480 --> 01:43.360] IBM Cloud, and by 2021, AWS had more than 200 different services. [01:43.360 --> 01:53.720] So what about the history of the non-cloud? Because what we have seen are all cloud environments, [01:53.720 --> 01:56.920] but those are nothing new if you think about it. [01:56.920 --> 02:05.520] So in 1964, which is probably earlier than most of the people in this room, [02:05.520 --> 02:17.400] IBM introduced the CP-40, and this machine had time-sharing technology, which was very different [02:17.400 --> 02:26.320] from what we today call cloud, but still it was probably the starting point of the history [02:26.320 --> 02:35.760] of the cloud. And in the late 60s, IBM released SIMMON, which is a hypervisor. [02:35.760 --> 02:43.400] By 1974, the two kinds of hypervisor got defined: type 1, bare-metal [02:43.400 --> 02:47.440] virtualization, and type 2, hosted virtualization. [02:47.440 --> 02:55.880] And in 1998, VMware got founded, and in the 2000s the majority of companies moved from bare metal [02:55.880 --> 02:58.080] to VMware VMs. [02:58.080 --> 03:06.560] In 2001, ESX got released, which was type 1 virtualization. [03:06.560 --> 03:13.280] In 2003, we have the first open source type 1 virtualization, Xen. [03:13.280 --> 03:21.080] And still in 2003, VMware introduced vMotion, which allows you to basically move a machine [03:21.080 --> 03:25.840] from one host to another without rebooting it. [03:25.840 --> 03:32.080] In 2008, Microsoft arrived with Hyper-V; it previously had some other virtualization [03:32.080 --> 03:35.440] tools, but Hyper-V got launched in 2008. [03:35.440 --> 03:37.360] So what is the cloud? [03:37.360 --> 03:42.720] Why do we distinguish the first group from the second one?
[03:42.720 --> 03:48.120] Wikipedia says that cloud computing is the on-demand availability of computer system [03:48.120 --> 03:53.200] resources, especially data storage and computing power, without direct active management by [03:53.200 --> 03:54.200] the user. [03:54.200 --> 03:58.400] So I think this is a good definition. [03:58.400 --> 04:05.200] I think that a better definition is: a business model where one party rents to a second party [04:05.200 --> 04:11.560] computer system resources, especially data storage (cloud storage) and computing power, [04:11.560 --> 04:15.520] with the smallest granularity possible. [04:15.520 --> 04:21.000] And my point is: the cloud is not technical, it is a business model. [04:21.000 --> 04:29.200] And if you think about it, we moved from renting machines like VPSes on a monthly basis, and [04:29.200 --> 04:36.880] then AWS introduced EC2, which was initially billed on an hourly basis, then per [04:36.880 --> 04:38.480] minute, and then per second. [04:38.480 --> 04:44.800] And now you can buy Lambdas or similar kinds of things for milliseconds (a small worked example of this follows below). [04:44.800 --> 04:48.560] And in a way, CPUs had the same shrinkage. [04:48.560 --> 04:58.840] So we moved from full CPUs or sockets, to vCPUs, which are basically hyper-threading threads, [04:58.840 --> 05:03.680] to fractional vCPUs with Lambdas or similar services. [05:03.680 --> 05:10.240] So my point being: the whole thing about the cloud is not technical, it is only about the business [05:10.240 --> 05:12.120] side of it. [05:12.120 --> 05:18.360] So what can we learn, not only from the last 20 years of what we can define as cloud, but [05:18.360 --> 05:26.560] also from the previous 50 of what we can define as non-cloud? More specifically, [05:26.560 --> 05:30.280] we have seen that the cloud model actually works. [05:30.280 --> 05:37.880] The non-cloud model was not very functional for the business, to the point that very often [05:37.880 --> 05:46.320] those data centers got outsourced or in some other way moved to the cloud, in the sense [05:46.320 --> 05:55.840] that they moved to someone else, and the business then started constantly expanding those machines [05:55.840 --> 06:02.000] thanks to what is basically an OPEX model instead of a CAPEX model. [06:02.000 --> 06:07.800] So there is one big aspect that we need to remember about this, which is the separation [06:07.800 --> 06:08.800] of concerns. [06:08.800 --> 06:13.200] First, standardize the interface between the infrastructure and the workload. [06:13.200 --> 06:20.240] If you go into legacy data centers, very often you have 1,000 different kinds of systems [06:20.240 --> 06:24.920] that the infra people have to provide to the workload people. [06:24.920 --> 06:30.360] And this is because, oh, my system is different, my software is different, whatever; at the [06:30.360 --> 06:37.680] end of the day, that is a huge load for the infrastructure part of the business. [06:37.680 --> 06:44.840] Second, scalability needs to be at the workload level. The infrastructure also needs to [06:44.840 --> 06:53.400] be somewhat reliable and within some SLAs, but if the system has to stay up, if the application [06:53.400 --> 06:57.240] has to stay up, the application will have to take care of this. [06:57.240 --> 07:04.880] And third, workloads should have only an abstract concept of whatever is underneath them, that is, of the physical [07:04.880 --> 07:05.880] architecture. [07:05.880 --> 07:12.000] They don't need to know which data center they are in or in which rack, what the [07:12.000 --> 07:15.640] nearby server is, and so on.
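Going back for a moment to the granularity point above, a small worked example may help. This is a sketch only: the prices are made up for illustration, not real VPS or cloud list prices.

```python
# Made-up prices, for illustration only: the same capacity rented at
# monthly versus per-second granularity.
HOURS_PER_MONTH = 730

vps_monthly = 40.00                                  # flat monthly rent, used or idle
per_second = vps_monthly / (HOURS_PER_MONTH * 3600)  # same capacity, billed per second

# A job that actually runs only 15 minutes per day, 30 days a month:
busy_seconds = 15 * 60 * 30

print(f"monthly VPS:        ${vps_monthly:.2f}")                # $40.00
print(f"per-second billing: ${per_second * busy_seconds:.2f}")  # about $0.41
```

The hardware underneath can be identical; what shrank over the years is only the billing granularity, which is exactly the business-model point.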
[07:15.640 --> 07:26.440] So we also need a functional business model for a well-managed IT system. [07:26.440 --> 07:34.880] And the first part is, as before: standardize the interface between the workload and the infrastructure [07:34.880 --> 07:39.480] so that it's easy to meter and to price. [07:39.480 --> 07:45.320] Second, bill back the infrastructure cost to the workload owners. [07:45.320 --> 07:50.600] We have seen, at least in my definition of cloud, that we still have two parties, one [07:50.600 --> 07:55.920] that delivers a service and the other that consumes it and pays for it. [07:55.920 --> 08:03.240] So it's very important to create this also internally in companies or organizations of [08:03.240 --> 08:10.880] any kind, because this allows the infrastructure side of the business to justify their expenses [08:10.880 --> 08:21.880] through at least some kind of revenue recognition or cost recovery. [08:21.880 --> 08:24.360] And third, keep the cost down. [08:24.360 --> 08:31.280] This is a key point: AWS, Google, those companies will do everything they can to keep the cost [08:31.280 --> 08:36.240] down because they need to be cash flow positive. [08:36.240 --> 08:42.480] Obviously, if you are a department in a company, it's slightly different, but it's very important [08:42.480 --> 08:48.120] to still be cash flow positive, because this will guarantee that you will not have [08:48.120 --> 08:55.120] any issues over time with this part of the financial model. [08:55.120 --> 08:57.520] And fourth, maintain control. [08:57.520 --> 09:03.840] We have seen that the clouds are obsessed with maintaining control and obtaining even more [09:03.840 --> 09:10.160] control over their hardware, their systems, whatever, and this is very important for your own cloud [09:10.160 --> 09:14.800] if you want to be able to maintain it for 10, 20, 50 years. [09:14.800 --> 09:22.480] So the first point is, I would say, do not use, or at least be very cautious about using, third-party [09:22.480 --> 09:28.200] proprietary software: those companies can go away, can change their pricing model, can do whatever; [09:28.200 --> 09:29.840] be aware of this. [09:29.840 --> 09:37.960] Second, evaluate very carefully the buy-versus-build decision, because when you buy, obviously [09:37.960 --> 09:43.120] it's here now, but you don't have the know-how about it. [09:43.120 --> 09:50.600] So probably you will want to build a lot of your systems, not the core parts, but maybe [09:50.600 --> 09:56.040] the dashboard layer or that kind of thing, so that you can effectively manage it however [09:56.040 --> 09:58.560] you think best. [09:58.560 --> 10:03.480] And third, be very aware of lock-ins, because those will bite you over the course of the [10:03.480 --> 10:05.120] years. [10:05.120 --> 10:12.640] So how do I define lock-in? [10:12.640 --> 10:18.320] I define it as the product of the probability that a component will require substitution [10:18.320 --> 10:23.320] during the solution's lifetime and the total cost of the substitution.
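As a minimal sketch of that definition (the components and numbers below are hypothetical, purely to show how the two factors trade off):

```python
def lock_in(p_substitution: float, substitution_cost: float) -> float:
    """Expected lock-in cost: the probability that a component will require
    substitution during the solution's lifetime, times the total cost of
    that substitution."""
    return p_substitution * substitution_cost

# Hypothetical components and numbers:
print(lock_in(0.02, 50_000_000))  # base OS: huge cost to leave, very unlikely to need to
print(lock_in(0.60, 2_000_000))   # niche proprietary tool: far cheaper to replace, far more likely
```

Note that in this made-up example the cheap-to-replace proprietary tool carries the larger expected lock-in (1.2 million versus 1 million), which is the point of the Linux example that follows.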
[10:23.320 --> 10:30.400] So for instance, Linux: if you base all your architecture on Linux, it's going to be very [10:30.400 --> 10:36.480] expensive to move away from Linux, but on the other hand, it's very improbable that you [10:36.480 --> 10:44.720] will need to do it, because very probably in 10, 20 years Linux will still be here. [10:44.720 --> 10:52.280] So, a couple of points on technologies. The first one is: keep the complexity of your system [10:52.280 --> 10:54.920] at the lowest level possible. [10:54.920 --> 11:00.480] Systems will get more complex and more absurd over time, so at least at the beginning, start [11:00.480 --> 11:03.280] with the simplest thing possible. [11:03.280 --> 11:07.680] Second, prefer build-time complexity over run-time complexity. [11:07.680 --> 11:13.840] It's way easier to automate something at build time than to automate something at run time. [11:13.840 --> 11:20.560] And also, when something breaks, it's better if it's simple, because it's easier to fix. [11:20.560 --> 11:26.760] If you have to compile your stuff, compile it, but try to keep the complexity at [11:26.760 --> 11:29.520] run time to the minimum possible. [11:29.520 --> 11:38.680] Third, minimize the number of services that you deliver to your business or your workload [11:38.680 --> 11:43.880] owners, so that effectively you can guarantee that those services are exactly what they [11:43.880 --> 11:48.800] require and you are able to deliver them in a sensible way. [11:48.800 --> 11:53.840] So I think that one big point is containers. [11:53.840 --> 11:59.440] Delivering a container-based solution is probably the best option, I think, [11:59.440 --> 12:08.360] today. Use a Kubernetes distribution; whatever you prefer and choose that makes sense is [12:08.360 --> 12:16.760] fine. And as we'll see later, the Kubernetes APIs are now fairly well known, fairly abstract, [12:16.760 --> 12:23.040] and fairly widely used, so those can be a good interface between the infra and the workload [12:23.040 --> 12:24.040] side. [12:24.040 --> 12:29.640] Also, you can do it yourself with a community distribution, call it whatever you like. [12:29.640 --> 12:34.000] Or you can buy a commercial distribution of Kubernetes. [12:34.000 --> 12:40.280] If you do, first, be sure that what you're buying is fully open source, so that you [12:40.280 --> 12:46.360] decrease your lock-in, because you are decreasing the cost it will take you to move from [12:46.360 --> 12:48.560] this to any other solution. [12:48.560 --> 12:54.200] Second, buy from a trustworthy company: hopefully the company that you buy it from will not [12:54.200 --> 12:59.080] fail tomorrow, because if it does, you will have bigger problems. [12:59.080 --> 13:07.360] And one with a long track record of not screwing their customers. [13:07.360 --> 13:12.440] And if they are heavily involved in the open source community, it's even better, because [13:12.440 --> 13:20.360] that means that they are driving the development and they do have all the knowledge needed [13:20.360 --> 13:25.080] to fix issues as soon as they arise. [13:25.080 --> 13:30.560] So, around automation: use an immutable approach to your infrastructure. [13:30.560 --> 13:35.680] If you start to have diverging things and weird infrastructure going on, it will be [13:35.680 --> 13:37.160] a death sentence. [13:37.160 --> 13:41.360] Second, version your infrastructure; GitOps is one option (a small sketch follows below). [13:41.360 --> 13:43.240] There are many others.
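A minimal sketch of the versioned, immutable idea, assuming a hypothetical git repository of Kubernetes manifests and an already-configured kubectl; real GitOps tools such as Argo CD or Flux do far more, this only illustrates the principle:

```python
import subprocess

REPO = "/srv/infra-manifests"  # hypothetical repository holding the declared state

def apply_version(tag: str) -> None:
    """Converge the cluster to one known, versioned state of the infrastructure."""
    # Check out the desired version of the declared state...
    subprocess.run(["git", "-C", REPO, "checkout", tag], check=True)
    # ...and apply exactly that state, with no human edits in between.
    subprocess.run(["kubectl", "apply", "-R", "-f", REPO], check=True)

apply_version("v42")    # roll forward to the new version
# apply_version("v41")  # or roll back to the version known to work
```

Applying a tagged state instead of editing live objects is what keeps the infrastructure immutable and every change attributable to a version.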
[13:43.240 --> 13:48.720] No matter what you do, try to have versions, so that effectively you can potentially roll [13:48.720 --> 13:54.520] back, or at least see what changed from a version that is known to be working to the current [13:54.520 --> 13:57.720] one, and automate the whole process. [13:57.720 --> 14:01.080] If you have humans involved, you will have issues. [14:01.080 --> 14:08.400] It will cost more and it will be effectively less resilient and reliable. [14:08.400 --> 14:14.480] So, putting together all that we have seen, I would suggest to first create a multi-data [14:14.480 --> 14:20.360] center architecture, so that effectively you have all that redundancy and those kinds of things, [14:20.360 --> 14:23.240] but hide them from your developers. [14:23.240 --> 14:32.840] Maybe they know the region concept or the AZ concept, but don't show the physical [14:32.840 --> 14:37.800] layout to your users, otherwise they will start to do weird stuff. [14:37.800 --> 14:41.440] Second, use a tool to manage the clusters. [14:41.440 --> 14:45.640] Open Cluster Management is an open source project that does it. [14:45.640 --> 14:49.400] There are other projects that do similar things. [14:49.400 --> 14:53.720] It's very, very useful and it will help you over time, because probably you will end up [14:53.720 --> 14:55.520] running many clusters. [14:55.520 --> 15:02.920] Third, I would personally suggest standardizing on the Kubernetes APIs as the only interface [15:02.920 --> 15:11.160] between the workload and the infrastructure, because those are, as we have seen, very well known. [15:11.160 --> 15:17.560] Use a bare-metal container platform, so don't put virtualization or other stuff underneath it. [15:17.560 --> 15:23.440] You will have, hopefully, enough workload to justify tons of servers, physical servers; [15:23.440 --> 15:27.960] don't add complexity with virtualization in between. [15:27.960 --> 15:32.360] Automate all the infrastructure pieces and configurations, obviously, as we have seen. [15:32.360 --> 15:41.720] Start by providing only a few interfaces to your business, and then eventually extend them when [15:41.720 --> 15:42.720] needed. [15:42.720 --> 15:49.280] So an example would be an OCI registry, object storage, and pods, deployments, those kinds [15:49.280 --> 15:50.720] of basic things. [15:50.720 --> 15:55.760] And then, if your business comes to you saying, oh, we really need that, then eventually you [15:55.760 --> 15:57.200] expand. [15:57.200 --> 16:05.360] But the thing is: only provide new services when you are sure that there is a requirement [16:05.360 --> 16:06.360] for them. [16:06.360 --> 16:09.600] So, for instance, let's say that you want to do a database as a service. [16:09.600 --> 16:14.760] You have already onboarded 100 applications, and 80 of those actually use MySQL. [16:14.760 --> 16:20.440] It would make sense to provide MySQL as a service, but it does not make sense to provide [16:20.440 --> 16:25.760] 50 different databases as a service, of which 48 will never be used. [16:25.760 --> 16:28.760] That's only complexity and cost for you. [16:28.760 --> 16:35.320] And then create a simple UX for your users that completely abstracts everything that is [16:35.320 --> 16:36.320] below. [16:36.320 --> 16:42.960] So even: push your Kubernetes configuration here, and we will manage it.
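As a minimal sketch of what that single interface can look like from the workload side, assuming the official kubernetes Python client and hypothetical names (the kubeconfig is the entire contract; the developer never sees data centers, racks, or hosts):

```python
from kubernetes import client, config

config.load_kube_config()  # credentials for the platform: the whole contract
apps = client.AppsV1Api()

# A hypothetical workload, described purely in Kubernetes API terms:
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "demo-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo-app"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="web", image="registry.example.com/demo:1.0"),
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)
```

Where those three replicas land, in which data center or rack, stays entirely on the infrastructure side.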
[16:42.960 --> 16:51.200] And hopefully then you will be ensuring that all this stuff is versioned and so on, so that [16:51.200 --> 16:58.720] even when the workload fails for some reason, you can say: look, version N minus one was [16:58.720 --> 16:59.720] working. [16:59.720 --> 17:02.040] You did something, now it's broken. [17:02.040 --> 17:03.800] It's not the infra. [17:03.800 --> 17:06.720] So this was it. [17:06.720 --> 17:07.720] Thank you. [17:07.720 --> 17:20.600] I don't know if we have a couple of minutes for questions, no? [17:20.600 --> 17:32.800] If there are. [17:32.800 --> 17:34.920] Thank you for your talk. [17:34.920 --> 17:39.640] Could you expand a bit on, I didn't get why, what were the advantages of building [17:39.640 --> 17:43.680] multiple data centers in the first place? [17:43.680 --> 17:50.000] So that is usually a business requirement, because they will say, oh, we want to have everything [17:50.000 --> 17:55.600] HA, or at least this service needs to be HA, and with one data center, that would [17:55.600 --> 17:57.600] be hard. [17:57.600 --> 18:03.480] Obviously it really depends: if you are a small organization, maybe two or three [18:03.480 --> 18:05.400] data centers could be okay. [18:05.400 --> 18:12.320] If you are a big organization, maybe spread throughout five, ten legally different regions, [18:12.320 --> 18:15.240] then you will obviously need 30, 50 data centers. [18:15.240 --> 18:16.680] That's a completely different scale. [18:16.680 --> 18:22.120] Obviously all those are very generic suggestions, and then you have to apply them to your specific [18:22.120 --> 18:24.480] situation. [18:24.480 --> 18:30.720] And just a quick follow-up on that: how do you hide that from the workload developer? [18:30.720 --> 18:36.040] So the line just after that one, where you say they have to not know about the multiple [18:36.040 --> 18:37.640] clusters, how does that work? [18:37.640 --> 18:43.400] Yeah, so if you pick AWS, for instance, they have the concepts of region and AZ. [18:43.400 --> 18:47.200] Some AZs, so, AZs are not data centers. [18:47.200 --> 18:48.720] Some are data centers. [18:48.720 --> 18:53.720] Others are parts of a data center, but different availability zones within the data center, [18:53.720 --> 18:59.880] and others are containers, in the sense of 40-foot containers full of servers. [18:59.880 --> 19:01.960] So the user does not know. [19:01.960 --> 19:06.920] They know that there is region X, AZ 1, 2, 3. [19:06.920 --> 19:09.760] What 1, 2, 3 means, no one knows. [19:09.760 --> 19:10.760] And no one cares. [19:10.760 --> 19:11.760] And that's the thing. [19:11.760 --> 19:26.360] Thank you for the talk. [19:26.360 --> 19:33.000] And in your definition of lock-in, you spoke about the cost of portability multiplied by the probability [19:33.000 --> 19:34.320] of portability. [19:34.320 --> 19:42.280] But if you fail to assess the probability of portability, wouldn't you [19:42.280 --> 19:47.760] fall into a lock-in without being aware of it? [19:47.760 --> 19:50.320] Sorry, what do you mean? [19:50.320 --> 19:55.480] Okay: I will always run my cloud in Amazon Web Services. [19:55.480 --> 19:59.280] Why would I need portability? [19:59.280 --> 20:04.440] And then I start using locked-in products. [20:04.440 --> 20:07.680] So I will never be able to leave. [20:07.680 --> 20:10.280] Yes, well, you will be able to leave. [20:10.280 --> 20:11.680] It's always possible to leave.
[20:11.680 --> 20:15.760] You will simply rewrite your whole application from scratch and you leave. [20:15.760 --> 20:17.000] So what is the cost of that? [20:17.000 --> 20:18.000] A billion? [20:18.000 --> 20:19.000] Okay. [20:19.000 --> 20:22.520] So now it becomes a billion of lock-in. [20:22.520 --> 20:23.880] That is my point. [20:23.880 --> 20:28.000] You can rewrite tomorrow from scratch, from the ground up. [20:28.000 --> 20:29.000] It's possible. [20:29.000 --> 20:30.720] How much will it cost you? [20:30.720 --> 20:31.720] A billion? [20:31.720 --> 20:32.720] Five billion? [20:32.720 --> 20:33.720] A trillion? [20:33.720 --> 20:34.720] Okay. [20:34.720 --> 20:35.880] That is your lock-in value. [20:35.880 --> 20:36.880] And that's the thing. [20:36.880 --> 20:42.320] Obviously, I would suggest you keep the lock-in as low as possible. [20:42.320 --> 20:44.520] So try not to end up [20:44.520 --> 20:47.680] in a situation where you have to rewrite everything. [20:47.680 --> 20:48.680] Thank you. [20:48.680 --> 20:59.120] Hello, one quick question. [20:59.120 --> 21:04.640] So if your organization has a traditional manual approach to operations, which thing [21:04.640 --> 21:07.360] would you automate first? [21:07.360 --> 21:15.040] I would start with very simple processes, just to ensure that it works in the organization. [21:15.040 --> 21:22.400] So that the organization starts to understand it: processes like creating VMs or creating containers, [21:22.400 --> 21:29.160] whatever kind of thing you do, and then some things such as patching and so on. [21:29.160 --> 21:35.240] But if you really want to go the automation way, it's way easier, after you have tested [21:35.240 --> 21:39.640] the thing a little bit, to start to say: okay, now we have version two of the environment, [21:39.640 --> 21:42.000] which is fully automated from day zero. [21:42.000 --> 21:49.200] Otherwise, you will always be in a kind of automated-but-not-completely-automated situation. [21:49.200 --> 22:13.240] Thank you.