[00:00.000 --> 00:09.200] So good evening, everyone, and welcome to the last talk in this session. [00:09.200 --> 00:12.360] I hope you still have some energy left. [00:12.360 --> 00:13.720] So my name is Sascha Roloff. [00:13.720 --> 00:19.840] I'm working at the Huawei Intelligent Cloud Technology Lab of the Huawei Munich Research [00:19.840 --> 00:26.800] Center, and today we are going to take a look under the hood of build systems and what common [00:26.800 --> 00:33.560] practices are currently used in basically all of the build systems and why many of [00:33.560 --> 00:39.440] them are suboptimal in certain regards and how they can be improved by a concept called [00:39.440 --> 00:42.440] staging. [00:42.440 --> 00:47.760] So in order to explain the issues with current build systems to you, I directly jump into [00:47.760 --> 00:56.040] an example and I guess many of you have used make once or twice in your open source development. [00:56.040 --> 00:58.720] So let's start with this classic build system. [00:58.720 --> 01:04.680] So we want to create a build description for a very simple hello world application composed [01:04.680 --> 01:12.760] of a hello binary and a greet library, and the greeting phrase is hard coded inside the [01:12.760 --> 01:20.480] hello binary and a greetee can be injected at compile time into the greet library. [01:20.480 --> 01:22.920] So this is a Makefile. [01:22.920 --> 01:31.280] We have our rules which describe which artifacts are generated by actions based on a set of [01:31.280 --> 01:38.240] input artifacts, and so we have different actions actually involved to generate the [01:38.240 --> 01:44.040] final binary, and for example we have compile actions to generate the object files like [01:44.040 --> 01:50.680] the hello.o or the greet.o, we have an archive action for the greet library, and the final [01:50.680 --> 01:54.640] linking action to actually generate the binary. 
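The rules described here (compile, archive, link) form a small dependency graph from input artifacts to generated artifacts. The following is a minimal Python sketch of that idea, not the Makefile shown on the slide: the file names `hello.cpp`, `greet.cpp`, `greet.hpp` and `libgreet.a`, as well as the action labels, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Each rule maps a generated artifact to (input artifacts, action label).
# All file names and labels are hypothetical, chosen to mirror the talk.
RULES = {
    "hello.o": (["hello.cpp", "greet.hpp"], "compile"),
    "greet.o": (["greet.cpp", "greet.hpp"], "compile"),
    "libgreet.a": (["greet.o"], "archive"),
    "hello": (["hello.o", "libgreet.a"], "link"),
}


def build_order(rules):
    """Return the generated artifacts in an order where inputs come first."""
    graph = {out: set(inputs) for out, (inputs, _) in rules.items()}
    order = TopologicalSorter(graph).static_order()
    # Source files have no rule, so only artifacts with a rule need an action.
    return [name for name in order if name in rules]


if __name__ == "__main__":
    for artifact in build_order(RULES):
        inputs, action = RULES[artifact]
        print(f"{action}: {', '.join(inputs)} -> {artifact}")
```

Note that every artifact in this sketch is identified by a file name, which is exactly the property of make that the rest of the talk questions.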
[01:54.640 --> 02:00.520] At the end we also want to create some sample output, so we just take the output of the [02:00.520 --> 02:03.680] hello world and store it in a text file. [02:03.680 --> 02:10.720] So nothing spectacular right now, each artifact is associated with a file on the file system [02:10.720 --> 02:14.400] so the actions can directly operate on it. [02:14.400 --> 02:19.960] If we execute the build, we just see all the actions are executed, everything is fine, [02:19.960 --> 02:27.560] and the output is generated, and yes now the boss comes into our office and he's unhappy [02:27.560 --> 02:34.680] with our result, he wants to put it basically on a poster and it should be more readable. [02:34.680 --> 02:41.160] Okay so yeah then let's add some post processing to the task, and we just take the output of [02:41.160 --> 02:47.400] the hello binary, store it in an intermediate file and put this intermediate file into the [02:47.400 --> 02:54.160] post processing and capitalize all letters, basically, and [02:54.160 --> 02:57.400] store it into post processed text. 
[02:57.400 --> 03:04.880] And then finally put this text into the target sample output, and we execute this, we see [03:04.880 --> 03:11.560] new actions are executed and the result is fine, looks much better now, hello world in [03:11.560 --> 03:17.600] capital letters, great, but the boss is still unhappy, he wants to have some localization, [03:17.600 --> 03:23.600] he doesn't want to greet the whole world, he just wants to greet Munich and Brussels, [03:23.600 --> 03:29.240] and he wants to have it both in a single Makefile, so what do we have to do now, okay, [03:29.240 --> 03:36.360] basically we have two program variants now, and what should we do in order to reuse [03:36.360 --> 03:43.560] most of our rules that we already have, we can use a for loop over the location dependent [03:43.560 --> 03:49.920] targets and interpolate the city name into the artifact names as you can see here, so [03:49.920 --> 03:56.440] we have now not only a single hello binary, but two hello binaries with dot and the name [03:56.440 --> 04:07.040] of the city, and these are our two program variants, and as you can see there is a lot [04:07.040 --> 04:14.280] of string interpolation coming into our Makefile and it doesn't make it really readable, [04:14.280 --> 04:19.880] but we have to do it, because each artifact is associated with a file on the file system [04:19.880 --> 04:26.560] and this needs to be a unique name, so we have to do it basically, and when we execute [04:26.560 --> 04:35.680] it, okay now we get a few more actions, but it's working, and we see now, okay the output [04:35.680 --> 04:43.560] is as required and we greet Munich and Brussels, but the boss is still, I mean he's happy now [04:43.560 --> 04:51.280] with our output, but now he's unhappy with our implementation, he says that's not maintainable, [04:51.280 --> 04:55.000] why do we use a build system from the 70s, use a modern one, they are supposed to do [04:55.000 --> 05:03.960] much better 
now, well okay, then let's use Bazel, and this is what it looks like in Bazel, [05:03.960 --> 05:11.000] so the same application, and as it turns out they are better, but not in all regards, so [05:11.000 --> 05:17.360] they introduce high level concepts like cc_binary and cc_library, we don't have to [05:17.360 --> 05:24.680] manually write object file creation and linking, but everything is wrapped now inside [05:24.680 --> 05:35.600] these high level concept calls, and also our bash calls are wrapped in these genrule targets, [05:35.600 --> 05:41.920] but I mean it looks a bit more readable now, but still we have this string interpolation [05:41.920 --> 05:49.000] here, and the for loops over the city names, and yeah why is it actually like that, why [05:49.000 --> 05:54.760] do we need this, I mean it's a modern build system, and the reason is because Bazel also [05:54.760 --> 06:03.840] associates each artifact with a file on the file system, and yeah, so this [06:03.840 --> 06:10.800] basically brings us to an important observation, and this means that even in modern build systems [06:10.800 --> 06:20.360] it's required that you have unique names for your artifacts, because they basically [06:20.360 --> 06:25.920] follow a design decision implemented by make in the 70s, and the design decision by make [06:25.920 --> 06:33.000] was that each artifact needs to have a fixed location in the file system, well for make [06:33.000 --> 06:39.640] it was perfectly fine at that time, because there was nothing else or not much different [06:39.640 --> 06:46.400] to do in order to determine which part of a program needs to be recomputed, basically [06:46.400 --> 06:52.880] to compare timestamps, and for this you need files, so for make this was totally fine, [06:52.880 --> 06:57.520] but there is actually no reason anymore to do this in modern build systems, because they [06:57.520 --> 07:07.000] anyway isolate their actions in order to avoid getting 
unwanted dependencies into their builds, [07:07.000 --> 07:13.000] so their actions are executed either in a separate directory or in a container in order [07:13.000 --> 07:19.080] to better control the dependencies, so when they anyway execute their actions in isolation, why don't [07:19.080 --> 07:27.720] we allow the targets to specify where to put the artifacts, and this is exactly the idea [07:27.720 --> 07:33.440] of staging, so basically there is no technical reason for modern build systems for the restriction [07:33.440 --> 07:42.120] to associate each artifact with a file, and instead we propose that we should stop [07:42.120 --> 07:49.840] following this common practice and apply staging instead, and the idea of staging is that an [07:49.840 --> 07:57.960] action can freely select the location of input and output artifacts within its working directory, [07:57.960 --> 08:04.600] and this basically introduces a separation between physical and logical paths, inside [08:04.600 --> 08:12.360] an action, you only work on the logical paths, and the action can freely decide where to [08:12.360 --> 08:21.760] put a generated artifact, or where it wants to read an input artifact, and so this is [08:21.760 --> 08:28.560] basically our proposal to apply staging, and what could it look like if it's implemented [08:28.560 --> 08:33.200] in a build system, so this is basically our project, it's called justbuild, and this [08:33.200 --> 08:42.840] is a build description that we propose, so we also have the definitions of our targets [08:42.840 --> 08:49.360] here, we also use the high level concepts like binaries and libraries, and in this JSON [08:49.360 --> 08:55.040] syntax the type just selects which kind of artifact or which kind of target it basically [08:55.040 --> 09:02.800] is, and what we can see inside the target definitions is that our artifacts are named without [09:02.800 --> 09:09.520] string interpolation, so we don't need to artificially invent unique names for our 
artifacts, [09:09.520 --> 09:17.000] they are just like they are, and also for example here this use target we just access [09:17.000 --> 09:24.160] the hello binary, even though we will have two of these binaries but we just write hello, [09:24.160 --> 09:33.160] and we don't care, I mean it's staged, and what is created from the dependency, [09:33.160 --> 09:40.240] it's just staging the final result at that location where we need it, so, but still we [09:40.240 --> 09:46.200] have the for loop, this is something that we of course still need, which basically [09:46.200 --> 09:52.880] creates two configurations, which are then propagated, I mean this variable that is created [09:52.880 --> 09:59.200] here is propagated to all the depending targets, and it propagates until the greet library, [09:59.200 --> 10:07.360] which then reads this configuration variable and injects it into the compile command, so [10:07.360 --> 10:14.240] this is how a description could look with staging, and from this description we [10:14.240 --> 10:20.440] can also generate a so called target graph, which shows the dependencies of the targets, [10:20.440 --> 10:27.120] so main depends on all, all depends on two post process targets because we have two configured [10:27.120 --> 10:34.160] targets, so the greet library basically is duplicated and this propagates until the post [10:34.160 --> 10:42.480] process target, and this target graph, or the targets, are basically high level concepts, [10:42.480 --> 10:48.120] if you want to take a look into which actions are actually executed, you can also generate [10:48.120 --> 10:55.280] an action graph, which shows a data flow, that's why the arrows are inverted, and it's [10:55.280 --> 11:02.360] a bipartite graph, which means the ellipses are the artifacts and the rectangles are [11:02.360 --> 11:10.400] the actions, and yeah, so you can really see the artifact names are basically the same, [11:10.400 --> 11:16.600] so post process dot 
txt and post process dot txt are the same names in both branches, [11:16.600 --> 11:22.600] and since they are staged in logical paths, there's no problem, there's no conflict actually, [11:22.600 --> 11:29.960] this would not work in make, you would have to use unique names, okay, so and what happens [11:29.960 --> 11:35.600] when we execute it, so we just select the target that we want to build, and there is [11:35.600 --> 11:40.600] some output coming here, and it says okay we have 12 actions, zero cache hits, of course, [11:40.600 --> 11:47.640] we build it for the first time, so you can count, it's 12 actions, and it's just built, [11:47.640 --> 11:53.640] the artifact is somewhere, I mean it could be in a remote execution, and then it's just [11:53.640 --> 11:59.240] existing in a remote CAS, if you want to have the artifact in your local folder, then you [11:59.240 --> 12:06.720] have to install it, and when we execute the installation, we now see okay, again 12 actions, [12:06.720 --> 12:12.720] and also 12 cache hits, because everything is known already, and then the file is in [12:12.720 --> 12:20.520] your local directory actually, and we see the content is fine, and we don't even need [12:20.520 --> 12:24.760] to store it into our local directory, we can just print the content of an artifact by the [12:24.760 --> 12:34.800] minus p option, if we take staging seriously, we have also very nice implications, and one [12:34.800 --> 12:41.000] is for example, assume that you have an external source code that you want to use in your project, [12:41.000 --> 12:45.920] and you want to apply some patches on it, and yeah how do you do it, normally you would [12:45.920 --> 12:51.040] copy it, apply the patch, because you don't want to modify the original source code, and [12:51.040 --> 12:57.160] yeah this results in a lot of maintenance problems, but with staging this can be done [12:57.160 --> 13:04.640] much easier with logical in place patching, you just apply 
the patch on the logical path, [13:04.640 --> 13:12.080] and yeah, then let's take a look at what this could look like, so we have now put our example [13:12.080 --> 13:21.200] files in a third party directory outside of our project, and a directory with the patches, [13:21.200 --> 13:27.600] and the patch just replaces the hello greeting phrase with a bonjour greeting phrase, and [13:27.600 --> 13:34.680] we just have to add a single block into our build description, which points [13:34.680 --> 13:41.240] to our patch and to the file that needs to be patched, and that's it, and the resulting [13:41.240 --> 13:46.800] target graph just shows, okay we have now one more target here, the hello.cpp source [13:46.800 --> 13:54.440] target, and the binaries are depending on this extra target now, and also in the [13:54.440 --> 14:00.400] action graph you can see that there is just a single new action actually added here to [14:00.400 --> 14:07.600] the action graph, where hello.cpp was earlier, there is now the patched version of hello.cpp, and [14:07.600 --> 14:13.080] it's just another input, and if something is changed in the patch all dependent targets [14:13.080 --> 14:23.200] are executed, okay if we execute it we see bonjour Munich, bonjour Brussels, works well, [14:23.200 --> 14:28.520] okay so quickly to summarize my talk, as we have seen there are some inconvenient habits [14:28.520 --> 14:33.880] in modern build systems, and yeah we propose to apply staging instead to make build systems [14:33.880 --> 14:40.340] better, and you will have a couple of advantages if you apply staging, and yeah which are written [14:40.340 --> 14:47.000] here, and this is not only a concept, it's already implemented, so if you want to take [14:47.000 --> 14:54.800] a look into our project please come by, and yeah now the stage is yours, thanks for your [14:54.800 --> 15:15.200] attention, are there any questions, there is a question, no I think I will repeat the 
[15:15.200 --> 15:34.080] question, you heard the question, yeah exactly we actually do content addressable storage, [15:34.080 --> 15:40.320] so repeat the question please, okay the question was how do we identify which source code we [15:40.320 --> 15:47.960] actually need for the staging, and yeah we apply content addressable storage, so we [15:47.960 --> 15:52.760] determine a hash basically from all of our source code, and then also we know what has [15:52.760 --> 16:10.200] been changed or not, any other questions, yeah so the question was whether we [16:10.200 --> 16:19.080] use the JSON syntax, yeah no we decided for JSON, and yeah JSON is used as our [16:19.080 --> 16:34.400] build description syntax, okay so how many developers are working on it, and is it widely [16:34.400 --> 16:42.200] used, a very good question, actually we recently got open sourced, and we are in total five [16:42.200 --> 16:50.040] developers currently working on it, and yeah but we really try to implement the new concepts [16:50.040 --> 16:55.520] into this build system, and make it a really sound build system compared to other modern [16:55.520 --> 17:00.920] build systems, and yeah so please just take a look at our project, and there is a nice [17:00.920 --> 17:09.280] tutorial also, which explains everything nicely, and hope to see you there, okay thank [17:09.280 --> 17:32.640] you for the talk, thank you, goodbye, thank you for coming.
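The staging idea from the talk, combined with the content addressable storage mentioned in the first answer, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how justbuild is implemented: `ContentStore`, `run_staged` and `postprocess` are hypothetical names, and the file names `out.txt` and `postprocess.txt` are chosen only to mirror the example. Artifacts are identified by a hash of their content, and each action sees them only at logical paths inside its own temporary working directory, so both city variants can use the same file names without conflict.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Toy content addressable storage: blobs are keyed by their SHA-256 hash."""

    def __init__(self):
        self._blobs = {}

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def run_staged(store, inputs, action, output_path):
    """Stage inputs at logical paths in a fresh directory, run the action,
    and return the hash of the artifact the action wrote to output_path."""
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        for logical_path, digest in inputs.items():
            target = root / logical_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(store.get(digest))
        action(root)
        return store.add((root / output_path).read_bytes())


def postprocess(root: Path) -> None:
    """Hypothetical action: upper-case out.txt into postprocess.txt."""
    data = (root / "out.txt").read_bytes()
    (root / "postprocess.txt").write_bytes(data.upper())


if __name__ == "__main__":
    store = ContentStore()
    results = {}
    for city in ("Munich", "Brussels"):
        greeting = store.add(f"Hello {city}\n".encode())
        # Both variants use the same logical names; no city in the file names.
        results[city] = run_staged(
            store, {"out.txt": greeting}, postprocess, "postprocess.txt"
        )
    for city, digest in results.items():
        print(city, digest[:12], store.get(digest).decode().strip())
```

Because outputs are addressed by hash rather than by a fixed file system location, the two `postprocess.txt` artifacts coexist in the store, and an unchanged input always maps to the same key, which is what makes it possible to tell what has changed.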