[00:00.000 --> 00:24.320] Hi, I'm Akihiro Suda from NTT Corporation in Japan. In this session, I talk about [00:24.320 --> 00:33.360] bit-for-bit reproducible builds with Dockerfile, focusing on the determinism of the timestamps [00:33.360 --> 00:37.240] and the apt-get package versions. [00:37.240 --> 00:46.080] I have a demo, and you can reproduce my demo by yourselves using this GitHub repository. [00:46.080 --> 00:57.520] Let's begin with what reproducible builds are. [00:57.520 --> 01:05.320] It means producing exactly the same binary when you have the same source. [01:05.320 --> 01:13.960] For containers, the source means the Dockerfile and every source code file that is referred to [01:13.960 --> 01:23.320] from the Dockerfile, and the binary means the OCI image, including the tar layers and the [01:23.320 --> 01:26.520] metadata JSONs. [01:26.520 --> 01:37.160] This reproducibility has to be attestable by anybody at any time, but not necessarily on [01:37.160 --> 01:45.600] any machine, because typically your machine has to have a specific version of the toolchain, [01:45.600 --> 01:52.960] and sometimes you have to use a specific version of the host operating system, with a specific [01:52.960 --> 02:01.400] file system and a specific CPU. This limitation is very far from ideal, but it [02:01.400 --> 02:06.560] is sometimes inevitable. [02:06.560 --> 02:11.120] So why do we need reproducible builds? [02:11.120 --> 02:17.160] It's because we want to verify the actual source code of the binary, not the claimed source [02:17.160 --> 02:19.240] code. [02:19.240 --> 02:27.040] The actual source code may differ from the claimed source code when the build environment, [02:27.040 --> 02:34.200] such as the developer's laptop or a CI server such as Jenkins or GitHub Actions, is [02:35.040 --> 02:43.920] compromised, or when the developer simply has malicious intent.
[02:43.920 --> 02:50.480] So we want to be sure whether we have the actual source code, and if the builds are [02:50.480 --> 02:56.560] reproducible, we can be sure that we have the actual source code. [02:56.560 --> 03:02.800] Otherwise we are not sure whether we have the actual source code or not, and maybe we [03:02.800 --> 03:09.520] are using some compromised source code. [03:09.520 --> 03:17.120] So reproducible builds are really great, but they are not a panacea. In particular, reproducibility [03:17.120 --> 03:23.280] has nothing to do with whether the source code is safe to use. [03:23.280 --> 03:31.040] The source code may still contain malicious code, so reproducible builds make sense only [03:31.120 --> 03:35.720] when you actually review the source code by yourself. [03:35.720 --> 03:46.880] That is a very time-consuming job, and very few people are motivated to bother doing it, [03:46.880 --> 03:51.680] but this problem is beyond the scope of my talk. [03:51.680 --> 03:59.800] Maybe this task can be automated using some AI in the next couple of years, but it's [04:00.040 --> 04:01.760] beyond the scope of this talk. [04:01.760 --> 04:05.760] I don't know. [04:05.760 --> 04:16.400] It was hard to make builds reproducible, especially with Dockerfiles. There were [04:16.400 --> 04:19.840] three major challenges. [04:19.840 --> 04:30.840] The most obvious one is timestamps, such as the timestamps of the files in the OCI tar [04:30.840 --> 04:43.080] layers, and other timestamps in the OCI JSONs, such as org.opencontainers.image.created. [04:43.080 --> 04:51.920] We also have timestamps in the image histories, which are printable with [04:51.920 --> 04:54.640] the docker history command. [04:54.640 --> 05:01.680] So the timestamp problem is the most obvious one, but it is relatively [05:01.680 --> 05:04.120] easy to solve.
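
To illustrate why file timestamps inside tar layers matter for bit-for-bit reproducibility, here is a minimal Python sketch. It is not from the talk and does not reflect how BuildKit writes layers; the file name and the epoch values are made up. It builds the same one-file archive with the standard tarfile module and compares SHA-256 digests.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime):
    """Build an uncompressed tar in memory and return its SHA-256 hex digest.

    `files` maps archive member names to their bytes (hypothetical content).
    Every metadata field that could vary between builds is pinned explicitly.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # the timestamp under test
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


files = {"hello.txt": b"hello\n"}
same_a = layer_digest(files, mtime=1675000000)
same_b = layer_digest(files, mtime=1675000000)
different = layer_digest(files, mtime=1675000001)

print(same_a == same_b)     # identical input and mtime give identical bytes
print(same_a == different)  # a one-second difference changes the digest
```

Even though the file content is identical, a single differing mtime is enough to change the layer digest, which is why the timestamps have to be pinned.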
[05:04.120 --> 05:08.680] So the biggest problem is not timestamps. The biggest problem is the non-determinism [05:08.680 --> 05:11.080] of apt-get. [05:11.080 --> 05:18.400] When you run apt-get, the package version that is installable with apt-get changes [05:18.400 --> 05:20.240] every time. [05:20.240 --> 05:23.120] And of course, this is not specific to apt-get. [05:23.120 --> 05:35.120] The same problem exists in dnf, apk, zypper, pacman, and almost all package managers. [05:35.120 --> 05:42.960] Actually, the Nix package manager has solved this issue with a version pinning system [05:42.960 --> 05:51.560] called Flakes, but Nix is very complex and still hard for most people to use. [05:51.560 --> 05:59.440] Guix is also similar, but Guix is still complex and very hard, so most people still [05:59.440 --> 06:05.800] want to use apt-get or dnf or apk. [06:05.800 --> 06:15.720] The third problem shown here is the characteristics of file systems, such as hard links and [06:15.720 --> 06:17.720] extended attributes (xattrs). [06:17.720 --> 06:32.320] These special characteristics may differ across file systems. [06:32.320 --> 06:42.880] So reproducible builds were really hard in the ecosystem of Dockerfile, but they are now [06:43.120 --> 06:48.560] supported in BuildKit version 0.11. [06:48.560 --> 06:57.080] BuildKit is a modern image-building framework made for Docker and Moby, and it has been embedded [06:57.080 --> 07:02.760] in the Docker daemon since Docker version 18.06. [07:02.760 --> 07:08.840] But it's not specific to Docker and Moby, so it can also be used as a standalone daemon [07:08.840 --> 07:18.360] called buildkitd, and buildkitd can be executed inside Kubernetes, nerdctl, Podman, [07:18.360 --> 07:24.280] or any other container engine that supports the OCI specs.
[07:24.280 --> 07:32.920] BuildKit version 0.11 was released last month with built-in support for reproducing [07:32.920 --> 07:40.840] timestamps. Thanks to Tõnis Tiigi for the contribution of this work. [07:40.840 --> 07:52.160] Version 0.11 still needs really complex Dockerfiles, but the next version, 0.12, is likely [07:52.240 --> 08:04.880] to require less complex Dockerfiles. [08:04.880 --> 08:13.880] Reproducing timestamps is supported using a special build arg called SOURCE_DATE_EPOCH. [08:13.880 --> 08:22.560] This build arg conforms to reproducible-builds.org's SOURCE_DATE_EPOCH spec, which is available [08:22.560 --> 08:24.200] at [08:24.200 --> 08:30.440] https://reproducible-builds.org/specs/source-date-epoch/. [08:30.440 --> 08:38.200] The argument value is usually expected to be set to the Unix epoch representation of [08:38.200 --> 08:45.920] the git commit date, using git log -1 --pretty=%ct. [08:45.920 --> 08:59.960] So you get an integer that corresponds to the number of seconds since January 1st, 1970. [08:59.960 --> 09:07.880] SOURCE_DATE_EPOCH is exposed to the RUN instructions of the Dockerfile as an environment [09:07.880 --> 09:15.840] variable, and in addition, it's also consumed by BuildKit itself for the timestamps in [09:15.840 --> 09:19.440] the OCI JSONs. [09:19.440 --> 09:26.920] But it is not consumed for the timestamps in the OCI tar layers in [09:26.920 --> 09:31.360] BuildKit version 0.11. [09:31.360 --> 09:40.560] This is planned to be improved in version 0.12. [09:40.560 --> 09:47.920] So as I mentioned in the previous slide, there are a bunch of caveats in version 0.11. [09:47.920 --> 09:57.400] In particular, the file timestamps currently have to be explicitly touched using the [09:57.400 --> 10:08.400] find, xargs, and touch commands, like this very complex script.
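
The script on the slide is not captured in this transcript. As a rough illustration of the same idea (find every file newer than SOURCE_DATE_EPOCH and touch it back to that value), here is a Python sketch. It is not the speaker's script: the function name `clamp_mtimes`, the directory layout, and the epoch constant are hypothetical, and it assumes a platform such as Linux where `os.utime` accepts `follow_symlinks=False`.

```python
import os
import tempfile

# Hypothetical value; in a real build this would come from the
# SOURCE_DATE_EPOCH environment variable (e.g. the git commit date).
EPOCH = 1675000000


def clamp_mtimes(root, epoch):
    """Reset the mtime of every entry under `root` that is newer than `epoch`.

    Returns the number of entries that were changed. Symlinks themselves are
    touched rather than their targets (assumes follow_symlinks=False support).
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths += [os.path.join(dirpath, n) for n in dirnames + filenames]
    changed = 0
    for path in paths:
        if os.lstat(path).st_mtime > epoch:
            os.utime(path, (epoch, epoch), follow_symlinks=False)
            changed += 1
    return changed


with tempfile.TemporaryDirectory() as root:
    # Create a tiny tree whose timestamps are "now", i.e. newer than EPOCH.
    os.makedirs(os.path.join(root, "usr", "bin"))
    with open(os.path.join(root, "usr", "bin", "hello"), "w") as f:
        f.write("hello\n")

    first_pass = clamp_mtimes(root, EPOCH)   # touches the whole tree
    second_pass = clamp_mtimes(root, EPOCH)  # nothing left to touch
    newest = max(
        int(os.lstat(os.path.join(d, n)).st_mtime)
        for d, dirs, names in os.walk(root)
        for n in dirs + names
    )

print(first_pass, second_pass, newest)
```

Only entries newer than the epoch are modified, so files that already carry older timestamps keep them.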
[10:08.400 --> 10:13.920] That already takes more than three lines. [10:13.920 --> 10:22.080] Also, you have to squash all the layers to eliminate all the whiteout files that are created [10:22.080 --> 10:31.680] on removing files in the containers, because the timestamps of the whiteouts are not reproducible [10:31.680 --> 10:35.600] in BuildKit version 0.11. [10:35.600 --> 10:42.040] There's also a restriction on the mount point directories. [10:42.040 --> 10:50.680] Cache mount points can only be created under tmpfs, such as /dev/shm. [10:50.680 --> 10:58.800] Also, hard links are not reproducible, depending on the file system snapshotter. [10:58.800 --> 11:05.520] So in this version, we still have a bunch of caveats, but these caveats are already [11:05.520 --> 11:13.560] being improved in my pull request 3560. [11:13.560 --> 11:21.320] It's not merged into the master branch yet, but I hope that this pull request will be merged [11:21.320 --> 11:33.720] for the next version, 0.12, in the next couple of weeks or maybe in the next couple of months. [11:33.720 --> 11:37.520] The next topic is reproducing package versions. [11:37.520 --> 11:41.080] This is the most important topic of this talk. [11:41.080 --> 11:48.120] Package versions are hard to reproduce because most distributions do not retain [11:48.120 --> 11:50.680] all the old packages. [11:50.680 --> 11:57.520] For example, Ubuntu does not retain old packages, as far as I can see. [11:57.520 --> 12:03.600] Debian does, but the package archives are not mirrored widely. [12:03.600 --> 12:13.080] Basically we only have the central snapshot.debian.org and only a few mirrors. [12:13.080 --> 12:22.120] This is causing too much load on the central server, snapshot.debian.org. [12:22.120 --> 12:29.720] So basically the snapshot.debian.org server can't be used in CI environments because [12:29.720 --> 12:34.320] it's really slow and really flaky.
[12:34.320 --> 12:39.640] And this slowness and flakiness will get even worse when more people begin [12:39.640 --> 12:44.120] to make their builds reproducible. [12:44.120 --> 12:54.120] The situation is very similar for Fedora and Arch Linux as well. [12:54.120 --> 12:58.160] repro-get is my solution for this problem. [12:58.160 --> 13:08.200] It is a decentralized and reproducible front end for apt-get, dnf, apk, and pacman. [13:08.200 --> 13:13.680] The package versions can be locked with a SHA256SUMS file. [13:13.680 --> 13:25.800] Packages can be fetched from several transports, such as HTTP, OCI registries, IPFS, local file [13:25.800 --> 13:30.520] systems, and NFS. [13:30.520 --> 13:38.760] By default, repro-get attempts to fetch the packages from deb.debian.org using the package [13:38.760 --> 13:40.760] name. [13:40.760 --> 13:47.080] The deb.debian.org server is fast, but it's ephemeral. [13:47.080 --> 13:50.360] It doesn't retain all the packages. [13:50.360 --> 14:05.720] So for old packages, repro-get automatically falls back to debian.notset.fr using the SHA256 hash. [14:05.720 --> 14:16.680] This is relatively slow, but this server provides persistent snapshots of all the packages. [14:16.680 --> 14:26.920] You can also configure repro-get to use OCI registries, IPFS, and local file systems. [14:26.920 --> 14:35.240] repro-get currently supports five distributions: Debian, Ubuntu, Fedora, Alpine, and Arch Linux. [14:35.240 --> 14:41.680] repro-get is expected to be used in containers, but it can be used in non-container environments [14:41.680 --> 14:44.440] as well. [14:44.440 --> 14:46.720] The command usage is like this. [14:46.720 --> 14:53.280] You run repro-get hash generate to generate the hash file, run apt-get install hello [14:53.280 --> 15:02.400] to install the hello package, and run repro-get hash generate again, and you will get a SHA256SUMS [15:02.400 --> 15:16.080] file like this.
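
The SHA256SUMS file shown on the slide is not captured in this transcript. To make the locking idea concrete, here is a small Python sketch that checks downloaded package bytes against a sha256sum-style lock file. It is an illustration only, not repro-get's implementation: the package path, the fake package bytes, and the helper names `parse_sums` and `verify` are hypothetical, and the "digest, two spaces, file name" line layout is assumed from the usual sha256sum output.

```python
import hashlib


def parse_sums(text):
    """Parse sha256sum-style lines ("<hex digest>  <name>") into a dict."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        sums[name.lstrip("*")] = digest.lower()
    return sums


def verify(blobs, sums):
    """Return the sorted names that are missing or whose digest mismatches."""
    return sorted(
        name
        for name, digest in sums.items()
        if name not in blobs
        or hashlib.sha256(blobs[name]).hexdigest() != digest
    )


# Hypothetical package content standing in for a real .deb file.
name = "pool/main/h/hello/hello_1.0_amd64.deb"
packages = {name: b"fake package bytes"}

# Generate a lock file from the known-good content.
lockfile = "".join(
    f"{hashlib.sha256(data).hexdigest()}  {pkg}\n" for pkg, data in packages.items()
)

ok = verify(packages, parse_sums(lockfile))
tampered = verify({name: b"different bytes"}, parse_sums(lockfile))

print(ok)        # no mismatches for the original bytes
print(tampered)  # the altered package is reported
```

Because the lock is keyed on the content hash, it does not matter which transport or mirror delivered the bytes, which is what makes fetching from several sources safe.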
[15:16.080 --> 15:25.280] Inside the containers, you can run repro-get install with the SHA256SUMS file, and repro-get fetches [15:25.280 --> 15:34.800] the packages from an HTTP apt-get repo, or maybe from an OCI registry, or maybe from IPFS, or [15:34.800 --> 15:40.720] maybe from NFS, depending on the configuration. [15:40.720 --> 15:43.160] And here is the demo. [15:43.160 --> 15:51.720] To reproduce this demo, you have to run a specific version of BuildKit, version 0.11.0. [15:51.720 --> 16:05.760] In this directory, I have a SHA256SUMS file like this. [16:05.760 --> 16:13.560] This is mostly for running GCC. [16:13.560 --> 16:20.880] The Dockerfile is currently really complex. It's machine generated, and it has a bunch [16:20.880 --> 16:48.320] of workarounds like this for the SOURCE_DATE_EPOCH stuff. You can use this to test reproducibility. [16:48.320 --> 16:55.320] This takes a few minutes, but the result is like this. [16:55.320 --> 17:09.760] You will get the same hash as shown here on any machine, such as on GitHub Actions or [17:09.760 --> 17:19.080] local laptops, so you can try this by yourself on your own machine. [17:19.080 --> 17:25.200] Future work includes simplifying the Dockerfiles and cache management. [17:25.200 --> 17:33.920] I'm also trying to integrate with Tõnis Tiigi's xx-apt and xx-apk for cross-compilation. [17:33.920 --> 17:41.560] Also, reproducibility should be attestable with something such as SLSA provenances, ideally just [17:41.560 --> 17:47.960] with a single click. Help and contributions are welcome for these topics. [17:47.960 --> 17:51.800] And here is the wrap-up of my talk.
[17:51.800 --> 17:59.720] Reproducible builds help attest the true origin of the binary. The challenges are [17:59.720 --> 18:06.600] non-deterministic timestamps and package versions. BuildKit version 0.11 adds preliminary [18:06.600 --> 18:14.120] support for SOURCE_DATE_EPOCH, and repro-get can be used for reproducing [18:14.120 --> 18:19.000] the package versions with SHA256SUMS. [18:19.000 --> 18:25.560] And I think, sorry, the demo is still running, so I can't show the result of the demo, but [18:25.560 --> 18:30.880] it should be like this result. [18:30.880 --> 18:38.360] Any questions? [18:38.360 --> 18:43.960] Would it be fair to say that this sacrifices security in favor of reproducibility, because [18:43.960 --> 18:48.720] you would have to keep that list of hashes maintained to make sure that the packages [18:48.720 --> 18:52.040] downloaded are always the most secure ones? [18:52.040 --> 18:56.240] So your question was how to make these hash files, right? [18:56.240 --> 19:03.720] How do you make sure the list of package hashes is always pointing to the most secure versions [19:03.720 --> 19:07.240] of a package? [19:07.240 --> 19:16.000] You can use the repro-get hash generate command to scan the installed packages and make the hash [19:16.000 --> 19:26.520] file like this, but you can also create a hash file by yourself, just with a text editor, [19:26.520 --> 19:33.360] or maybe with your own tool to maintain this hash file. [19:33.360 --> 19:37.960] Okay, we're out of time. [19:37.960 --> 19:40.000] Thank you for the talk. [19:40.000 --> 19:41.000] Thanks everyone for attending.