Hello everyone. Let's talk about building initrds out of distribution packages. A little bit about us: I'm Daan, I work on the Linux userspace team at Meta, and I'm a systemd and mkosi maintainer. I'm Zbyszek, I work at Red Hat, I work on Fedora, I'm on FESCo, and I work mostly on systemd.

So let's start by talking about initrds and why we need them. The general boot flow when you boot a kernel is: you start with a boot loader, the boot loader jumps to the kernel, and the kernel is then responsible for finding the root file system. In the early days of Linux this was pretty easy and the kernel could do it itself, but these days finding the root file system is a lot more complicated, so the kernel essentially said: we're not going to solve that problem, we'll leave it to user space. How does that work? You hand a file system to the kernel, called the initramfs, as a cpio archive. The kernel unpacks it into a temporary file system in memory and starts user space in there. The initramfs is then responsible for finding the actual root file system and doing a switch-root operation into it, and that's how you end up in the final file system.

The initramfs can do whatever it wants, really, but these days it's generally built in one of two ways. The first is that some bespoke bash script gets invoked, generated by your initramfs generator; these are tools like dracut, initramfs-tools on Debian, or mkinitcpio on Arch Linux and derivatives. The other way is to use systemd: systemd has supported running in the initramfs for a very long time, and it has all the tools and services you need to find the root file system and switch-root into it. Some of the initramfs generation tools are configurable, so you can either use the bash script or choose to use systemd in the initramfs.

I'll add to this that the amount of stuff that needs to happen for the root file system to be available is growing more complex all the time. We have encryption, we have RAID, device mapper, possibly dm-verity. And in the theme of the previous talk, we might for example at some point ask the user for a password, but the user might not be using a keyboard: they might be using a braille device, or they might need a screen reader to know that the password prompt is up. All of this will sooner or later need to be available very early in the boot, before the full system is ready.

Yeah, and I'll add that your root file system might not even be on the system yet; it might have to come from the network, so you would need all the tools to set up a network connection in your initramfs. It can get pretty complicated.

So what's the status quo? Like I said, we have the initramfs generation tools: dracut, mkinitcpio, initramfs-tools. The way these tools work is they basically look at your host file system, see what's on there, and pick out specific files to build the initramfs from. Which files to pick becomes specific logic in each initramfs generator. The thing you need to know is that just saying "include this binary in the initramfs" is not going to work, because that binary has library dependencies. So you also need to go get all the libraries, and of course those libraries can depend on more libraries, and so forth. You need logic to make sure all of those get picked up correctly. Now, luckily, for ELF binaries you can do that in a pretty hacky way, by looking inside the ELF binary, where all the library dependencies are recorded, and figuring things out from there. But then you get into things like dlopen(), where a library might not actually be listed in the ELF binary at all. Or you get into configuration files, or other kinds of plugins, or anything else you can think of: there are no dependencies recorded anywhere in the file system that you could use to figure out that these things need to be included. So you can run into quite a few issues.
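As a concrete illustration of that ELF-header approach and its blind spots, here is a minimal sketch using standard tools; the binary chosen is arbitrary:

    # Direct library dependencies recorded in the ELF dynamic section:
    readelf -d /usr/bin/mount | grep NEEDED
    # The recursively resolved closure, as the dynamic linker sees it:
    ldd /usr/bin/mount
    # Neither command reveals libraries loaded at runtime via dlopen(),
    # nor configuration files or plugins; a generator needs bespoke
    # logic for each of those.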
This is how you end up with double packaging: when a new version of a piece of software that is used in the initramfs is released, the PKGBUILD or the Debian packaging or the RPM spec gets updated, and then on top of that there is initramfs-specific packaging, for example in dracut, that has to be updated as well. A very good example of this is when we introduced systemd-executor in systemd, which is now required to launch services. This was a new binary, so when we released the new version, all the specs were updated, and then we also had to update every initramfs generation tool to make sure it includes that binary in the initramfs.

This leads to quite a few bugs. It also means it becomes very unclear where a bug should be reported: it could be a bug in the upstream project, or it could be the initramfs generation tool not correctly picking up all the dependencies required to run the tool. So it becomes very hard to assign bugs, and a lot of triaging is needed to get them to the right project. It's also hard to customize: if you want to include something, it's up to you to figure out all its dependencies and list them in the initramfs generation tool. And of course it's also quite slow, because every time the initramfs is updated, it has to be generated locally, and all the dependencies have to be figured out again. Anyone who has ever used dracut without host-only mode probably knows what I'm talking about, because it takes forever.

So what do we want to do instead? We want to reuse all the work that the distributions are already doing with their packaging: the Arch PKGBUILDs, the RPM specs, everything. We want to reuse all the work that goes into those and use it to build the initramfs. So instead of going to look at the host file system, we just install packages, RPMs, debs, Arch Linux packages, into the initramfs, and we get it that way.

This has a few advantages. It turns out that package managers are very good at installing packages, so that part just works. They're also good at managing dependencies: all these systems have, depending on the package manager, very extensive or at least very sane dependency resolution. All the dependencies are declared, and the package manager takes care of figuring out whatever extra stuff is needed and makes sure that gets installed as well. You don't need to parse ELF binaries anymore to figure out the dependencies of a specific package. And you don't need to learn another system: you don't need to learn the initramfs generation tool, you don't need to manually list the dependencies of the tool you want to include. You just install the RPM, the deb, whatever you want, and the package manager takes care of the rest.
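A sketch of what that looks like in practice, using dnf's real --installroot option; the staging directory, release version, and package list here are illustrative, not what mkosi actually runs:

    # Let the package manager resolve dependencies and install everything
    # into a staging tree, exactly as it would into a normal root.
    sudo dnf --installroot=/tmp/initrd-tree --releasever=40 \
        install systemd util-linux kmod
    # Pack the tree as the newc-format cpio archive the kernel expects
    # (assuming the kernel supports zstd-compressed initramfs images).
    ( cd /tmp/initrd-tree && find . | cpio -o -H newc | zstd ) > initrd.img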
The ownership of bugs becomes clearer, because the initramfs generation tool is just installing packages. That's pretty simple, so the surface area for bugs is a lot smaller, and when bugs do appear, they can generally be assigned to the upstream project instead of to the initramfs generation tool. Any improvements made to the packaging automatically end up in the initramfs as well. And finally, with this approach the initramfs is no longer tied to the root file system or the host file system, so you can start building the initrd off-host, on a distribution builder, and distribute it as a package. You can just download an initrd instead of generating one locally. Assuming that initrd includes all the necessary pieces, this gives you an initrd that works for 99% of use cases, without every user having to spend CPU power building one themselves.

There are some requirements for building the initramfs out of packages. Specifically, the packaging has to be done a little carefully to make sure the initramfs doesn't become too big. For example gcc-libs: GCC ships a bunch of runtime libraries that software generally depends on, at least for C. But GCC also supports the Go programming language, Fortran, D, and it includes standard libraries for all of those. If those all end up in the same package, especially the Go standard library, it's absolutely huge, and for an initramfs that's a real problem. So ideally there are separate sub-packages for each standard library, so that you only install the necessary ones in the initramfs. Arch Linux, for example, doesn't do this, so we have to start removing stuff manually. But we don't want to do that, right? We want to rely on the packages. So ideally the distributions take a little care that the core packages are split finely enough that you only install the necessary stuff in the initramfs.

Another good one: kernel module packages generally depend on the kernel itself. So if you install the kernel modules in the initramfs, the kernel image gets pulled in as well, but you don't need a kernel image inside the initramfs. That's another place where a little care should be taken.

And finally, locales. Fedora, and CentOS and derivatives, have a glibc-minimal-langpack package that only includes the C.UTF-8 locale instead of all of them. Again, things like that help reduce the size.

So how do we propose to build this initramfs out of packages? We suggest using mkosi, which is systemd's image building tool. Our idea is that an initramfs really isn't any different from a regular Linux image; it's just packaged differently. Instead of putting it in a disk image with a GPT partition table, you package it with cpio, and you get your initramfs. And an initramfs isn't really any different from a regular Linux system either, except that it includes less software and has two extra symlinks, and that's all you need. So you can build it with the regular image building tools; you don't need anything special.

mkosi is a tool that builds these images. It does a whole bunch of things: it installs packages, and it can build you something other than an initrd too. It can install boot loaders, it can build an initramfs for a regular disk image, it can do unified kernel images. It can run a whole bunch of tools that systemd provides to configure system images. And it lets you test the result by booting it in QEMU or in a systemd-nspawn container.
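To make the "just packaged differently" point concrete, here is a minimal mkosi configuration sketch; the settings are real mkosi options, but the package list is illustrative:

    # mkosi.conf
    [Distribution]
    Distribution=fedora

    [Output]
    # "cpio" produces an initramfs archive; changing this one line to
    # "disk" would produce a GPT disk image from the same configuration.
    Format=cpio

    [Content]
    Packages=systemd,util-linux,kmod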
So how do you get started with mkosi? Here's an example that builds Arch, installs systemd and the kernel, enables autologin, and then boots the result in QEMU (a sketch of the command is below). That gets you something like the following demo. mkosi supports all the popular distributions, I guess: CentOS, Debian, Ubuntu, openSUSE, Arch, Fedora, and some derivatives of those. RHEL? RHEL, yes, and RHEL UBI.

One interesting thing is that you do not need root privileges to run mkosi. We use the newuidmap and newgidmap tools to be able to do everything without you needing to enter your password, and we use systemd-repart to build disk images without needing root privileges or loop devices. So you can run all of this as your regular user.

We have configuration files too, so instead of specifying everything on the command line, you can use settings files in the INI format that everyone knows from systemd unit files.

So what is mkosi-initrd? It is a mkosi configuration for building initramfs images. It used to be a standalone project, but we recently merged it into mkosi itself, and it is already used to build the default initramfs for all images that mkosi builds. So if you use mkosi to build a disk image and you don't specify your own initramfs, it will use mkosi-initrd to build one and use that. Every time you boot a mkosi disk image, you are generally already using this, and we make sure it is tested on all the supported distributions. It initially started out as a Fedora-only thing, but when we merged it into mkosi, we implemented support for all the distributions. So you can build an initramfs out of Arch packages, Ubuntu packages, Debian packages, openSUSE packages, CentOS packages, or Fedora packages.

We also ship a kernel-install plugin. kernel-install is systemd's tooling for taking a kernel from your /usr directory, where the package manager usually installs it, moving it to the ESP, and doing a bunch of extra required work, like building an initramfs. On Fedora at least, dracut ships its own kernel-install plugin, but mkosi does as well, so you can configure kernel-install to use mkosi-initrd instead of dracut to build the initramfs (a configuration sketch follows below). dracut will automatically disable itself if another initramfs generator is enabled.

This reuses all the package manager caches from the host file system, so you're not downloading packages unnecessarily; it reuses the same RPMs or debs that you already installed on your host.

And finally, it can be completely customized. The mkosi configuration supports drop-ins, so you can add a few of those in /usr/lib/mkosi-initrd or /etc/mkosi-initrd to add more packages to the initrd, to remove some extra stuff, or anything else you can think of, really, that mkosi supports; it will get applied to the initramfs produced by the kernel-install plugin. It can also be used as a standalone thing, without the kernel-install plugin. This is how you would use it to build your own initramfs, which will then appear in the working directory you invoke it in (see the last sketch below).
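The getting-started example from the demo, reconstructed as a sketch; these are real mkosi options, though the exact flags on the slide may have differed, and later mkosi releases rename the qemu verb to vm:

    # Build an Arch image with systemd and a kernel, enable autologin,
    # and boot the result in QEMU.
    mkosi --distribution arch \
          --package systemd \
          --package linux \
          --autologin \
          qemu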
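The kernel-install side is selected through the initrd_generator setting that kernel-install reads from its configuration; a sketch, assuming mkosi's plugin is installed:

    # /etc/kernel/install.conf (or /usr/lib/kernel/install.conf)
    # Tell kernel-install to use mkosi's plugin instead of dracut's;
    # dracut then disables itself automatically.
    initrd_generator=mkosi-initrd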
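And a sketch of the customization and standalone use just described. The drop-in file name is hypothetical, and the standalone mkosi-initrd command shown exists in current mkosi releases, though the exact invocation may differ in older ones:

    # /etc/mkosi-initrd/20-nfs.conf
    # A drop-in adding extra packages to every generated initrd.
    [Content]
    Packages=nfs-utils

    # Standalone invocation, without the kernel-install plugin: build an
    # initrd for the running kernel into the current working directory.
    mkosi-initrd --kernel-version "$(uname -r)"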
One interesting thing here is that because the kernel module packages aren't really set up correctly yet and pull in too many dependencies, we do the practical thing and copy the kernel modules from the host. Using the kernel module include and exclude settings, we can do the same thing dracut and the other tools do and only include the kernel modules that are loaded on the host, because if we included all of them, plus all of their firmware dependencies, the initramfs would grow to tremendous proportions. So we make sure to only include what's needed (there's a configuration sketch at the end of the talk).

We cover a lot of this with integration tests. Specifically, we make sure that booting from LUKS works, so with an encrypted root file system; we make sure that LVM works; and we make sure that the combination of the two works and boots all the way up to the OS file system. We support the systemd-gpt-auto-generator stuff, or specifying everything with fstab; whatever you can think of, really, we try to make sure it works. There are some more niche technologies, like RAID, NFS, and iSCSI, that we haven't had time to write integration tests for, so we can't say for sure that those work, but we're working on covering more of what is already possible with the existing tools.

That was everything I had to say. This is a link to the configuration files from mkosi-initrd, so you can go take a look at how the initrds are structured, which packages are included, and which files are removed. There are quite a few files we have to remove, depending on the distribution. So, any distribution packagers: go look at that, see what we have to remove manually, and improve your packaging so that we don't have to do it. Thank you for listening.
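The host-module filtering mentioned above, as a configuration sketch; these are real mkosi settings, though the exact patterns mkosi-initrd uses may differ:

    [Content]
    # Drop all modules pulled in by packages, then add back only the
    # ones currently loaded on the host, similar to dracut's host-only
    # mode.
    KernelModulesExclude=.*
    KernelModulesIncludeHost=yes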
Before the questions, I want to make one clarifying comment. Since we're developing this, we get into the mindset of thinking about the low-level details, and I think it might be a bit confusing that on the one hand we talk about building the initrd in a predictable way, somewhere in central infrastructure, and signing it, while on the other hand we talk about including local kernel modules. A lot of that is for development, and for now. In the long term, we want the centralized thing, where we build the initrd, glue it together with the kernel, and sign the pair together, building a unified kernel image, which Lennart Poettering was talking about earlier today. So yeah, just to clear this up.

Awesome, thank you. What questions do we have? One over here, one over there.

Okay, so you mentioned that currently you use local modules. Doesn't that mean that all the complexity from dracut for selecting kernel modules remains here as well? Yes, but it turns out that the complexity of selecting kernel modules is not all that much, because kernel modules list their dependencies properly. But yes, we do support it. And we hope, like Zbyszek said, that eventually we won't need that part anymore: that we can have a proper set of default modules, all properly sub-packaged in the distributions, so that we can install distribution packages to get the kernel modules instead of doing the extra work of selecting them locally.

You spoke about integration testing on multiple distributions. Did you only test the usual latest distributions, or did you try a bit older ones too? And do you plan to keep testing new distribution releases as they come out? At the moment, our integration tests run against the default versions of all the supported distributions, which is generally the latest; it's Debian testing, not Debian stable. But we could definitely add more. It's all running in GitHub Actions, so it's just a matter of defining the necessary configuration, and then we can run tests for everything.

Were there more questions? Zero questions. Alright, thank you two. This was great.