So up next is Rui.
I hope that's reasonably correct.
Yeah, that sounds right.
Linker things, let's just say that.
Yeah.
Now, talking about whether the mold linker can actually be used as a system linker.
Yes.
So thank you for coming to this talk.
My name is Rui Ueyama.
I'm the creator of the mold linker as well as the LLVM linker, LLD.
So I wonder if you are using my linkers.
Raise your hand if you are using the mold linker.
And what about LLD?
OK, maybe almost everyone is using my linker.
So it makes me very comfortable to be here.
Anyway, the mold linker is my latest attempt to create the best linker for developers.
And that really matters, because in many builds the linker dominates the build time, especially if you are in a quick edit-compile-debug cycle.
You edit a single file and rebuild.
The compiler finishes pretty soon because it compiles just that one file.
But the entire executable needs to be linked from scratch.
So the link time matters.
I've been developing the mold linker since September 2020.
So it's been a little under three years.
So it's relatively new.
It's available under the MIT license now.
It used to be under a different license because I was trying to commercialize it.
But that didn't work out.
So I decided to go with a permissive license.
The main purpose is to offer the fastest linker to developers.
It's an order of magnitude faster than the GNU linker.
And it's also faster than my previous one, LLD, as well as the GNU gold linker.
To give you a rough idea, on a decent multi-core machine, mold can write about one gigabyte of output per second.
So if your executable is two gigabytes, it takes about two seconds on your machine.
And that's pretty fast.
But modern executables are gigantic as well.
For example, if you build LLVM with debug info, the output is about one and a half gigabytes.
But it can be linked in about one and a half seconds.
The mold linker supports almost all major targets except MIPS.
The reason is that the MIPS ABI has diverged too much from the other ABIs.
The other ABIs have evolved since 2000, but the MIPS ABI has stagnated since the collapse of SGI, because SGI was the de facto player that set the standard in that field.
And since then, no one has made any effort to improve the ABI.
So MIPS has diverged.
So at this point, I'm not sure if we want to continue working on MIPS support, because it seems like no one is really making a serious effort to refresh the architecture.
But anyway, it supports a lot of architectures, even including LoongArch, which is a newcomer in this field.
And despite being pretty new, I think the linker is production ready.
And I think many people are actually using it in production.
I will talk later about how I tested the linker.
So this slide explains what the mold linker is from the developer's perspective.
It's written in C++, specifically with C++20 features, and with Intel TBB as a threading library.
One thing you would notice immediately if you take a look at the source code of the mold linker is that almost all functions and data structures are templates rather than plain functions or structures.
And the templates are specialized for each target.
I also care about source code quality; ideally, the source code should be readable.
So I put a lot of effort into making it readable.
So this is an example of how you write target-specific code in mold.
It uses if constexpr in the source code.
If you are not familiar with modern C++, this is a relatively new feature; it was added in C++17.
The beauty of this feature is that if constexpr is evaluated at compile time rather than at runtime.
So this if constexpr block is compiled to nothing if the function is not specialized for 64-bit PowerPC ELFv1.
So as long as you guard your new code in this way, your new code cannot do anything harmful to other targets.
And it cannot slow down other targets.
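The following is a minimal C++ sketch of the pattern, not mold's actual code. The target tag types X86_64 and PPC64V1 and the function apply_target_fixup are hypothetical, and the PPC64V1 branch performs an arbitrary placeholder transformation rather than any real PowerPC logic. The point is that the branch exists only in the PPC64V1 instantiation, so the X86_64 instantiation contains no trace of it.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical target tag types; a real linker's target types carry much
// more information.
struct X86_64 {};
struct PPC64V1 {};

// Hypothetical target-specific step. For every target other than PPC64V1
// the discarded branch is never instantiated, so it costs nothing at runtime
// and cannot break or slow down other targets.
template <typename E>
constexpr std::uint64_t apply_target_fixup(std::uint64_t value) {
  if constexpr (std::is_same_v<E, PPC64V1>) {
    // Placeholder transformation (clear the low two bits), standing in for
    // whatever PPC64 ELFv1-specific work real code would do.
    return value & ~std::uint64_t{3};
  } else {
    return value;
  }
}

// Checked at compile time.
static_assert(apply_target_fixup<X86_64>(7) == 7);
static_assert(apply_target_fixup<PPC64V1>(7) == 4);
```

Because the function is constexpr, the static_assert lines are evaluated by the compiler; the X86_64 instantiation simply returns its argument.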
So this is another example of how we use C++20 features in mold.
This is a data structure representing the on-disk format of relocations.
But there are many types of relocations, because we at least have big-endian and little-endian, 32-bit and 64-bit versions.
So in combination, we already have four different versions.
The beauty of C++20 is that you can use a requires clause after the template keyword to specify which type parameters you want to specialize for.
So in this case, this data structure is specialized for little-endian and RELA, which is very technical stuff.
But the point is that we have different versions of the relocation data structure.
Below the definition, we have other versions of the data structure with the same name.
And we even have a completely different version of the data structure specifically for SPARC64, because SPARC64 has a weird field that doesn't exist in any other architecture.
But we can just define this data structure only for SPARC64.
And as long as you guard the code that accesses this field with if constexpr, your code will not cause a compile error saying that you are using a field missing from the data structure.
So this is a very clean way to specialize your code for a specific target.
So...
it's not loading.
Okay, so this is a machine description of a specific target.
In this case, it's the machine description for x86-64.
We have a bunch of constexpr static variables as parameters.
They define whether it's a little-endian or big-endian architecture, and whether it's 32-bit or 64-bit.
So if you want to port the mold linker to a new target, you define this kind of data structure, basically by copy and paste, and then make modifications as needed.
It's just as simple as that.
And since these fields are compile-time constants, the compiler knows their values at compile time, so it can optimize code based on these values instead of dispatching at runtime.
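A minimal sketch of what such a description might look like; this is not mold's actual x86-64 definition, and the member names are assumptions. The e_machine values are the standard ELF constants EM_X86_64 (62) and EM_386 (3), and the page sizes are typical values. The second struct illustrates the copy-and-edit porting step, and the two helpers show code that the compiler specializes per target because every input is a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical machine description: every property is a compile-time
// constant that target-generic code can branch on with if constexpr.
struct X86_64 {
  static constexpr bool is_64 = true;
  static constexpr bool is_le = true;
  static constexpr bool is_rela = true;
  static constexpr std::uint32_t page_size = 4096;
  static constexpr std::uint16_t e_machine = 62;  // EM_X86_64
};

// Porting to a new target: copy a description and edit the values.
struct I386 {
  static constexpr bool is_64 = false;
  static constexpr bool is_le = true;
  static constexpr bool is_rela = false;  // i386 uses REL relocations
  static constexpr std::uint32_t page_size = 4096;
  static constexpr std::uint16_t e_machine = 3;   // EM_386
};

// Size of an address-sized word; folded to a constant per target.
template <typename E>
constexpr std::uint32_t word_size() {
  if constexpr (E::is_64) {
    return 8;
  } else {
    return 4;
  }
}

// Round an address up to the target's page size (a power of two).
template <typename E>
constexpr std::uint64_t align_to_page(std::uint64_t addr) {
  constexpr std::uint64_t mask = std::uint64_t{E::page_size} - 1;
  return (addr + mask) & ~mask;
}
```

No runtime lookup table or virtual dispatch is involved: word_size<X86_64>() compiles to the constant 8.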
So this is a comparison of the number of lines you need to port a linker to a new target.
On the left-hand side, we have gold.
It's not a really precise comparison, because lines of code is not a direct indicator of how easy or hard it is to port a linker to a new target.
But it gives you an idea of the amount of work you have to do.
Apparently, for gold, you have to write tens of thousands of lines of code for each target.
But the reality is that most target-specific code in gold is just copy-paste.
For example, if you want to port GNU gold to SPARC or LoongArch or whatever, you would start by copying an entire file as loongarch.cc or whatever, and then make modifications.
So you end up with a lot of copies of code, and that's not a really good way to port a linker to a new target.
On the other hand, we have very little code in mold for porting to a new architecture.
We have some amount of target-specific code outside of these files, but overall the amount of code is very, very small, only a few hundred lines.
So, testing.
Testing is the most important and the most difficult part of writing a linker.
As you know, writing a simple linker is not really hard, because it's just a program that takes object files and combines them into a single executable or shared object file.
But the thing is, there are so many edge cases, because there are hundreds of thousands of programs that use the linker; essentially every program uses the linker.
So for every corner case, there is some program out there that depends on it.
So testing is very hard.
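To make the "simple linker" part concrete, here is a heavily simplified C++ sketch of symbol resolution, the step that combines the symbol tables of all input files. The ObjectFile and ResolveResult types and the resolve function are hypothetical; the comment lists a few real-world complications this sketch ignores, which is where the edge cases described above come from.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified object file: just the names it defines
// and the names it references.
struct ObjectFile {
  std::string name;
  std::vector<std::string> defines;
  std::vector<std::string> references;
};

struct ResolveResult {
  std::map<std::string, std::string> owner;  // symbol -> defining file
  std::set<std::string> duplicates;          // defined more than once
  std::set<std::string> undefined;           // referenced, never defined
};

// The "easy" core of a linker: merge the symbol tables of all inputs.
// Ignored here: weak symbols, COMDAT groups, archive member extraction,
// shared libraries, symbol versioning, and much more.
ResolveResult resolve(const std::vector<ObjectFile>& files) {
  ResolveResult r;
  for (const ObjectFile& f : files)
    for (const std::string& sym : f.defines)
      if (!r.owner.emplace(sym, f.name).second)  // first definition wins
        r.duplicates.insert(sym);
  for (const ObjectFile& f : files)
    for (const std::string& sym : f.references)
      if (!r.owner.count(sym))
        r.undefined.insert(sym);
  return r;
}
```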
We have two kinds of tests for mold, to make sure that I find bugs before you notice them in production.
The first is a shell-script-based test, which is very simple.
I have a slide for this.
This is just a very simple test case.
We actually compile code, link the object file with mold, and then actually execute the result on the machine.
And as you can see, if you have a cross compiler and QEMU, you can run these tests for architectures other than the one you are running on.
For example, you can test SPARC64 on an x86 machine.
But apparently this test is not enough for real use cases, right?
So the other test I'm doing is to try to build all Gentoo packages with mold in a Docker container to find bugs.
The beauty of using Gentoo is that you can use the exact same command to build any package.
And it can also run the unit tests that come with the package.
So it's very easy to test whether you can build the program and whether the built program works.
So I did that, and it takes a few days on a 64-core machine.
But it works.
The thing is, it is sometimes extremely hard to debug when something goes wrong.
But somehow I managed to fix all the bugs that I found this way.
Well, yeah, it was a fantastic experience to fix all those bugs.
But my point is that it is very important to fix all bugs before you notice them in the wild.
Because if mold didn't work out of the box for your project, the next thing you would do is switch back to the original linker, and you would never try the mold linker again, right?
So why is mold so fast?
Well, we used multithreading, multithreaded parallelization, from the beginning.
That's essentially why mold is so fast.
But the other thing is that mold is also faster than the other linkers in the single-threaded case, because we are using optimized data structures and code.
Actually, data structures are more important than code.
As Rob Pike once said, you write code around data structures, and not the other way around.
So designing the right data structures is important to make a program fast.
So here is, I think, a good visualization of how well the mold linker uses all the cores available on the machine.
On the left-hand side, LLD fails to use all the cores, but mold finishes very quickly using all of them.
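The following C++ sketch illustrates one common way to make a linking step parallel: compute every input section's output offset first (a cheap prefix sum), then copy the sections into the output buffer concurrently. Because the destination ranges never overlap, the threads need no locks. This is an illustration under simplifying assumptions, not mold's code: mold itself uses Intel TBB rather than raw std::thread, and a real linker also has to handle alignment and apply relocations. The Section alias and link_sections function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical input section: raw bytes to be copied into the output file.
using Section = std::vector<unsigned char>;

std::vector<unsigned char> link_sections(const std::vector<Section>& secs,
                                         unsigned num_threads) {
  // Step 1 (cheap, sequential): exclusive prefix sum of sizes = offsets.
  std::vector<std::size_t> offset(secs.size());
  std::size_t total = 0;
  for (std::size_t i = 0; i < secs.size(); i++) {
    offset[i] = total;
    total += secs[i].size();
  }

  // Step 2 (expensive, parallel): each thread copies a strided subset of
  // sections into disjoint ranges of the output buffer.
  std::vector<unsigned char> out(total);
  num_threads = std::max(1u, num_threads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; t++) {
    workers.emplace_back([&, t] {
      for (std::size_t i = t; i < secs.size(); i += num_threads)
        if (!secs[i].empty())
          std::memcpy(out.data() + offset[i], secs[i].data(), secs[i].size());
    });
  }
  for (std::thread& w : workers) w.join();
  return out;
}
```

The design choice is to do the small amount of inherently sequential work up front so that the bulk of the work becomes embarrassingly parallel.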
But the question would be: why do we want another linker even though we have LLD?
My answer is, first of all, LLD is not GNU.
And the other thing is that LLD does not support GCC LTO.
LLD is tightly coupled to a specific version of LLVM.
For example, LLD version 15 can do LTO only for LLVM 15.
So of course it cannot handle any GCC LTO object files.
So if you want to do LTO with GCC using a fast linker, mold is the only viable option.
So what about GNU gold?
I think the problem with GNU gold is the lack of clear ownership.
It looks like it's not really maintained well anymore.
And the original creator of GNU gold, which is Google, has lost interest in maintaining it because they have now switched to LLD.
So I think the future of GNU gold is not clear.
And gold is not as fast as my linker, either.
So can we improve GNU ld so that it gets as fast as my linker?
My answer is no.
I think it's almost impossible to make it faster unless you rewrite everything from scratch.
And if you rewrite it from scratch, that would be the same thing as what I did.
In my opinion, the source code of GNU ld is not very easy to read.
The source code was written more than 30 years ago, and it's been maintained since then.
But people are still adding new features to GNU ld first and then porting them to other linkers, even though what they are actually using is the other linkers.
I think that situation is silly, because people do not really use GNU ld anymore for their real-world projects.
So I think it needs to change.
And my question is: do we want to stay with the current GNU ld forever?
My answer would be no, since we have a good replacement.
So, if I can, I'm open to donating mold to the GNU project, so that we can call it GNU mold, if that accelerates adoption.
It's not something I can decide alone, because it means a lot, but I'm open to that option if it makes sense.
So the missing piece for using mold as the standard linker is kernel and embedded programming support.
Userland programs are mostly fine.
If you install mold as your system linker, you wouldn't notice any difference other than speed.
But kernels and embedded programs need more special care about memory layout, because the hardware, for example, forces you to put some data structure or code at a very specific location in memory.
And if you are programming for a computer without an MMU, you want to lay out your program to match the hardware's memory map.
That kind of stuff is usually handled by a linker script, as you know.
But the linker script, in my opinion, has many issues.
The first is that it doesn't have any formal specification of the language.
It only has the manual and the implementation, so other linkers try to mimic the behavior of GNU ld, which of course causes compatibility issues.
The other thing is that the linker script predates the ELF file format.
So not all linker script commands translate directly to ELF terminology, and that causes more confusion than necessary.
And I think it is almost impossible to add linker script support without slowing down the linker.
So I think we need something better.
So this is my current approach to supporting embedded programming and kernels.
I added a very simple command-line option called --section-order, which specifies how to lay things out.
I think this option alone can satisfy more than 90% of the usage, but I'm pretty sure it doesn't cover all uses of linker scripts.
So I need help from you.
Especially in the embedded programming world, programs are often not open source; they are not available on GitHub and tend to be in-house.
So I don't know what the real usage is for embedded programs.
If you tell me "I want to do this with the mold linker", then I can implement that for you.
So I would appreciate it if you gave me a hint.
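As a rough illustration of the kind of layout such an option has to produce, here is a C++ sketch that walks an ordered list of directives, either "set the current address" or "place this section", and assigns each section an aligned start address. The directive types and the layout function are hypothetical, and the sketch does not reproduce the actual syntax or semantics of mold's section order option.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical layout directives.
struct SetAddress { std::uint64_t addr; };   // move the location counter
struct PlaceSection { std::string name; };   // place a section here
using Directive = std::variant<SetAddress, PlaceSection>;

struct SectionInfo {
  std::uint64_t size;
  std::uint64_t align;  // must be a power of two
};

// Assign a start address to each named section, in the given order.
std::map<std::string, std::uint64_t>
layout(const std::vector<Directive>& order,
       const std::map<std::string, SectionInfo>& sections) {
  std::map<std::string, std::uint64_t> addr;
  std::uint64_t loc = 0;
  for (const Directive& d : order) {
    if (const auto* s = std::get_if<SetAddress>(&d)) {
      loc = s->addr;  // e.g. a fixed address required by the hardware
    } else {
      const PlaceSection& p = std::get<PlaceSection>(d);
      const SectionInfo& info = sections.at(p.name);
      assert(info.align != 0 && (info.align & (info.align - 1)) == 0);
      loc = (loc + info.align - 1) & ~(info.align - 1);  // align up
      addr[p.name] = loc;
      loc += info.size;
    }
  }
  return addr;
}
```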
All right.
So this is the end of my slides.
Thank you very much.
So you mentioned that it's possible to do link-time optimization, like as feedback into GCC, but in general, is it also possible, and how easy is it, to do optimization inside the linker?
Like, is it possible for the linker to disassemble some instructions and try to put something else there?
Okay, so the question is how easy it is to do something like link-time optimization, but not quite that. I don't know if I correctly understand your question, but...
It's basically optimizations during linking.
Yeah, of course, but the thing is...
It's like LTO, but not done by the compiler.
So the way LTO works in the linker is... the compiler emits...
From the user's perspective, all you have to do is add -flto to the compiler and linker command-line options, and everything works automatically.
But behind the scenes, the compiler emits intermediate code instead of actual machine code into the object file, and the linker recognizes that intermediate code.
Then it calls the compiler back end to compile everything at once into a single object file, and the link continues as if that gigantic single object file had been passed to the linker.
So in that sense, you can do anything with the intermediate code inside the compiler back end, because the linker doesn't really care what is going on behind the scenes.
So, well, does that answer your question?
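In practice the hand-off to the compiler back end goes through a compiler-provided plugin or library (the GCC LTO plugin, or LLVM's LTO library), so the sketch below covers only the first step: recognizing which inputs carry intermediate code. It relies on two well-known markers: LLVM bitcode files start with the bytes 'B' 'C' 0xC0 0xDE, and GCC LTO object files are ELF files containing sections whose names start with .gnu.lto_. The InputFile type and the classify function are hypothetical, and a real linker would parse the ELF section headers itself rather than receive a list of names.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

enum class InputKind { Native, LlvmBitcode, GccLto };

// Hypothetical, simplified view of an input file: its raw bytes and, for
// ELF files, the names of its sections.
struct InputFile {
  std::string bytes;
  std::vector<std::string> section_names;
};

// Decide whether an input carries compiler IR that must go back to the
// compiler back end instead of being linked as machine code.
// (Ignores the LLVM bitcode wrapper header, among other things.)
InputKind classify(const InputFile& f) {
  std::string_view b = f.bytes;
  if (b.substr(0, 4) == std::string_view("BC\xC0\xDE", 4))  // bitcode magic
    return InputKind::LlvmBitcode;
  if (b.substr(0, 4) == std::string_view("\x7F" "ELF", 4)) {
    for (const std::string& name : f.section_names)
      if (name.rfind(".gnu.lto_", 0) == 0)  // GCC LTO IR section prefix
        return InputKind::GccLto;
  }
  return InputKind::Native;
}
```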
Yeah. So you said that you tested mold by building all of the packages in Gentoo Linux.
How long did that take? How long does one run take?
So how long does it take to test all Gentoo packages against the mold linker?
It takes, if I remember correctly, three or four days on my 64-core machine with 256 gigabytes of memory.
And yeah, it's a very long time, but it's definitely doable on a single beefy machine.
One target?
Only x86-64, because in order to cross-compile everything for different architectures and run the tests, you have to do that on QEMU, which is something like 100 times slower than running on the real machine.
Yeah.
Yeah, I can't.
Yeah, sorry.
What kind of mistakes did you make in LLD that you're fixing in mold?
And are there any mistakes in mold that you think are interesting?
So the question is what mistakes I made in LLD that I fixed in mold, and whether I made any other mistakes in mold.
That's a good question.
The first thing is that relocation processing in LLD wasn't as good as in mold.
It's complicated, it's hard to maintain, and it's slower than mold's.
So I fixed it.
The other thing is that LLD uses templates to support ELF64 and ELF32, big-endian and little-endian, but that's just four instances.
It doesn't instantiate templates for each target.
So you cannot use the technique I used for SPARC64 that I showed you on the slide, for example.
And did I make any mistakes in mold?
Maybe not.
I am pretty satisfied with the quality of mold.
I'm personally enthusiastic about code readability.
So I tried to make the source code as readable as a book.
I don't know if I achieved that goal, but the point is that, well, yeah, it's definitely readable.
One last question.
Are there any plans to ever support file formats other than ELF?
Oh, so the question is, can you support other file formats?
Are you planning to ever do that?
Oh, do I have any plan to support formats other than ELF?
Well, I did for macOS, which is a Unix-like environment, but it uses a different file format, which is called Mach-O.
And I succeeded in creating a faster linker for macOS, which is much, much faster than Apple's linker.
But the thing is, last year in September, they released Xcode 14 with their own new linker.
So there were ongoing efforts within Apple that I wasn't aware of.
And their new linker is as fast as mine.
Maybe they read my source code as well, because it's available online.
But it was AGPLv3, then?
Oh, my linker is now available under the MIT license.
So it's, yeah.
So maybe you only heard Apple.
Well, Apple hasn't released their source code yet.
So, okay, we have to stop.
So thank you again.