Okay.
Okay. I'll begin.
Now that's some oxygen.
Back in the room. Okay.
Hi. So, compilation databases.
Going to talk a bit about that.
Here's the motivation in examples.
So, we have some really simple project structure here,
your typical project, I don't know.
And you have a file that you compiled,
file A dot CPP, and here are some flags that you
used to compile that file.
Really simple, right?
Now, the question that the compilation database
kind of wants to answer is,
which flags would you give
your static analysis tool or whatever tool you're
using for file A, for example,
and also a bit more difficult to answer for file B, right?
You really don't know which
flags you should use just by looking at that.
You as a human might know,
but as a two-chain difficult to answer question.
And that is where compilation database
really comes into play.
Mostly, you notice that when something doesn't work.
So, you do some static analysis.
So, I tend to use CMake,
although I heard that there are not many fans around.
It works for me.
And Ninja, all of these, they can do that.
Like I think for me, you need some script.
If you don't, if you're built with
systems, some AK stuff,
you can still use beer or beer.
Beer you can also use.
Maybe it's better choice with Bear.
You can do some LDP load magic,
I think, and then intercept the course to your compiler
and then write out the compilation database for you,
even if your system doesn't support it at all.
And also, Clang itself can kind of emit stuff
that you can put in a compilation database with the dash mj flag.
But on the other hand,
if you do that, then you already have the flags, right?
So, if you pass that flag to Clang,
then you already have the flags.
I don't really see a reason why you want to do that,
like only for some with script situations.
Now, the problem of course is you only have
the four compiled files, right?
For header files, you typically don't compile them.
So, you don't have entries in the compilation database for you,
header files or for some other auxiliary files or
files that you didn't compile yet, stuff like that.
The file that you probably all have seen is
the compile commands JSON file, right?
That lives somewhere in your build directory.
It has entries like these.
So, there's the build directory is one of,
so there is one file right now in
this compilation database that is in the compile commands JSON.
The command is there.
You have which file was actually compiled here,
so the file a.cpp and which is the output file.
So, in this case, it's the main file.
So, this compile command is just one representation of
the compilation database that lives on your hard drive.
In Clang or better in Clang Tooling,
this is where the compilation database
struct lifts or I think it's a class,
but it's slightly simplified.
It's actually pretty simple.
So, there are some methods for
getting some compilation database.
Some methods for loading them.
So, you give it for example some direction and say,
hey, give me the compilation database that corresponds to
this directory or this file.
Or you say, I have this directory.
Please figure out which compilation database I should use.
This is what Clang also internally uses.
These are more interesting for us.
There are two things that can give you
so-called compile commands.
You can either get all of them,
all of them that are contained in
your compilation database,
like the files that you compiled,
or you can ask it to give you
compile commands for files
that you want a compile command for.
A compile command is also pretty simple.
It's basically what you just saw in the JSON file.
It's a directory, the file name.
It has the output file name.
It has the command line as a vector of string.
It parsed already the command line into arguments,
which is actually pretty interesting,
because this is pretty fast.
I started looking into this because ComptiB,
which I previously used for doing this stuff,
it was really, really slow in doing exactly this step,
parsing this and escaping
the bash arguments into a string.
There's one thing that is called heuristic.
Heuristic, that sounds promising.
You use some heuristic to figure out
some flags that you don't really know yet.
What does Clang actually do to give you compile commands?
There's this JSON compilation database,
which is just a specialization of
the normal one that we just saw.
It can load the JSON compile commands,
JSON file, and then it's a compilation database.
It does some steps.
First, it will do stuff like this here,
expanding response files with just irrelevant here.
But what it will also do is it will infer
the missing compile commands,
and then it will also infer the target and driver mode.
What it also does is it loads
some plug-ins which is not showing here.
So what is actually the heuristic that is used by Clang?
Because you may have noticed that you can
open with ClangD,
use some other toolings on a header file,
and will not immediately break.
It will try its best to figure out the flags that you're using.
What it does is it has this interpolation compilation database.
What it does, it finds the closest available file in
your database that is already there,
and then it awards points for path similarity.
So it will use the base name.
So for example, in my example here,
this file a.cpp and file a.h,
they have the same base name,
so this is already a point.
Then you have also in the local structure of your project,
you have some path and it will match on that,
and depending on how similar the path is,
it will award points.
Then if you still have a ties,
then it will use the prefix links as a tie break.
Then in the end, it will replace the file name,
and also remove the output argument
to for example, work with header files.
So now if you use that,
this is the situation where it gets frustrating,
you apply that now and you get some weird flags.
Some that you didn't expect and didn't work.
What happened? We have this other directory down there,
still in gray. Some files list in there,
they match better than our obvious file a.cpp, bad luck.
For me that in other,
this is often a copy of LLVM and that matches a lot.
For me that happens a lot.
Good. So now we know what the problem is and what the solution might be.
Let's start building our own CB.
But first, I will tell you why you might not be doing this.
If you are happy with the default,
obviously, you don't need to put any effort in, right?
It works for you. Perfect.
If your structure is simple of your project,
don't put any work into it.
Also, if you cannot build your tools,
like if you cannot build your Clang Clang D whatever,
and you rely on some fixed version that you get from packages or something,
also cannot use that because you have to link it.
Also, if you're using Clang D,
if you use Clang D,
then chances might be that you can get around with
modifying Clang D's very good configuration file.
It is pretty simple, but you can get around in most of these problems.
If you can just use some script or some hack, do that.
I was telling our working student that I was going to give this talk and he was like,
Pascal, what? I just use bash and replace that.
Why would you do that?
If you can do that, fine.
But why would you want to do that?
The number one reason I would say is you have way more information about
your code base than anyone else.
There are tons of discussions online in Clang related like
bug triggers and so on where people come up and say,
why don't we enhance the interpolation in this and this way?
Then some other person comes along and says,
yeah, we cannot do that because that wouldn't work for this and this cases.
So you have all of the knowledge and you don't have to be very conservative.
The people who write Clang D and Clang, they have to be.
Also, as I said, it's I think a very nice step into working with tooling.
Also, if you're doing something really unconventional,
I don't know, some live building or some obscure compiler that you use,
also might be worth looking at this.
Okay, so let's build our own CDB and
something that is a bit more advanced but that we can still now understand,
even for the purpose of this demo.
That also uses part of Clang D's infrastructure and
is also somewhat useful, right?
So we want something.
What you could do is we could just use an include graph.
So we have this like here, say A includes the header file and so on.
And this is some useful information.
We can generate this information.
We also have some additional information, for example, that file A and
file B live in the same direction.
We did also simple information that we can just use.
Best thing now is that Clang already has information about that and
tools to give you to help you.
Not going to go into all of these, but
there is a nice scanning tool that scans for dependencies of file.
You just keep it a file and say, hey, what are its dependencies?
And it will give you a list.
Like this is perfect.
This gives you, that works with.
So this tool, dependency scan is simple.
But this is the part obviously where it could get very
complicated and work intensive to find a good thing that does best the candidate.
Right?
I did this graph thing now, but you know code better.
So that's basically it.
So how you can use that?
In Clang D, it's pretty simple.
You can just use a compilation database plug-in,
which is a very thin wrapper around this what we just saw.
In Clang, you have to manually overwrite it.
There are two ways to do that.
Either you just change the code, just replace the class name, or
you can do some linking, I will show you where.
For the other tools, as far as I know,
you have to do the same as with Clang D and replace it manually,
which is a bit unfortunate.
I think one can get it to work with all of the extra tools that are not Clang D.
You can also add your own plug-in system.
So compilation database plug-in is actually just this,
which consists of this one method here that says load from directory.
And you take this and we can build our own one.
So we have our custom compilation database plug-in now.
We have this load from directory.
We append this well-known path, easy.
And then we just instantiate our class and return it, done.
Very simple.
And there's one extra step that might be a bit tricky.
You need to link that in, and for that you have to have this static variable created.
The name is not very important, but you have to do that.
And it uses an LLVM registry under the hood, pretty nice.
Yeah, with Clang, I'll upload this slide so you can take a look.
You have to modify either of these here, which is a bit tricky.
So the better way would be to use the plug-in system if you have your own tools.
Just directly taken from Clang D, this works for all the tools.
And if you just want to set your compilation database, you have the default ones and
you can override it with your own plug-in then.
Okay, so there was a lot of code.
I will upload this, of course.
And there's also a demo project online if you want to take a look at that.
If you are a beginner and you're just taking to,
wanted to look into that, because I look at different LLVM code normally and
not at this part of LLVM project.
I had some nice experience with, actually with LLVMs.
I just asked them, hey, I want to do this kind of stuff.
Is there something in Clang?
And it was like, yeah, yeah, which is nice.
And also, don't reinvent the wheel.
If you have Clang already at your hand, there is so
much stuff in there that you can just use, right?
Don't have to do this.
And also, maybe you just don't need this and
can get away with the configuration.
Okay, so there's a demo project that is online right now.
Can take a look at that if you are into Nix.
You can just Nix build it.
It should just work out of the box.
And yeah, to conclude, the compilation database consists of
compile commands, simple, that help your tools and your ID.
For unknown files, we have to use heuristics.
And when you want to customize this behavior of the heuristics,
you have to have some sort of this plug-in.
The easiest integration that you can find with ClangD,
because there you just need to link in your own plug-in,
like I just showed you.
Okay, and last but not least, there are some resources,
very good discussions about why you might do this and
why it's not so easy as it seems when you first start.
Okay, thank you.
I think we have two minutes for question, Adeni.
Yeah?
So how often does the default heuristic fail?
So how often do you need to put it in your own one?
I think it's, so the question was, how often does that fail and
how often do you need your own one?
So for me, it depends highly on the code base.
I have my simple hobby projects where it just works 100% of the time.
I'm happy.
But for example, at work, I have experiences where this is maybe
in 10% of the cases where it works.
And I create a new file and it immediately breaks.
And so something like that would help.
How does this interact with C++ modules?
There with C++ modules, the question was how to stay in
the action with modules.
It gets way more involved.
So for example, the scanning tool has a different mode for modules.
I haven't, to be honest, I haven't looked into that because we're not using modules yet.
But yeah, there's much more work to be done, I think, yes.
Yeah?
Just a little curiosity, what should the answer be if for example,
header files included into different files which have incompatible flags?
Yeah, the question was what should be the way to do,
if some, what should be the way if two header files have incompatible flags?
One header file with incompatible flags.
I don't know.
No one, this is the core question, the core problem here.
No one really knows.
And maybe you have the information for that, right?
You have maybe some clever way of doing that.
But generally, the people of ClangD, they cannot answer that for you.
We should ask LLMs to tell us for that.
Yeah, maybe not ask the LLMs for that.
Okay, I think we are running out of time.
Okay, thank you very much.
Thank you.
Thank you so much.