So I don't know if it's okay.
So hi everyone.
So as I was introduced, my name is Pierre.
I'm a PhD student in France.
And the subject of my thesis is to work on a given subclass of
vulnerabilities in cryptographic code.
So be reassured, I won't talk about cryptography here.
But the GCC static analyzer seemed to us like a good place to
implement our analysis.
So first I'll go through a bit of an overview of the
static analyzer.
I don't know if any of you have been following the analyzer.
Well, David, of course.
Okay, then I'll go through a bit of our journey
developing our analysis in the analyzer.
And then I'll present some remaining issues that I think
would be interesting to discuss with the
community, because I think that some of them
could be addressed directly within the analyzer.
So first, how the analyzer works.
In case you have never used it, here is some really dumb
code.
We just allocate a pointer,
free it, and then use it.
This is the kind of code that the analyzer can detect
as a problem, statically, at compile time.
When you compile it with just -fanalyzer, this gives
you a really nice report in the terminal with all the
paths leading to the problem.
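The slide is not reproduced in this transcript. A minimal sketch of that kind of code, with function names chosen here purely for illustration, could look like the following. The first function contains the bug on purpose; the second is a corrected variant for comparison.

```c
#include <stdlib.h>

/* Buggy on purpose: reads through the pointer after freeing it.
   Compiling this file with `gcc -fanalyzer` is expected to report
   the read on the last line as a use after free. */
int use_after_free(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 42;
    free(p);
    return *p;   /* use after free: undefined behavior */
}

/* Corrected variant: copy the value out before releasing the memory. */
int use_then_free(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 42;
    int value = *p;
    free(p);
    return value;
}
```

Only the corrected variant is safe to call; the buggy one is there to be compiled and analyzed, not executed.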
So a bit of an overview.
It was introduced in GCC 10,
so 2020. There is an API available to develop an
out-of-tree plugin.
That's really interesting because you do not need to
rebuild all of GCC to try out your analysis.
And there is a symbolic execution engine inside it.
It's neither sound nor complete, as David said to me.
But it's really nice because, when implementing your
own analysis, you do not need to care about path feasibility
because the analyzer will do it for you.
And how it works internally: basically the analyzer
will handle a state machine for given variables for you.
So you do not really need to manage any data structure for
this.
The analyzer will do it for you.
You just have to care about the transitions of your variables.
And there is, as you saw, a really nice reporting system.
And since not long ago, there is also the option to
output a SARIF file.
So it's a really nice standard pushed by Microsoft, I think.
Yeah.
And on the committee.
Oh, okay.
That's it.
And thank you, David, for this,
and also for answering my emails, because it was my first
experience with GCC.
It was a bit painful to go through GCC and the analyzer
at the same time as a newbie.
So how does it work internally, in a bit more detail?
Usually variables are represented with both lvalues
and rvalues.
For example, on the first line, you have an integer x with
the value 42.
Within the analyzer, there is a data structure known as
the store, which keeps the different values of
your variables at a given program point.
So after it has analyzed the second line,
within the analyzer you'll have this kind of state.
You'll have the region, which is the lvalue for the
analyzer, x, having the symbolic value,
so the rvalue, 42, and pointer having the symbolic value
of the address of x.
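The two lines being described are not in the transcript. A sketch of what they might look like is below, with comments giving an informal reading of what the store would bind after each line. The function wrapper and the variable name `ptr` are assumptions made so the snippet compiles on its own.

```c
/* Informal illustration of region -> symbolic value bindings.
   The comments paraphrase the talk; they are not analyzer output. */
int read_through_pointer(void)
{
    int x = 42;      /* region x   is bound to the symbolic value 42             */
    int *ptr = &x;   /* region ptr is bound to the symbolic value "address of x" */
    return *ptr;     /* reading through ptr yields the value bound to x          */
}
```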
And to implement the state machine, it's pretty easy.
You just need to inherit from the state machine class.
As I said, the static analyzer will handle a map from
svalue to state for you.
Something nice for reporting is that the state of a given
symbolic value can have an origin.
That origin can be another svalue, allowing the analyzer to
rebuild the path to the problem you're reporting.
You can also have really complicated logic behind your
states if you need to:
there is a state class you can inherit from too.
And that's all you need to start playing with the analyzer.
So, our journey to implement a secret analysis in the static
analyzer.
We needed to implement a taint analysis first.
Taint analysis historically
comes from user input validation.
Some of you might know it.
Basically, behind it there are four core ideas.
There's the source:
where does the taint come from?
The propagator:
that's the taint propagation through different variables.
The sink:
these are the operations triggering issues.
And the filter is how you can remove the taint from a given
variable.
If you apply it to my problem, it becomes:
the source is the secret,
for example a cryptographic key, or a password,
or anything at all.
The propagator is how you propagate that
secret-dependent notion.
The sinks are, in my case, conditions:
if you have secret-dependent variables used in a
condition, a memory access, or a non-constant-time CPU operation,
you'll have some vulnerabilities in your code, potentially
leading to a side-channel problem.
And the filter will be, in our case, for example,
erasing the tainted data,
so a call to bzero on an allocated area of memory, for example.
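To make the four roles concrete, here is a small sketch. The `SECRET` macro is hypothetical: the talk says secrets are tagged with an attribute but does not name it, so the macro expands to nothing and only marks the spot. `memset` stands in for the erasure call mentioned in the talk, and the function and variable names are invented for the example.

```c
#include <string.h>

/* Hypothetical marker standing in for the talk's (unnamed) attribute. */
#define SECRET

int check_pin(int guess)
{
    SECRET int pin = 1234;        /* source: the secret itself                */
    int diff = pin ^ guess;       /* propagator: diff depends on the secret   */
    int ok = 0;
    if (diff == 0)                /* sink: branch on a secret-dependent value */
        ok = 1;
    memset(&pin, 0, sizeof pin);  /* filter: erase the secret                 */
    return ok;
}
```

A secret-aware analysis would be expected to warn at the `if`, since which branch is taken depends on the secret.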
So our state machine is pretty simple.
Every variable is in the start state.
And in our case, the original secret will be tainted
by an attribute.
For now, the developer has to put an attribute on the
variable,
and the variable will be tainted.
But you can also have other variables being tainted if their
initialization depends on
another tainted variable.
And the sink comes in here:
as soon as a tainted variable is used in a sink
operation, for example a condition, we emit a warning
through the analyzer.
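The state machine described here can be modelled in a few lines. This is a toy model written for illustration only; it does not use the analyzer's plugin API, and every name in it is an assumption.

```c
#include <stdbool.h>

/* Toy model: two states per tracked variable. */
enum taint_state { STATE_START, STATE_TAINTED };

/* Source: a variable declared with the secret attribute becomes tainted. */
enum taint_state on_declaration(bool has_secret_attribute)
{
    return has_secret_attribute ? STATE_TAINTED : STATE_START;
}

/* Propagation: a variable initialized from another takes over its state. */
enum taint_state on_assignment(enum taint_state rhs)
{
    return rhs;
}

/* Sink: a condition on a tainted variable should produce a warning. */
bool condition_should_warn(enum taint_state operand)
{
    return operand == STATE_TAINTED;
}
```

For example, a state obtained from `on_declaration(true)` and passed through `on_assignment` still makes `condition_should_warn` return true, which mirrors the propagation case in the talk.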
So for our first try, we just checked that it was working correctly
with a pretty intuitive example.
It's just a secret:
you tag it with the attribute and use it in a condition.
We expected the warning to be emitted,
and it was working perfectly.
Then we wanted to check that the propagation was working
too, and it was.
But when we looked under the hood, we noticed that
actually it was not variables that were tracked.
It was their symbolic values.
For example, here, you had secret with the symbolic value
42 and y with 142.
And in the state map, the data
structure keeping track of the state of your variables,
it's the symbolic values 42 and 142 which are tainted.
So we came up with a problematic example, which is like this.
There was a false warning emitted here:
because it's 42 which is tracked and not x,
y was implicitly tracked as tainted here.
So that's a minimal representation of the data
structure of the analyzer.
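The problematic example is only on the slide. A plausible reconstruction, assuming that x is the tagged secret and y is an unrelated variable that happens to hold the same constant, is sketched below. The function wrapper is added so the snippet compiles on its own.

```c
int false_positive_example(void)
{
    int x = 42;   /* assumed to carry the secret attribute; symbolic value 42 */
    int y = 42;   /* unrelated variable with the same symbolic value          */
    (void)x;      /* x is not used in any condition                           */
    if (y)        /* value-based tracking sees the tainted 42: false warning  */
        return 1;
    return 0;
}
```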
So we needed to modify the state machine's state map class
to be able to track not only symbolic values, which
is really nice for handling pointer aliasing issues,
but also origins.
That allows us to no longer implicitly track the value of y
here.
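The difference between the two tracking strategies can be shown with a toy model. This is a deliberate simplification and not the analyzer's data structure: a binding is reduced to a region name and an integer value, and all names are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Toy binding: a named region holding a concrete value. */
struct binding { const char *region; int value; };

/* Value-keyed lookup: anything holding a tainted value counts as tainted. */
bool tainted_by_value(int tainted_value, struct binding b)
{
    return b.value == tainted_value;
}

/* Region-keyed lookup: only the tagged region counts as tainted. */
bool tainted_by_region(const char *tainted_region, struct binding b)
{
    return strcmp(b.region, tainted_region) == 0;
}
```

With a secret region holding 42 and an unrelated region y also holding 42, the value-keyed lookup reports y as tainted while the region-keyed lookup does not.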
At the same time, we needed to modify the notion of origin
for a state, because in that case,
when a pointer is dereferenced, we want to be
able to know that it points to tracked data.
So here it's y.
The address of y is here.
So within the state machine's state map,
you'll have something like this.
This allows the analyzer to rebuild the path.
Basically, secret, having no origin of its own,
is the original secret.
You'll have y depending on secret,
and the address of y is tainted, because y is tainted.
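A sketch of code that would produce such a chain of origins is below. The names follow the talk where it gives them (secret, y); the pointer name `p` and the function wrapper are assumptions.

```c
int origin_chain_example(void)
{
    int secret = 7;   /* original secret, assumed tagged with the attribute */
    int y = secret;   /* y is tainted, origin: secret                       */
    int *p = &y;      /* the address of y is tainted, origin: y             */
    return *p != 0;   /* dereferencing p reaches tracked data               */
}
```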
So our modifications.
We modified mainly the state machine state map logic, also
a bit of the handling for out-of-tree API users,
and also diagnostic-related code.
We believe it would be nice to have those changes merged
into the analyzer, and we are working on that,
because it would allow the analyzer to support
a wider set of analyses. For now,
it's really nice for pointer-related analyses,
but if you want to deal with integers, floats, or anything else,
it can mess up your analysis.
There are still some remaining issues,
because when you're at the frontier between, for example,
scalars, arrays and pointers, it can be a mess.
Take, for example,
the third element of an array, here, which is secret
and aliased by a pointer.
The pointer points to the third element,
and you use it to access it.
But you do not use the pointer to taint the region behind it.
So to do it, we used a bit of a trick.
At the same time as we taint the region t[2],
because we cannot just taint the symbolic value of t[2],
otherwise we would taint 42,
we also have to taint t + 2, the symbolic value behind it,
so the address of the third element of t.
So within the data, you have something like this.
What interests us is that value, well, not really that value,
but the region containing that value.
We do not want to track the rvalue of that element.
We do want to track its lvalue, and also
the symbolic value of its address.
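The array case might look like the sketch below. The array name t and the value 42 come from the talk; the array size, the pointer name `p` and the function wrapper are assumptions.

```c
int array_element_example(void)
{
    int t[4] = { 0, 0, 42, 0 };  /* only t[2] is meant to be secret           */
    int *p = t + 2;              /* p aliases the third element               */
    int hit = 0;
    if (*p)                      /* sink reached through the alias            */
        hit = 1;
    /* Tainting the rvalue 42 would taint every 42 in the program, so the
       talk taints the region t[2] together with the address value t + 2. */
    return hit;
}
```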
And for now, we are cheating a bit,
because we do it in our analysis,
but maybe it could be done directly within the analyzer.
I do not have a solution yet, but I think it would be nice to discuss it.
Another problem concerns inter-procedural analysis.
For example, here you have a local variable in the main function,
which is secret, and when you pass it to f,
secret does not exist anymore.
So, for now,
or at least I didn't find the APIs within the analyzer to do it,
you cannot really look at the taint of secret
when you are in the context of the f function.
That's because the analyzer has a data structure
to represent a frame for each function,
and so there's a notion of local values.
So when you're within the context of the f function
and you ask for the state of the secret variable,
for now you just get a crash,
because it's not a local variable.
But we could discuss that.
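The inter-procedural case can be sketched as follows. The talk uses main as the caller; it is renamed `caller` here, and the parameter name `value` is invented, so that the snippet can be dropped into a test file.

```c
/* f only sees its own parameter: the caller's local `secret`
   does not belong to f's frame. */
static int f(int value)
{
    return value != 0;   /* sink, but the taint was recorded in the caller's frame */
}

int caller(void)
{
    int secret = 42;     /* local to the caller, assumed tagged as secret */
    return f(secret);    /* the value crosses the frame boundary here     */
}
```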
Yeah.
So, a few takeaways.
With the modifications we made,
we can now track state per region, but only for intra-procedural analysis.
There are still issues remaining.
As I said, there is the frontier between regions and values,
when you want to track a region,
so lvalues, and not rvalues,
at the scalar, array and pointer frontier,
and also inter-procedural analysis.
So, yeah, that's it for me.
Thanks for listening.
Feel free to reach out if you need to.
Sorry, you don't have the question.
Yeah, that's the question, I think, though.
Thank you.
We have some questions.
Yeah, so we have, like, questions and so forth.
Well, do not force the question.
If you do not have any questions, that's OK.
I have many questions.
I bet.
Sorry, I should.
We have 13 minutes.
OK.
So, you're the one.
Yep.
So, why not try to implement this kind of thing
in the Frama-C ecosystem, for example,
instead of GCC?
OK, so the question is,
why not implement the analysis
directly in a more formal tool,
such as Frama-C, for example?
The idea is that there are already a lot of tools
addressing that particular issue,
doing that kind of analysis, but
a lot of them are kind of hard to use.
You cannot really set them up easily.
So the idea behind it is to develop a tool
which has no ambition to be either sound or complete,
just usable and easy to plug in and out.
So
the idea in the end would be for the developer
to not touch their code base at all.
That's work we still need to do.
But, yeah, the idea is just to try to think about the user
and hopefully not make our tool a pain to use.
Yeah, so that's it.
Any more questions?
Yeah.
Is there a way to support other languages
in the static analyzer?
Because I saw only examples in C code.
For example, is the static analyzer built in a way
that you could extend it to, for example, C++
or other languages?
Yeah, so the question is
whether there is support for any other language.
As we are working on the GIMPLE representation,
basically, we are targeting C code only for now
because the subset is way easier to manage.
And the idea would be to then rely on the analyzer
because I don't think the C++ support is...
Yeah, I mean, as you say, the analyzer runs on
GIMPLE SSA, in terms of intermediate representation.
And so, in theory, it handles everything that we have a
frontend for, but in practice, it handles everything that I've
implemented.
And in particular, exception handling support isn't
implemented yet.
I'm interested in having that done as a Summer of Code project.
Or, if you want to start a new project, that would be
a wonderful thing for someone to work on.
I'm focusing on C.
But, I mean, people have used it, I think,
as a proof of concept, for checking unsafe Rust code
and CPL.
And in theory, anything that GCC compiles, you can analyze.
The results just might be rubbish.
They're probably rewards, yes.
So, that also was a question.
Hi.
Hi.
How fine-grained can the analysis be?
For instance, can you track individual bits, I mean,
not bytes, but single bits?
I think it could be possible, but you have to modify
the analyzer itself.
From an out-of-tree plugin, as we are trying to do it,
I'm not sure you can really do it.
But there are data structures within the analyzer to do it.
Yeah.
Within the analyzer, essentially, we build a directed graph
of (program point, program state) pairs.
And in the program state, there's a store,
which basically models the state of memory.
And that, in theory, tracks things at the bit level.
So, for example, within this frame, within this local,
bit 17 of that local
is bound to this symbolic value.
How well it works...
Yeah.
And how well it works with the state machine as well,
I don't know.
Yeah.
So, for now, no.
Yeah.
Yeah.
You have a question?
Yeah.
I think, with the example with pointers,
you're already showing that you're probably going to run into
an undecidable problem if you want perfect answers.
So you always have to make a trade-off between false positives
and true positives.
Do you have any guiding principles on how to decide?
We want to, yeah.
Just repeat.
Yeah.
Okay.
So, the question is about, basically,
completeness or soundness.
We do not aim to be either sound or complete,
but we want to lean more towards soundness.
So, it means all problems are true problems, basically.
Well, yeah.
Follow-up question.
Yeah.
There are a number of other static analyzers.
Yeah.
I wonder, are they all aiming for roughly
the same balance between those two?
Well, it depends on the project.
We went for the GCC static analyzer because,
on the research side, there was no work on GCC.
And we thought it would be interesting to first understand
why.
Is there a particular reason?
Is it because of how the compiler works?
Or is it because, I don't know,
people are just used to going to LLVM, for example?
Yeah.
Yeah.
I mean, from the general analyzer point of view,
I guess I'm doing it in an extreme-programming,
pragmatic kind of way: I have a bank of open source,
free software test projects.
I turn on the analyzer, I see what it spits out and decide,
does that suck or is it useful?
And I tweak it accordingly.
And it's not very formal.
Yeah.
Hopefully it's useful.
In some ways, think of it as a glorified compiler warning.
Like, I'm willing to spend a lot more compile time
to get a deeper, more involved warning.
It's not going to prove that your program is correct.
Yeah.
You still have time to analyze it.
Yeah.
Let's hope it's fine.
You can do that.
Yeah.
Well, that's good.
I think that's a good question.
You mentioned the limits when it comes to inter-procedural analysis
with your static analyzer.
Especially in a use case like yours,
where you try to track accesses to a secret
by attaching secret states to memory locations,
how would you handle analysis across translation
units? Because I could probably access this location
from a completely different file if I wanted to.
Yeah, probably.
But the thing, oh, yeah, yeah, sorry.
So the question is whether we could handle the
inter-procedural analysis problem within another pass.
Yeah,
it could be done.
But then we would have to get out of the analyzer
or maybe add some other logic at some point.
But the thing which is nice with the analyzer
is that in our implementation,
we do not need to care about symbolic execution,
well, symbolic evaluation of variables.
Everything is taken care of.
Yeah.
I specifically had it run at a point in GCC
where it integrates with LTO.
Oh.
So you can actually do link time analysis.
Yeah.
Well, in theory, you can do link-time analysis.
Yeah.
In practice,
I have a few test cases in the test suite.
If you try doing it on anything non-trivial,
it will explode right now.
Yeah.
Sorry.
Yeah.
But in theory, you will find all the order n squared.
Yeah.
And just to clarify, I'm not a developer of the analyzer.
I'm just a user of the analyzer.
Well, I did modify it,
but so far it's just a local modification.
And hopefully it will get merged at some point.
Yeah.
That would be nice, but we have to handle
the inter-procedural problem first.
Yeah.
Yeah.
Yeah.
Yeah.
If it runs that late, like with LTO,
if it integrates with LTO,
a lot of optimizations will have run before it.
Doesn't that deal with some of it?
Yeah.
Maybe you should repeat the question.
Yeah.
The question is: doesn't the analyzer
come a bit late in the passes,
because it could run after a lot of optimizations?
Well.
And the answer is yes.
Yeah.
The answer is yes.
In the analyzer.
Sorry.
Yeah.
I think that's a good question.
Yeah.
I mean, as I said, I chose to run it that late
in order to try and piggyback on LTO.
And unfortunately it means, in theory,
some optimizations have already run,
and some of those optimizations have assumed
there's no undefined behavior.
And, well, the analyzer actually wants to know about
undefined behavior.
So that's one area where there can be false positives.
Potentially we could move it earlier.
It would be a lot of work, because of, I don't know,
technical debt.
Yeah.
And how are we for time?
Yeah.
Time's up.
One last question.
One last question.
Yeah.
What is it?
Yeah.
Yeah.
Sorry.
In practice, that just means that if I care,
I want to analyze once
with -O0 and once with -O3, right?
Yes.
I'm not sure to.
So instead of running ten analyzers,
I now just run GCC twice, once with -O0 -fanalyzer
and once with -O3 -fanalyzer.
Yeah.
That's OK.
So the question is, instead of running several analyzers,
you just have to run the GCC static analyzer
with the different optimization levels enabled.
Is that the question?
It's more than that.
Yeah, more than that.
OK, thanks.
I was, yeah.
All right.
Thank you again.
Ends up.
Thanks.
Thanks.