All right, so hopefully there's a bit of a voice left after last night's excesses.
And I think it's the same for everyone.
So it's not my regular voice.
Usually it's a bit higher.
So thank you for joining me today.
So I want to present a new tool called DejaCode, which is a tool that can help manage S-BOM.
And we'll go through that.
Agenda, quick word about me and the project at large.
We'll show you some of the features of the tool and then I'll go to a demo.
So about me, I like to think I'm on a mission to make it easier to reuse free and open source
software.
And that means removing any obstacles that are in the way with its license, security
issues, and eventually in the future quality, sustainability are related.
I lead this project called AboutCode, which has many subparts, including well-known scan
code and also package URLs.
I don't know how many of you know about this or have ever used it in any way, shape, or
form, show of hand.
Yeah!
Great.
So I'm speaking and preaching to the choir.
And this is my technical advisor.
It's a family rescue puppy.
She's six years old.
And he decided that he better spend his time on my keyboard, which is not easy.
So sometimes if we're chatting and you see really like weird stuff going on, that must
be.
So AboutCode, it's open source tools, but also open data.
And I think that makes a big difference.
We're trying to focus on providing as much as possible low level primary tools to do
scanning, origin detection, and provide volunteer data.
I'm one of the original co-founders of SPDX.
I also contribute to Scylindiax and I hope sometime we'll be able to bring the two together.
I'm also co-founder of ClearDefine.
And I try to contribute to many other tools.
We're supported and blessed to be supported by quite a few of your companies.
So that's awesome.
And we have a company behind that just provides services to help sustain the work.
Most of our work is about in fact sustaining the development of the tools.
So the problem, just to frame it the way I like to think of it, is we now can develop
software using components that we can assemble.
It's going to be even worse or even better after the use of Generative AI.
And it's very easy to forget where the codes come from.
And we need to know where it's from, what's the license, security, and in the future other things.
There's a big problem in software promotion and this is that there's been a huge amount
of investment from VCs we're talking about in the range of about 1.5 billion fucking dollars.
Right?
All of this is to eventually milk and mine free and open source software.
So you pretend to make it safer and easier to use for large corporates.
I have nothing against that but I think we need as a community to come together to bring
better tools and what I'm trying to do.
So the about code stack, three components, three parts.
One which is a bunch of low level tools that can find where the code comes from.
That's the SCA tools.
Management app, deja code that we're going to look at today.
And three components in the knowledge base which is database of license which is built
on top of SPDX for anything that doesn't go into SPDX.
Which is the last I looked, the largest database of license, this side and this square ground
of the galaxy.
So it's not too shabby at least, open database.
Database of package metadata, files and fingerprints and database of vulnerabilities.
And just a word on the vulnerability side.
I'm surprised because we're trying to get the data upstream from the source.
And it's very clear that nobody did it in many case before.
We want to ask the beyond people what's the license of your vulnerability data?
Nobody ever asked.
We want to ask the NGINX folks.
What is the meaning of your advisory vulnerability range?
It took us two months to understand.
Back and forth with the NGINX metanus.
Nobody ever had asked them.
Last week we were in discussion with the folks from G-LiP-C.
They became a CNA recently so they're allowed to assign TVs.
They publish advisories.
Obviously never anyone had asked them how to parse this stuff because we're the first
and we're doing a back and forth so we're helping them also for the community at large,
making this a bit easier to do.
So I won't, I'll put the slide online and I won't go into all the details.
Something of interest on the SA tools beyond scan code is a new tool for doing code matching.
Code matching is you have a big index and you're able to find based on fingerprints,
whole package files and eventually approximate files down to the elusive snippet.
But we're doing it in a very different way that's been done by existing tools so that's worth looking at.
And of course package URL.
On the data, just a quick look at where we stand.
There's a big problem we have right now is how we can share this data efficiently.
So we, we, we have received a bit of funding from the EU for that.
And we're building a system where we can federate the data to massively share it without keeping the control of it.
That again, not our intent to keep the control of this.
Eventually you could think of it as something like a federated open food fax for code.
So there's a code.
The point is you can import, spdx, cyclone dx.
You can generate cyclone dx and spdx.
You can aggregate that in a product.
So you can combine all of these, but you can also enrich it.
So you can say, hey, you know, I received this sbomb.
Maybe it was generated by gripe or sift and it's missing license data, which is a common occurrence.
And I press a button and voila, I get the stuff scanned.
Let me show you an example.
So that's deja code.
We don't have a lot of time, so I'm cutting a bit for the chase.
You know, I have a product here, an example product.
I have a bunch of files there, package actually.
And if I want these to be scanned, I just say, you know, scan all the packages.
And it will eventually fill in the blanks for license.
It's also doing the same for volumeties.
And if we look at the inventory of packages, well,
this one doesn't have too much information in terms of volumeties.
Let me look at another example, which may be a bit more interesting.
Oh, yes, log for G.
No, that's there.
Yeah, that's a good one.
That's true.
Well, it's another one here.
So this is an example here.
It shows up.
It looked up volumeties and a bunch of log for back volumeties.
The integration we're doing is with the open data we have with vulnerable code.
So if you drill down the package, you have details about the volumeties.
And eventually you can zoom in on the details.
The interesting thing we do with the volumeties data, as I said,
we aggregate data from any source.
That includes all the GitHub, GitLab, NVIDIA, Google data, and
many upstream source.
And we're trying to compare and contrast them so
we can find what is actually the correct data.
It's a cluster fuck of mess.
You have database that make up packages.
Not a big deal, nobody uses them.
But in terms of trust in the data, it's you have incorrect version ranges,
more often than not, different database.
Don't agree on which package is vulnerable and which one is fixing the stuff.
So you're like on the receiving end, if you're in a security team,
you have your eyes to cry and you're going to spend your life doing triage
of low quality data.
So we're not fully there yet, but we're trying to.
So for instance, you see here you have a vulnerability identified by
Perl for an instance of the package in Debian, but also at Mevan.
So that gives you a bit of an idea of what we can do there.
So let me go back briefly to the side.
So you can import a speedy x, import cyclone DX.
One thing we don't do yet is vex.
Being able to provide effective statement of if the usage of certain package,
which are vulnerable in my products or applications are vulnerable effectively.
But that's on the roadmap.
In particular, one thing that's great is that there's a format called CSEF.
And it's essentially based on Perl.
I've discussed a lot with the folks that promote this.
So there's a lot of low impedance between what we have there in terms of data
model and what exists in CSEF for instance.
Open vex is also based on Perl and so is the cyclone DX format for vex.
Where are we next?
So yeah, so that's a product.
The data model is composed of products, which is typically what you think as a
product or application.
You can have many of them.
You can have versions.
We like to track the jacquard itself there.
And you can see we change license.
We're not always open source.
And it's much more comfortable to have every piece of code and open source and
data open source, except one which is my passwords and keys.
But everything else that works.
You can do comparison between versions.
You can see what's been added, removed, changed.
So it's a rich data model.
You have components which is a way to combine together things which will be
typically what you think as a component is.
Maybe one or more package that you reuse as a block of construction in your
product construction set.
You have licenses, owners, a bunch of tools for reporting, LDAP integration.
Everything you would expect from a decent enterprise between quote grade app
that you want to use for this kind of purpose.
And that's it.
Questions?
Go ahead.
How do you see yourself in contrast to OVAS dependency tracking?
So to my understanding, you lack racks, but you're stronger on the license side.
So the question is, I repeat for the audience,
how do I see deja code in contrast with dependency track?
So they're good buddies.
We have a slightly different way.
First, the UI is white as opposed to the default UI which is black on dependency
track.
I hate dark mode personally.
And each time there's someone that comes with a contribution to put dark mode,
we eventually find a way to redirect it to more positive energies.
But so that's one thing.
That's a big, big difference.
Technically, so the dependency track is essentially doing similar things.
I guess the big difference is whether you're based on product and
packages versus your focus is more vulnerabilities.
Here you could think of it as a way to manage your package and
component inventories across multiple products.
You can do custom reporting, this kind of thing.
But yes, there's probably a lot of similarities.
Another tool which we may have quite a few similarities would be software 360.
Now, at the lower level, which one does what exactly it's difficult to say?
I know the folks of dependency track are working.
They reached out to me to integrate the data we have on vulnerable code.
So eventually, maybe they all level up and that's great.
We're not in a competition mode.
We collaborate and we share as much as we can and let the best two win.
And there's probably people that prefer dark mode and people prefer light mode.
Other questions?
Thank you.
Thank you.
I'm sorry, but if you have multiple licenses, you have an ant here.
Yes.
It technically means more in some cases.
No, in the case of this, at the top level package, all the licenses apply because you have different shanks.
One part.
So you have also the case that is more?
Of course, of course.
Yeah, yeah.
Thank you.