Okay, great. Good.
All right, so welcome everybody. Thanks everyone for joining the hottest room at FOSDEM.
My name is Stephen Chin. I'm VP of Developer Relations at JFrog,
and I'm going to talk a lot about different projects which help secure the open-source supply chain,
about why we need security, a bunch of different security incidents,
both historical ones, but also new ones which you probably haven't heard about,
a lot of new research and things going on.
And hopefully we can all help to improve the open-source supply chain together.
So I think a great analogy. Can you guys in the back hear me?
Okay, good. So I think a great analogy for the software supply chain
and how we think about it is to compare it to our food supply chain.
And we know that the way you get great cooking,
great food, is by starting with fresh ingredients:
having things which you know are safe, which come through the food supply chain
without anybody interfering with them in the middle
or not following good hygiene practices.
And when you have an issue with your supply chain, you end up with spoiled ingredients
and you know, kitchen disasters.
So has anyone here seen the Gordon Ramsay kitchen disasters show?
Okay, a lot of good fun. And these are not the free-range chicken you're looking for.
We're hoping we can get better quality and better security out of our software supply chain
so we can build enterprise applications which are hopefully very difficult for attackers to exploit.
And this is how the USDA looks at, you know, the food supply chain,
creating a healthy supply chain. But it's somewhat analogous to software.
So you have a lot of production. You have you know, farms and things which are producing software.
You have distribution and processing. So it goes through a bunch of different toll gates
and different people in the process. Eventually it ends up in a restaurant and a retail location
and then you have you know, home users or restaurants or other folks who are cooking the food.
So if at any point in this process, if you have issues with your quality,
if you have you know, infections, if you have bacteria entering into it,
then that results in potential issues at the consumer side.
So I think when we're looking at the software supply chain, we need to look at it through a different lens.
And I think a good lens to look at it through is SLSA, which is one of the OpenSSF standards.
And it really focuses on getting attestations of the different parts of the build that your software has gone through.
Kind of figuring out at each of these different gates, you know, is the source control secure?
Have you done the right things with code reviews? Have you gone through the right processes with builds?
And when you have all of this information about the build, then you can figure out whether you are actually secure.
And a key ingredient to how you know this is the case, and this is why we're all here in the Software Bill of Materials room,
is that you need to have that final index of your ingredients,
which can show you everything from end to end. SLSA and the SBOM standards, both SPDX and CycloneDX,
go really well together, because this way you have the attestation of what's happened in your build
and what went into your artifacts, and then you can put that together into a single document,
which kind of shows you all of the things which verify the components,
and then the potential issues which they might have.
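To make the attestation idea concrete, here is a minimal sketch of one small piece of it: checking that an artifact's SHA-256 digest appears among the subjects of a provenance statement. The statement layout follows the in-toto Statement shape that SLSA provenance is usually wrapped in. The artifact bytes, the file name `app.tar.gz`, and the function name are made up for illustration, and the sketch deliberately does not verify any signature on the attestation, which a real verifier must do.

```python
import hashlib
import json


def artifact_matches_provenance(artifact_bytes: bytes, statement_json: str) -> bool:
    """Return True if the artifact's SHA-256 appears among the statement's subjects."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    statement = json.loads(statement_json)
    for subject in statement.get("subject", []):
        if subject.get("digest", {}).get("sha256") == digest:
            return True
    return False


# Hypothetical artifact and a hand-written statement, for illustration only.
artifact = b"pretend this is a release archive"
statement = json.dumps({
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{
        "name": "app.tar.gz",
        "digest": {"sha256": hashlib.sha256(artifact).hexdigest()},
    }],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {},  # builder and source details would go here
})

print(artifact_matches_provenance(artifact, statement))
print(artifact_matches_provenance(b"tampered", statement))
```

A digest match only says the attestation talks about this exact artifact. Whether the attestation itself can be trusted depends on who signed it, which is out of scope for this sketch.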
And if you're not following these good practices in how you build software,
how you get provenance of your software and how you attest to it,
then you end up with issues like, for example, the Log4Shell incident.
Now I think this by now is infamous.
It sparked a whole second round of government security concerns over open source software,
and really the challenge for big organizations which were trying to address the Log4Shell incident
in production was: are my production systems affected?
And it depended upon the version of Log4j which you were using.
It depended upon whether you were using just log4j-core,
or whether you're using the full set of libraries.
And the answer for most organizations was, well, I don't know if I'm affected in production,
so I'm just going to patch everything.
And that's very expensive, it's very difficult to do,
and when you have libraries like this, which are used so much across the entire ecosystem,
it's quite challenging as well.
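With an SBOM per deployed application, the "am I affected?" question becomes a query instead of a guess. The following is a rough sketch under several assumptions: the SBOM is a CycloneDX-style JSON document with a `components` list, the check covers only CVE-2021-44228 (first fixed in log4j-core 2.15.0), and the version parsing is deliberately naive, so it ignores backported fix releases and later related CVEs. The sample SBOM is invented.

```python
import re

# Assumption: CVE-2021-44228 was first fixed in log4j-core 2.15.0.
FIRST_FIXED = (2, 15, 0)


def parse_version(version: str) -> tuple:
    """Very rough version parsing: take the first three integers in the string."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def affected_components(sbom: dict) -> list:
    """List log4j-core components in the SBOM that are older than FIRST_FIXED."""
    hits = []
    for component in sbom.get("components", []):
        if component.get("name") != "log4j-core":
            continue  # log4j-api alone does not contain the vulnerable lookup code
        version = component.get("version", "0")
        if parse_version(version) < FIRST_FIXED:
            hits.append(f"log4j-core@{version}")
    return hits


# Hypothetical SBOM for one production service.
sample_sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "log4j-core", "version": "2.14.1"},
        {"type": "library", "name": "log4j-api", "version": "2.14.1"},
        {"type": "library", "name": "log4j-core", "version": "2.17.1"},
    ],
}

print(affected_components(sample_sbom))
```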
And I think what really started a lot of the government concern around software supply chain
was an earlier incident.
This sparked the Biden administration's legislation around this,
which was the SolarWinds incident.
A very different sort of incident because this one was a true software supply chain attack
in the sense that they specifically attacked the build system.
So they were using TeamCity, they got in right before the certification happened,
the signing of the artifacts happened.
So to the downstream customers which SolarWinds was supplying,
it looked like it was signed by the company and certified, and it looked like it wasn't malicious,
but in fact the attackers had done a very good job of infecting it before it was properly signed.
And so we'd like to prevent these sort of attacks from happening
because it causes a lot of damage.
It can cause potentially malicious entities to get access to information.
It can cause privacy issues with consumers, and it costs a lot of money:
according to IBM's data breach report in 2023,
USD 4.45 million per breach on average, a 15% increase over three years.
So this is a huge issue and it continues to get bigger for us as a software industry.
Okay, so let's talk about some additional incidents.
So anyone, which one of these is your package?
So when we're talking about delivering libraries and dependencies,
the majority of software relies on open source components.
It relies on leveraging that because we don't want to write the same code
and it's actually more secure if we're leveraging open source libraries
that have been peer reviewed, that have been patched,
that are staying up to the latest standards.
But what if you can compromise the systems in the middle,
which are supplying this information?
So the dependency confusion attack basically relies upon the fact that
a lot of companies, organizations, and open source projects
use some sort of package management or middleman.
They'll set up repositories which will pull from upstream
or pull from local corporate repos.
If you can get the internal names of the corporate packages
(this is an example from Yelp), then what you can do is upload packages with those names
to npm or other public repositories, and essentially you're spoofing these libraries.
So as a developer, or as a CI/CD system, you're going through a potentially vulnerable
package manager. Rather than giving you awesome corporate lib 1.2,
which is the latest version of your company's library,
it goes and it sees, aha, there's a newer version in a public repository.
I'm going to serve that up instead.
And as you know, bad things happen when kittens get access to nukes.
So we don't want this to happen in our supply chain.
Fortunately, all of the commercial package managers, including my company's
Artifactory, are now patched for this.
So by default, they will not go out to a public repository
if it exists in a local repository.
So this blocks that attack upstream.
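The fix described here boils down to one rule: a name that exists internally must never be satisfied from a public index, whatever version the public index advertises. A toy sketch of that rule follows. The index contents and the `resolve` function are hypothetical and are not how Artifactory or any other product implements it.

```python
def resolve(package: str, internal: dict, public: dict):
    """Pick a source for `package`, never letting a public copy shadow an internal one.

    `internal` and `public` map package names to the latest version each index offers.
    Returns a (source, version) tuple, or None if neither index has the package.
    """
    if package in internal:
        # Internal names always win, even if the public index has a higher version.
        return ("internal", internal[package])
    if package in public:
        return ("public", public[package])
    return None


# Hypothetical indexes: an attacker has uploaded a spoofed 99.0 release publicly.
internal_index = {"awesome-corporate-lib": "1.2"}
public_index = {"awesome-corporate-lib": "99.0", "some-open-source-lib": "3.1.0"}

print(resolve("awesome-corporate-lib", internal_index, public_index))
print(resolve("some-open-source-lib", internal_index, public_index))
```

A vulnerable resolver would instead compare versions across both indexes and pick 99.0, which is exactly the behavior the attack relies on.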
But Alex Birsan, who did this exploit, was very creative.
He took an attack which was theoretical at the time.
Nobody had actually exploited it.
He attacked Google, Facebook, Apple, a whole bunch of companies
and simultaneously claimed about a dozen bug bounties and ended up getting
$130,000 USD for his effort.
I'm sure you'll see that.
Maybe instead of helping secure the supply chain, there's a more lucrative path.
But I think researchers like him are also helping to
expose the potential issues in the supply chain in a way where they're not
introducing threats, right?
So this is white hat hacking.
And we need people like this to find the exploits.
And also, this helps guide us for what we need to do for new standards of
SPDX, for implementing things like VEX to make it easier for us to figure out
what the vulnerability scope is.
So I think that these sort of attackers actually are helping us a lot with
the ecosystem.
Now, another food example here.
So if you have a recipe that calls for different types of rice, like for
example, if you're doing a risotto, you wouldn't want to use like a mixed
grain rice.
Like you need a specific type of rice.
And this is something else which attackers make a lot of use of in the
supply chain.
So another common type of attack is called typosquatting.
Another variant of this is leaving off namespaces.
So as an example, our research team found an attacker which released to npm a
whole bunch of libraries named after ones from Azure.
And they just left off the Azure scope prefix.
So if you were a lazy developer and just typed in the package you wanted, and
you left off the namespace, you would get the attacker's library
instead of the actual library you wanted.
So a very clever attack.
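One cheap defense against the missing-namespace variant is to check a dependency list against the scoped packages you actually trust and flag any unscoped name that collides with one of them. This is a sketch of that idea only. The package names under `@example` are invented, and a real check would pull the trusted list from your own registry or lockfiles.

```python
def find_scope_confusions(requested: list, trusted_scoped: set) -> dict:
    """Map each unscoped requested name to the trusted scoped packages it collides with."""
    # Index trusted packages by bare name: "@example/storage-client" -> "storage-client"
    by_bare_name = {}
    for full_name in trusted_scoped:
        bare = full_name.split("/", 1)[1]
        by_bare_name.setdefault(bare, []).append(full_name)

    warnings = {}
    for name in requested:
        if not name.startswith("@") and name in by_bare_name:
            warnings[name] = sorted(by_bare_name[name])
    return warnings


# Hypothetical trusted scoped packages and a dependency list with one slip.
trusted = {"@example/storage-client", "@example/auth"}
dependencies = ["storage-client", "@example/auth", "unrelated-lib"]

print(find_scope_confusions(dependencies, trusted))
```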
And the way they did this inside of NPM is they actually had a random account
generator which would also generate a unique account for each of the different
libraries they uploaded.
So it also wasn't easy to systematically find, oh, well, this is a bad entity.
I'm going to block them.
So they managed to spread out the attack.
They did it on 280 different packages from the azure, azure-tests, azure-tools,
and cadl-lang scopes.
And then they could install any software they wanted on the person's computer.
But basically it was set up for potentially exfiltrating data from personal
machines.
Later on, so our security research team found this.
We reported it to NPM.
They took all the packages down.
And then we publicly disclosed it.
Later on, a security research firm claimed that they were just testing out NPM.
So this was like a company testing the waters.
So it wasn't actually a malicious payload in any of the packages yet.
But it had a lot of potential for doing that.
And the security research firm wasn't exactly upfront about what they were
testing either.
So, okay.
And then, of course, if you're building, if you're serving food, you want the
ingredients to be very fresh, right?
You can't make gourmet food if you start with a pile of rotten food and things
which aren't fresh.
And I think when we're looking at the software supply chain, actually a good
analogy for this is the somewhat infamous picture of a stack of more things and
more things and more things with very small, fragile components nested inside of
the supply chain, which any of those, if you pulled out the banana, suddenly your
whole supply chain falls apart.
And I think a great classical example of this is the left-pad incident.
So basically, there was a package published on npm by the publisher of the kik package for
doing left-pad.
Not a lot of code, so it's not something that's hard to write.
But as developers, we are very, very lazy.
If you can possibly save a line of code by including a dependency,
of course you would do that.
And then this kik package name was later claimed by a company, which wanted to own
that name.
npm sided with the company.
The publisher of kik got upset about this and pulled all of his packages down, including left-pad.
Later on, Cameron published an identical version of left-pad to solve this
problem.
But this is the source code which caused this huge incident.
And it is not worth including a library dependency, and a
potential vulnerability, for such a very trivial piece of code.
So this is, again, a huge threat.
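To give a sense of how little code is at stake, here is a Python equivalent of a left-pad helper. It is not the original npm source, just a sketch of the same behavior, and in Python the built-in `str.rjust` already covers it.

```python
def left_pad(text, length: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is at least `length` characters long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return fill * max(0, length - len(text)) + text


print(left_pad("42", 5, "0"))
print(left_pad("already long enough", 5))
```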
Now, one of the ways you can find out what all your dependencies are and figure
this out in a visual way is using GUAC.
So this is a new OpenSSF project.
It just got added to the OpenSSF suite.
What it does is it gives you a visualization of all of your dependencies,
lets you see exactly what you're using, how you're importing,
and has some nice visualization on top of it.
And I think using things like this helps you to figure out what your risk is
and what the potential scope is of your application and how vulnerable you are as a project.
So everyone knows Coca-Cola and it's very secret, right?
So the secret recipe is locked in a vault, very secure, nobody actually knows
what is exactly in Coca-Cola, that's their trade secret.
I think we pretty much all know what's in it now.
But there's this aura of mystery about the recipe and the history behind it.
And so how do we as software developers or as projects, open source projects,
keep our secrets?
And the reality is we do a very bad job of it.
So this is all of the exposed secrets in different central repositories,
which we found by scanning npm, PyPI, RubyGems, crates.io, Docker Hub.
Obviously Docker Hub being the biggest repository and having large containers,
which contain a lot of other software,
there was just a humongous number of secrets exposed, 5.78 million.
But even the software repositories like npm had 1.16 million, and PyPI had 0.43 million.
So there's a lot of accidental exposure of secrets in open source repositories.
This is yet another attack vector by which attackers get into open source projects,
and it allows them to attack the CI/CD infrastructure and cloud accounts
which the projects are using.
And even there's often accidental leaks of corporate secrets inside of open source repositories.
Because as a developer you're working in the daytime on your corporate projects.
And then evenings and weekends you're working on open source projects.
And there's a certain amount of crossover in that as well.
So here are the top mistakes to avoid if you want to prevent this from happening in your own project.
So first is not using automation to check for secrets exposures.
Using something like TruffleHog, or some sort of commercial scanner like Xray,
allows you to scan your packages before you check them in to make sure you don't have exposed secrets.
This is how we found these: we basically ran our tooling on top of the central repositories to see exposed secrets.
Second one is generating tokens with broad permissions that never expire.
So you always want to have the tokens scoped as small as possible in terms of what they can do.
And then setting expirations in a reasonably short time frame so you're rotating keys at the right times.
Third one is no access moderation for the secret.
So putting it inside of some sort of service like HashiCorp Vault or Docker Secrets will help to protect your secrets and tokens.
Fourth is fixing a leak by unpublishing the token.
So this is a really, really common mistake.
But you can't simply check in a new revision which deletes the token.
Because then, you know, Git has long history, it's going to remember it.
Now, if you followed point two and you have very short-lived tokens or very small scope,
that limits the damage because by the time somebody finds it, it's likely not useful anymore.
But again, a big mistake, you actually have to go and rotate the token to fully mitigate the issue.
And fifth, of course, is exposing unnecessary assets publicly.
So we saw a lot of cases where, in test libraries and other code which was not the main library code,
there were secrets exposed that were visible to infrastructure.
And in some cases, it looked like the test code or the other sidecars beside the main code base were not even meant to be published.
They were kind of, you know, more internal code.
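The first mistake in that list, not automating the check, is the easiest one to start fixing. As a very small sketch of what a scanner does, the code below looks for two well-known secret shapes in text before it is committed. The pattern set is illustrative only, the sample key is the placeholder value from AWS's own documentation, and tools like TruffleHog cover far more formats and also verify whether a hit is live.

```python
import re

# Illustrative patterns only; real scanners ship many more and also verify hits.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every line that looks like it holds a secret."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# Hypothetical config file content using AWS's documented placeholder key.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'

print(scan_text(sample))
```

Finding the secret is only half the job. As the fourth point says, a token that has already been pushed has to be rotated, because deleting it in a later commit leaves it in history.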
Okay, so to safely use open source, we also need standards.
I think if we've ever, you know, gone to a restaurant, this is really common.
This is in New York City, they have like letter grading on restaurants.
They have like, you know, reviewing of the source.
And I think a great way of doing this for open source software is the new OpenSSF Scorecard project.
So basically what this does is this gives you nice tooling for Git and a command line.
It'll analyze your project.
It will give you a score.
It's kind of like up to you to interpret the score for the different things that it analyzes.
But it tells you about code vulnerabilities, maintenance, continuous testing, build risk assessment,
source risk assessment, so a wide set of different things on your project.
And helps you figure out like how much risk is in your project, but also more importantly how much risk is in upstream projects.
Because if you have dependencies on projects which are vulnerable, then your project itself is vulnerable.
Okay, and I think, you know, given we're in 2024 and clearly the machines have been taking over.
So it wouldn't be complete if we didn't talk about what's happening with security of machines, machine models,
and some of the code which we're leveraging to make better use of AI infrastructure.
And unfortunately it's not looking that good for us so far.
So ML models, the machine learning models which we all use and publish to public repositories like Hugging Face,
are highly vulnerable, and we're already seeing a bunch of attacks against these public repositories
with malicious actors injecting payloads into them.
And it's not very hard to do. So the H5 format, the Hugging Face format, actually gives you the ability to put inside of it
information that is basically executable code that sits alongside your model.
So the attackers have figured this out, and basically from the moment you install the model, they can run some code on your system.
So as a developer, there's already the possibility that simply using models from Hugging Face
and other public repositories could expose your development environment to risks.
And basically this is an example of the Base64 payload, and you can run whatever you want to inside of the model.
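The talk names the H5 format. As an assumption for illustration, the sketch below uses Python's pickle format instead, another serialization format commonly used for model files where loading can also trigger code execution. It inspects the pickle opcode stream with the standard `pickletools` module, without ever loading the file, and reports opcodes that import names or call objects. This is a coarse signal, not a verdict: many legitimate model files also use these opcodes, so a real scanner would look at which names are being imported.

```python
import pickle
import pickletools

# Opcodes that import names or call objects while a pickle is being loaded.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def suspicious_opcodes(data: bytes) -> list:
    """List import/call opcodes found in a pickle stream, without unpickling it."""
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in SUSPICIOUS_OPCODES
    ]


# Plain data (dicts, lists, floats) needs no imports or calls to load.
plain_weights = pickle.dumps({"layer1": [0.1, 0.2, 0.3]})

print(suspicious_opcodes(plain_weights))
```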
Another attack for injecting malicious packages is exploiting the generative AI.
So if you're using technologies like ChatGPT and other generative AI technologies, what they'll often do is suggest packages that you should use as part of your code.
And AI algorithms are prone to hallucinations.
Hallucinations are actually quite predictable, and a lot of the standard code queries which people ask for will include perfectly valid dependencies,
but they'll also include fake dependencies which don't exist, packages which don't exist in npm, PyPI, etc.
So hackers have already figured out that by uploading malicious packages under the names of the libraries which the generative
AI is inventing, you can effectively cause people using ChatGPT to execute malicious code.
So another potential exploit and now even the AI is introducing vulnerabilities into your code.
So here are some examples of perfectly reasonable queries, for example requesting, generating an endpoint that returns file contents, right?
So this code is vulnerable. If you now add a couple of dot-dot-slash (../) sequences, you're going to end up in other directories,
and you're going to get access to files you shouldn't. And now if we again ask ChatGPT, like, okay, well, give us a secure endpoint that returns a file for user input and prevents directory traversal,
it gives us a more complicated example, but this is still exposed to URL exploits.
So as developers we can't really trust the current generation of algorithms for code suggestions to give us secure code.
And the attackers know this and this now makes a very easy class of security vulnerabilities which are likely to get injected into open source projects and other work simply by the fact that it's being recommended.
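For comparison, here is one way the path check could be written so that it holds up against both plain `../` sequences and percent-encoded ones. This is a sketch under assumptions: the "URL exploits" mentioned are taken to mean URL-encoded traversal, the function name and directory layout are invented, and it is not the code shown on the slide. It needs Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path
from urllib.parse import unquote


def safe_resolve(base_dir: str, requested: str) -> Path:
    """Resolve `requested` inside `base_dir`, or raise ValueError if it escapes."""
    base = Path(base_dir).resolve()
    # Decode percent-encoding first, so "%2e%2e%2f" is treated as "../".
    decoded = unquote(requested)
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / decoded).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {requested!r}")
    return candidate
```

An endpoint would call `safe_resolve` on the user-supplied name and only open the returned path. The key design choice is to compare fully resolved paths rather than to search the input string for `..`, since string filters are what encoded variants slip past.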
And something we're going to be publishing soon. So this is kind of, you guys are getting the before official publication on this.
So basically what we did is we went into Hugging Face, Kaggle and some of the public repositories, and ran our detection on malicious packages to figure out what the current exposure of developers is in the ecosystem.
And we found over 60 models which contain malicious behavior. We analyzed the payloads. Some of them were not truly malicious, but some of them were malicious.
And basically it allowed the attackers to run code on local environments. I believe we're scheduled in another week or so on the JFrog research blog to publish the results of this, but we're, of course, doing the right disclosures to Hugging Face and Kaggle so they can take down the models before people actually extract the data.
And we exploited it. And I think building awareness of these sorts of attacks helps the entire open source security ecosystem, because we're the ones, both in this room building software bill of materials standards and in the general open source security space, who have to figure out solutions so these sorts of attacks don't become the next SolarWinds.
Okay, so you can find a little bit more about the stuff I've been talking about for research with the JFrog research team at our research blog. This isn't our like commercial blog, just the research guys publish here. So it's all the fun stuff.
And hopefully together we can create a more secure software supply chain. So thank you very much for having me at the Software Bill of Materials room today.
Okay, if you guys don't mind, I want to do a quick selfie with the audience. So what's a good, what's a good security sign? Log4j, Log4j. Okay, let's give a thumbs up for Log4j. Cool.
Alright, thanks everybody for joining. And I think we have five minutes for questions if folks want to ask questions, or if you need a breather because this room is very hot, feel free to leave the room as well.
Any work on combining SBOMs with stored secrets and verification, things like that?
I think that's a good question. So I don't know if there's any work going on now about getting secrets as part of SBOMs, but maybe that's a good addition for the standards. Yeah, thank you.
Yeah, so the question is what kinds of vulnerabilities Xray handles. So I would say we're clearly in the application security department, AppSec. So we find malicious dependencies. We do secrets detection, like I mentioned.
We do stuff. We actually can build SPDX and CycloneDX files with both regular vulnerability info and also the new VEX standard. We don't currently do anything with runtime security, although that's coming.
Our package manager, Artifactory, is open source. Xray is proprietary. Yeah.
Okay, so Kay asked if I've looked at any of the stuff that's happening in AI for SPDX.
AI and data.
AI and data. And so I know about the working group that's collaborating on this stuff, but I haven't looked at any of the new stuff. Yeah. But I'm very interested to see what you're doing.
Okay, will do. Okay. Thanks everybody.