Right, I'm live. I'm green. Right, welcome to the post-lunchtime slot. Some of you know me because I was here last year, and basically I want to say what we've done over 12 months with SBOMs. The idea that came out of it is about change, so I'm going to talk about change. Some of it is about my tooling that's changed, but a lot of it is about observations.

A bit about me. I'm from Manchester. That's where that B is. I get asked where that picture is. That is a security hub for innovators and startups in Manchester, trying to grow the ecosystem in the north. There's more to tech than just London, please. Normally, most weekends at this time of the year, I'm running around muddy fields, so I've had a weekend off from running round muddy fields. My background is mission-critical systems. For 40 years I was delivering mission-critical systems. Think about big, complex systems. So what Nicole was saying was my bread and butter; those are what I used to worry about. Now I'm in a startup, and I'm known as Mr SBOM in Manchester. They didn't know what SBOMs were 12 months ago. They do now.

It's all about a tool called CVE Binary Tool (cve-bin-tool). This has been presented a number of times, and it's a binary scanner. It came from Intel: they wanted to understand what binaries were included in their deliverables and whether they were vulnerable. Common question. It's open source, and one of the things we've done is become a Google Summer of Code project. Each year we've added more features, and I've been pushing the SBOM world in there. So we added SBOMs, then we've added [unclear]. We've added EPSS this year. We do a very trivial triage; let's say we might improve that with VEX, the world of VEX. And it's got a thousand stars this week, which is very good. OpenSSF Best Practices is interesting. I don't work for Intel.
It's a challenge sometimes in terms of whether you have multiple maintainers, and it's a challenge in open source when it's run by a commercial organization. So generally you see the release calendar dates tend to be triggered by GSoC. We found a little problem this week, which is why we didn't release this week, but anyway it's very close to having a new version with all the EPSS stuff formally released.

And then there's a tool that I write, which hasn't got a thousand stars yet, which is an SBOM generator for Python. I take the installed Python environment, work out all the dependencies and work out the bill of materials. So think about Python. There are lots of direct dependencies. What are the transitive dependencies? I'm agnostic as to which flavour of SBOM it is; I've always wanted to support both CycloneDX and SPDX. Initially I have written my own parsers and generators. I do want to migrate to the stable versions. It's all about time.

But what I'm pleased about is there's a benchmark, which we'll see later, and it's the first one to get a benchmark score of 10 out of 10; they explain why you get 10 out of 10. It's quite hard, because the ecosystem needs to play together. And this is what I do. Generally I just enrich it when I get the time, and I've been a bit busy the last six months, so that's why it's stalled a bit. But anyway. Generally the sort of thing we've been doing is adding more into the package information using SPDX, so trying to get as many of the package attributes as possible into the SBOM, because you want enrichment. The more data you have, the more useful it can be for more use cases. That's hard, because that data is not readily available.

So what have we done? This came out of a conversation at a monthly open source meeting that we have, where we said it would be nice to work out how much change we have, and is the SBOM going to tell us what those changes are? So we set up a GitHub Action that runs at about two o'clock in the morning.
We start from a clean virtual environment on Ubuntu. That's quite an important thing, because it's going to come back later. We then install all the dependencies, and then we generate the SBOM in the different formats, so whichever version is the latest of each flavour, we generate. And 3.12 will be added, I think, maybe this week, tomorrow. But generally we just run it for the supported versions of Python.

So, a little bit of a digression about Python dependencies. If you work in Node or Java and things like that, everything's got its little quirks. The thing about Python is it tells you what the direct dependencies are. It can tell you a bit about the environment, so if you're working on Windows you may have different dependencies than if you're working in another environment. But it says nothing about the transitive dependencies. How much is hidden?

So let's look at an example. This is a subset of our requirements file. At the top you've got aiohttp. It's got a constraint on the version, saying the minimum version we require is 3.7.4. But it's also got optional requirements, so straight away you've got two ways of installing aiohttp: with or without that additional component. You look at Beautiful Soup: any version will do. And then you look at these down here: the importlib one is only installed if the Python version is less than 3.10, and it's got a version constraint, and similarly importlib resources is again conditional, on 3.9. Because the Python standard library changes, a bit like the operating system, the language ecosystem is part of your dependencies. And you can see the number of dependencies gradually changes over time as you add more features.

But this is what you really have. That's the hidden part, that's the iceberg. It looks quite like an iceberg, actually. That's one of my tools. And the green ones are all the transitive dependencies. Look how deep that is. That was fascinating.
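As a rough illustration of the iceberg, here is a minimal sketch of how the hidden (transitive) set can be computed from a dependency graph. The graph below is hypothetical: the package names echo the talk, but the edges are made up for the example, and the function name `transitive_dependencies` is not taken from any of the tools mentioned.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it requires directly.
# The edges are invented for illustration, not read from a real environment.
REQUIRES = {
    "aiohttp": ["yarl", "multidict", "aiosignal"],
    "yarl": ["idna", "multidict"],
    "aiosignal": ["frozenlist"],
    "beautifulsoup4": ["soupsieve"],
}


def transitive_dependencies(direct, requires):
    """Return every package reachable from `direct`, excluding the direct ones."""
    seen = set(direct)
    queue = deque(direct)
    while queue:
        package = queue.popleft()
        # Packages with no entry in the graph have no further dependencies.
        for dep in requires.get(package, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen - set(direct)


direct = ["aiohttp", "beautifulsoup4"]
hidden = transitive_dependencies(direct, REQUIRES)
print(f"{len(direct)} direct, {len(hidden)} transitive: {sorted(hidden)}")
```

In a real environment the graph could be built from the standard library's `importlib.metadata.requires()`, which returns the requirement strings declared by an installed distribution; parsing those strings and their environment markers is left out of this sketch.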
A picture paints a thousand words, I think we all agree. If you really zoomed in, I've put the license values in as well. That's an interesting thing if you wanted to do some analysis. But actually, visually, that's quite an eye-opener. And we've only got 60 packages there.

So what have I observed by looking at all the data we've collected? I want to look at the context, a bit about quality, a bit about velocity, which was the original question of what's changing, and then other things that I've discovered from analysing it. Generally this is all out of GitHub, so I wrote a little utility to download the file history so I could then quickly analyse it locally. And I ended up writing a little tool called SBOM trend, which turned it into a JSON file so I could then play around with it to generate the pretty pictures which you're going to see.

So, first thing: there's nothing in any of the SBOMs that tells you it's Python. Or which version of Python, or which Python environment. Now maybe SPDX 3 might, but that's actually quite important, because you're going to see in a minute the difference that makes. If you just get an SBOM and you don't understand its context, how do you know whether this is a real representation of the environment you're using? That picks up on what was being said in the previous talk. CycloneDX has user-defined properties which you could use. SPDX doesn't yet. You could use comments, but it's a bit harder. Yes, I'm sure you could, yeah. So I use CycloneDX properties just to say language: Python, language version: something. But I think that's quite an interesting thing. It's good that SPDX 3 is doing that, because I think that is quite an important thing. And this is what you get.
If you plot all the different versions of SBOMs across the year, the higher lines are the older versions of Python. Python 3.7 stops in the middle of the year because we stopped supporting it, but you see a trend. That's the requirements trend, and you see it sort of follows it, and then there are a few other bumps. We didn't change anything; the outside world changed. And sometimes you see it drops, and that's because a package ceases using a dependency. It wasn't obvious until I did the digging, but that's what that was telling me. It's quite interesting. So the lower versions of Python have a slightly larger set of dependencies. You can probably sort of see that from the requirements file, but the requirements file is lost in the SBOMs. It's not there.

So there are differences. Transitive dependencies vary independently of your direct dependencies. I think you could probably guess that, but it's quite interesting to see the evidence. And the later versions of Python have the fewest dependencies. So that's a good way of saying: don't just update your packages, update your language version as well.

So let's look at the quality of SBOMs. You could probably have a whole conversation about this, and a whole conference about it. I've just chosen four tools, because they demonstrate four different things: the SPDX one, which checks whether it conforms to the NTIA minimum standard; the scorecard which comes from eBay, not the OpenSSF one; something called sbomqs, which is from Interlynk; and one from me, because it showed something else that I discovered on Friday which was really interesting.

So first of all, NTIA. We are no different from day one to today. We're still the same, because we still fail to get all the suppliers. I would like to see how many people can get that on a real project. You can get it for small projects but not for real-life projects. I think we all recognise that.
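To make the supplier failure concrete, here is a minimal sketch of a per-package completeness check loosely based on the NTIA minimum elements. It is not the SPDX conformance checker mentioned above: the dictionary keys, the package names and the function name `missing_fields` are all assumptions made for this example.

```python
# Per-component fields loosely based on the NTIA minimum elements.
# These keys are assumptions for the sketch, not a real SBOM schema.
REQUIRED_FIELDS = ("name", "version", "supplier", "identifier")


def missing_fields(packages):
    """Map each package name to the required fields that are absent or empty."""
    report = {}
    for package in packages:
        missing = [field for field in REQUIRED_FIELDS if not package.get(field)]
        if missing:
            report[package.get("name") or "<unnamed>"] = missing
    return report


# Hypothetical package records, as they might be extracted from an SBOM.
packages = [
    {"name": "package-a", "version": "1.2.0", "supplier": "Example Maintainer",
     "identifier": "pkg:pypi/package-a@1.2.0"},
    {"name": "package-b", "version": "0.9.1", "supplier": "",
     "identifier": "pkg:pypi/package-b@0.9.1"},
]

report = missing_fields(packages)
print(report)
```

A single package with an empty supplier is enough to fail a strict check, which is why the result does not improve over the year however much other metadata is enriched.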
Then the eBay one. One of the things they were looking at is package IDs, which goes back to the 10 o'clock talk about PURLs and things like that. I didn't have PURLs at the start of the year. I do now, so my score went up. Enrichment: the message is enrichment. Good. And licenses have probably got better as well.

sbomqs. This is done by Interlynk. I don't know where the idea came from, but they have a whole load of different things they're looking for, like licenses: are the licenses still supported, or are they deprecated licenses? Do you have checksums for your packages, et cetera? That was a target: how can I get a better score? So we get to 9.6. If you go on their website, most of them, excluding sbom4python, are in the sevens and eights. A lot of the containers are sevens and eights. So I'm quite pleased I can get to that level. The reason it's not 10 is because of the supplier failings, same as the NTIA checks.

And then I have a tool called SBOM audit. The reason I wrote this was: could you use the SBOM to drive policy? So if you wanted to say, I've generated an SBOM and I've got a license like GPL and I don't want GPL in there, can I have an allow list or deny list of licenses, for example? That was the use case I came up with. But the other thing I wanted to check was: am I using the latest version of the packages? So I was getting a reasonable number, and the number of checks increased because I had more packages.

Well, this is the interesting thing I found. The SBOM came from last weekend. I scanned it on Friday. I was expecting to get 100%, all the packages at their latest versions. Four of them got updated last Tuesday, which is why the green ones are so happy. But there were a couple that hadn't changed. That got me thinking: why don't packages change? Pinning. The advice in the Python world is probably not to pin. These are indirect dependencies. I've no control of those.
And I haven't quite got to the bottom of finding out where the pinning is happening, because they're not even at the first level below the direct dependencies. So that was the reason that was there: I did an SBOM scan and got a vulnerability on rsa, and the reason was I'm not using the latest version of rsa. So that was the sort of thing: could you detect that? That's something I only just discovered this week, which I thought was really interesting to share. I mean, I just happened to have that tool.

So NTIA is a good benchmark. It's hard: accurate supplier information, I think we all know the challenges of that. But data enrichment is good. Can you enrich your SBOMs? Look for that threshold. Look for that utopia moment where you get 10 out of 10 for your SBOMs. Because the more information you have, the more useful it's going to be for all the different use cases people are going to use your SBOMs for. And it is possible.

So this was the original use case. What's changing? What are we changing? What's not changing? Who's changing what? These are all driven by Matplotlib, and they're in the trend tool, so if you want to play with these as examples, you can. The top is the number of packages. The red line is the number of changes on a week-by-week basis. Every week at least one package changed. At least. Which is good; the ecosystem's alive. Yes. But what are the triggers for those changes? You can see some of the spikes relate to when we did an update of the requirements. But generally things are changing all the time. And I was trying to show what the rate of change is and things like that. So I came up with this sort of train-track diagram.
A line going steadily up like that means it's changing every week.

Except for the holiday.

What?

Except for July and all that.

Oh yes. Yeah. Well, I think we can understand why. Actually, that's probably quite a thing. Look at time; time's actually a driver as well. Do lots of things happen at Christmas? Do lots of things happen in holiday periods? Yes. Interesting. That's enough.

More people work at Christmas.

Yeah. Well, I think we've seen problems where people have released something on Christmas Day. But yeah, anyway, that's a really good observation. I hadn't thought of that. It's good.

So you can see these things. And these are just the ones that have changed more than five times in a year; that's 20-odd packages, more than 20. And then if I look at the ones that change frequently, quite a few of them are direct dependencies. Why are they changing? Most of them are features, not vulnerabilities. But actually, can you find out? And there's one I looked at, rich. Why did it change? They actually removed an unmaintained package, which then got me on another little track, which you're going to see in a minute. So yeah: security fixes aren't the drivers for many of these changes; features are.

And then if I look at the direct dependencies, again, they're going up. Some of them are changing a little bit slower. That one is the case that's no longer used. So again you're getting quite a rich picture of change, which then says: if I had pinned on the first or second of January, I'd have missed all these changes. A lot of changes. The features may be performance improvements, et cetera. You might want them for good reasons. And these are the ones that have only changed once, so essentially haven't changed.
And the red ones are the ones that haven't changed in two years; I just took two years as an arbitrary value. And you think, well, OK, there are 10 of them that have not changed in two years. Does that not start ringing a bell? It says: is it maybe unmaintained? Is it now an unmaintained package? I don't know what industry does in terms of looking at the health of an open source project. Is two years long enough? It says maybe we need to look at alternatives. Right at the top there is tomli, which is now part of the standard library in Python 3.11. Until I did this, I'd missed that. So I then raised a pull request to say, if it's 3.11, we want to use the standard library, not the open source version. Again, that's on the probability that the language ecosystem libraries are going to be better maintained, or have a greater need to be maintained, than community ones necessarily are. Right?

So change happens. But we should be very careful about pinning, because direct dependencies change frequently as well. So there's a pinning debate.

Right. Let's look at data analysis. The first thing is languages, licenses rather. I've tried to look at the SPDX license IDs. When I get the metadata, I try to map it. And if it doesn't quite match, I have a few rules to try and alias them. So is it "Apache space 2"? Well, it's Apache 2, that type of thing. And Apache 2 is a really good example. People don't know how to write Apache 2 SPDX license IDs. Yes?

Are you pulling this from PyPI?

Some of these come from PyPI. Yes.

PyPI is a disaster in terms of specifying licenses.

Right. You're preaching to the converted here. As a community we should be looking at this and fixing it, because many of the packages that have got license failures have been updated in the last 12 months. Probably because they've got features, but metadata doesn't really matter, does it? Metadata matters now in the world of SBOMs.
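The aliasing rules described for licenses can be sketched roughly as follows. This is only an illustration of the idea, not the mapping used in the speaker's tools: the alias table is a tiny made-up sample, and the function name `normalise_license` is hypothetical. The fallback value `NOASSERTION` is the marker SPDX uses when no claim is made.

```python
# A small sample of valid SPDX license identifiers (the full list is much longer).
KNOWN_IDS = {"Apache-2.0", "MIT", "BSD-3-Clause", "GPL-3.0-or-later"}

# Illustrative aliases for free-text license strings seen in package metadata.
# A real mapping would need far more entries than this.
ALIASES = {
    "apache 2": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "mit license": "MIT",
    "bsd 3-clause": "BSD-3-Clause",
}

# Case-insensitive lookup of identifiers that are already valid.
CANONICAL = {spdx_id.lower(): spdx_id for spdx_id in KNOWN_IDS}


def normalise_license(raw):
    """Map a free-text license string to an SPDX ID, or NOASSERTION if unknown."""
    text = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    if text in CANONICAL:
        return CANONICAL[text]
    return ALIASES.get(text, "NOASSERTION")


for value in ["Apache 2", "apache-2.0", "  MIT  License ", "Custom In-House"]:
    print(repr(value), "->", normalise_license(value))
```

Anything that falls through to `NOASSERTION` is a candidate for a metadata fix upstream rather than another alias.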
Let's look after SBOM metadata as much as the code and the tests. Told you, I told you. Yeah. Right.

So I summarise all the licenses and things like that, and again you can quite quickly get a summary. Have you got a license problem? OK, cve-bin-tool is GPL. Everything underneath that is OK, but you can see quite quickly whether you've got a license compliance problem.

The other thing is you can look at all the suppliers. Do you have a supplier that you really need to be loving and looking after, because you're very dependent on their packages? In this case we've got 60 different providers, so it's not quite so obvious. But this could be a way of understanding who the suppliers you depend on are, that you maybe need to be getting closer to: maybe supporting, maybe helping. And I'm thinking about the world of the enterprises as well, who might need to do this. But again, four of these packages have no supplier. Three of them were updated. Why didn't they update the metadata?

And then, just as a summary, I've got a tool that just diffs two SBOMs, in arbitrary formats; it doesn't matter, you can compare CycloneDX and SPDX. Just to see generally what's changed in the 12 months: well, 39 of the packages have had at least one version change. And we've lost two packages and gained 11. So that's about 15% growth in the number of packages that we're dependent on.

And then I did a scan. I was expecting the last SBOM to be clean of vulnerabilities. The reason I've got one vulnerability is because of the rsa problem that I referred to earlier. Potentially.

So, takeaways. I'm doing all right for time here. Right. Generate your SBOM for each version of the environment you support.
So if you're doing Python 3.8 to 3.12, generate five SBOMs, and also do it for both CycloneDX and SPDX, because the generators may produce different data; there may be different enrichment between the two. Please, as a community, can we improve the metadata? We're all responsible for doing that. Once you've got an SBOM, that's the start of the fun. Start analysing it, start using it, start reading it. I'm sure many of you are quite familiar with reading JSON. Help the people who aren't familiar with JSON. Look at the data; there are some documentation tools there you may find useful.

This is what we do when we install. We install with pip with this upgrade strategy, which is trying to make sure we're using the latest version of everything. But obviously that doesn't stop pinning. So it's interesting. I need to think a little bit more about that with the Python teams. Keep your packages up to date. I have this problem on my own machine, because I just do pip install and it'll say, oh, I've already got Beautiful Soup. Yeah, that'll do. It's not the latest version, I'm sure. So just be aware of that, and use the latest version of Python.

I have another tool called sbom4files, which looks at the files, so you can look at the change in files as well. That's a bigger thing. Could you start to see the amount of change in, maybe, one of your source trees in your repos, or the test files changing, for example? And then obviously add vulnerability scanning as part of your generation.

So this is what you all probably want. This is the list of other tools. The presentation will be in the cve-bin-tool repo; there's a pull request in there, it just needs to be approved. Those are all the tools. I haven't written all of them. And if you want to follow me, that's me on LinkedIn, that's me on GitHub, and that's me in Manchester. OK. Thank you very much.
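The 12-month comparison described in the talk (packages gained, packages lost, packages whose version changed) can be sketched as a comparison of two name-to-version mappings. This is a minimal illustration rather than the speaker's diff tool: it assumes the package lists have already been extracted from the two SBOMs, and the package names, versions and the function name `diff_packages` are hypothetical.

```python
def diff_packages(old, new):
    """Compare two {package name: version} mappings taken from two SBOMs."""
    old_names, new_names = set(old), set(new)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        # Present in both snapshots but at a different version.
        "changed": sorted(
            name for name in old_names & new_names if old[name] != new[name]
        ),
    }


# Hypothetical snapshots, e.g. the first and last SBOM of the year.
january = {"package-a": "1.0", "package-b": "2.0", "package-c": "3.0"}
december = {"package-a": "1.1", "package-b": "2.0", "package-d": "0.1"}

result = diff_packages(january, december)
print(result)
```

Because the comparison only needs names and versions, it does not matter whether each snapshot was originally a CycloneDX or an SPDX document, as long as both are reduced to the same mapping first.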
So in your picture of the increasing hidden dependencies, is each one a package or a package version?

OK, this is about the picture. Yeah. So the picture that I showed, the one that showed the hierarchy of all the packages: they are all packages. They are Python packages.

So if you have two different versions of the same package, do they appear as one or two?

They would appear as two if you had that. I've never seen that.

Oh, OK.

Well, I would say that you live on an isolated island [unclear]. We are in the Union, and we have a presentation coming up. [unclear] When you say that updates are driven by features, will that change in the future?

OK. The question there is: I said a lot of the updates appear to be driven by feature changes rather than security fixes. Do I think that will change with things like the CRA? Probably. It depends. It depends.

Yeah. Is there one more?

OK. This is about improving the metadata upstream. Probably the two things I would say are licenses, to support the license compliance teams, and secondly the supplier, because that identifies whether you know where you've got your software from.

What can a large organization [unclear]? What can we do to help do that upstream?

Use SPDX tags in your Python modules.

Yeah. Yeah. You can do a pull request. Yeah. I don't know. I mean, I think, recognise that there is a community out there. If you've got the effort, do it. Use it. Because we know the open source community is stretched because it's volunteers. If the enterprise is taking value from it, can you usefully make a contribution? Because you're going to help many people.

All right. That's time for me.