Good morning everybody. So I've got too many slides to present, so I'm going to go kind
of fast. I think all of you know what SPDX is. Does anybody not know what SPDX is? I'm
going to skip up, Axe Troublemaker. Now I'm going to skip through kind of the what is
SPDX slide and jump into what are we doing about 3.0. This is really a talk about my
kind of more practical journey. I'm one of the maintainers of the tools and I recently
went through a process to upgrade the tools for 3.0 and I thought if I shared my experience
with you, those of you who are writing tools yourself might gain some of my experience
from that and help you out with your tooling. So as far as the agenda, I thought I might
start with a little bit of context. Why did we even do 3.0? Because as you'll see there
are some breaking changes or some changes you'll have to adapt to in your code. So it's
good to know why we're all doing this. I'm going to talk a little bit about the approach
to creating the spec because that will give you context to some of the techniques I've
used for upgrading the tools and an overview of the changes, the important part. Then I'll
talk about my practical experience on the Java libraries itself. So why SPDX 3.0? We've
gotten a lot of feedback from the community. By the way, you guys hear me okay?
We have no idea. Yeah, we don't know if it's... There we go. Let's see if I can... It is there.
I think that'll work. Alright, a little bit easier. So we've got a lot of feedback on
the SPDX 2 spec that is just too complicated. There are too many pages in the spec. So we
took some steps in 3.0 to simplify it. At the same time, we added a lot more use cases,
which actually can make it a little bit more complicated. So we've taken a few approaches.
I think the biggest one is the introduction of profiles in SPDX to allow you to focus
in on the things that you care about on SPDX and that does create some changes into the
specs and impacts the tooling. Another big change that impacts the tooling is we made
a lot more flexible. We have some people using SPDX and extremely large deployments, very,
very large S-bombs, and they want to be able to basically distribute this S-bomb across
many different files, across the network. And so you'll see some structural changes
that allow you to do that more easily and of course reliably and in a way that you can
authenticate the relationships. There's a lot of interest in SPDX and non-licensing
and non-security scenarios now. So product safety is coming up in 3.1 and some of that
actually started to kind of come into 3.0 as well. But there is a lot of changes to support
that as well. And a big change, there was actually a time when there was yet a third standard.
I know many of you are frustrated with two standards. There was actually three for a
while and we merged two of them together in SPDX so that also had somewhat of an impact.
So I'm not going to go through all this, but just to point out, we've been around for a long time.
So don't ask me why we created two standards. We started back in 2010 and we've gone through a
lot of evolutions. Most of them adding use cases back in the early 2000s, 2010, 2013. We added
security use cases. More recently, we did the merger. And you can see on the top some of the
external influences, the NTIA being one of the most significant in terms of accelerators.
What's that? And the CRA. Absolutely. I am in Europe. Yes. And we've also done some work on
ISO standardization that's on the timeline as well. And of course, this has an impact on how we
evolved the SPAC itself. We started off in a very simple PDF. So we'd give tool developers like
myself, here's a PDF, go implement it. That kind of created some errors. Some of us read the words
a little bit differently. English isn't the most precise way of describing some of the technical
features. We moved that over into a markdown file that was a little bit easier and we generated
things. And then we went to an ISO SPAC. Have any of you guys ever gone through an ISO specification
process? It's interesting. There are a lot of requirements. They're very picky about their
format. So we went through that. And then where we ended up is going to more of a model based
description of the language and generating actually multiple different schema files. For 3.0,
we actually spent quite a bit of time deciding how we want to do the SPAC infrastructure for 3.0.
We decided that a lot of us wanted to write directly to schemas. There's a lot of people
though that wanted to make it just human readable and human writable more importantly. So we actually
took kind of a middle ground. We described everything in a markdown files, but in a very
specific format that every time you commit to the repository, it checks to make sure you adhere to
that format. And then we take that and we generate an intermediate schema file. And that schema file
then generates everything else. So I have a little bit of a diagram to kind of show you what the
process is. And this is important if you want to contribute to the SPAC. This kind of gives you a
guide on how to do that. We started with a conceptual model. This is kind of temporary. We
don't use that anymore, but it's just a picture to get us all on the same page. And then we write
the specs and markdown. And this is where you can contribute. You can just commit directly to the
SPAC in that specific format. And thanks to Alexios and quite a few other contributors, we have
tools and generators that right now is generating a website, an HTML version of the pages. And
here's where us tools developers get to actually kind of get a little excited, at least I do. We
generate a Shackle Owl schema file. Now, how many of you have never heard of Shackle or Owl? Okay.
Oh my gosh. You guys are going to kill me. But there's good news. We translate the Shackle and
Owl to something that you do understand. So, you know, just hang in there because we will get you,
we're going to be generating certainly JSON schema files, you know, that I think is really popular.
But you might be wondering, first you're wondering what the heck is Shackle and Owl. Look it up.
It's really interesting. It's very complicated, but it's very complete. Okay. It's very complete. And
then this is where we go to, we call it serialization schemas because JSON looks different than XML,
looks different from, you know, there may be other schemas that we generate as well. And the way we,
the reason we did all this is it ensures consistency. If we agree on what the markdown file is,
everything is completely consistent all the way through to the schemas you use to validate your
source code. So, it's well worth the effort. Really, Kate, it's worth the effort. So, you might,
now after you ask yourself what is Shackle Owl, you might ask yourself, especially if you look at
the spec, you're going to wonder why did we pick that. One thing it captures not only the syntax
of the data, which all the schemas do well, you know, this is an integer, this is a string,
it's got this pattern. It also captures the semantic behind it. So, it goes beyond what you can
capture in a simple syntactic schema. And that is the additional information we pull out of the
markdown files and we put it into the Shackle file. So, we can say things like, oh, you got a
relationship between a file and a license. It can only be of this type and you have to have at least
one of those. Whereas in a syntax, all you can really say is there's a relationship and it's got
this cardinality and it's got this type. So, you can go beyond, you know, the specifications. And,
of course, if you start with that, you can easily generate the simpler schemas, but you can't go
from the simpler schemas to the more complex. So, that's why we picked Shackle. Now, the other reason
we picked it is just about the reason we picked all the, well, there's a lot of, huh, it's coming,
it's coming back. There's tooling for, there's libraries that support Shackle and most
eco-language ecosystems like Python and Java, etc. Am I back yet?
So, it's not coming back because we don't have slides that being captured on the stream and the
HDMI is put through here. So, there's something going on with this machine. You need to go and talk
downstairs. Stream is not available, is what they're saying. Oh, dear. Technical difficulties.
So, let's, oh, but I keep, is it coming back? You want me to disconnect? Okay. Okay.
X2.x, you know, if you cared about security, you cared about licensing, you cared about, whatever you
cared about, you had to read the whole spec to find the little field that you're interested in,
you know, in supporting. It's kind of hard to navigate that. And if you wanted to conform,
you know, what is required, you know, it's like, you know, if you got a, if you're interested in
security, you really want to make sure you have the integrity fields. If you're interested in
licensing, you want to make sure you have the licensing fields. But, you know, what do you make
required for the spec? So, we introduced profiles and we have what, six or seven profiles in total.
And there's really three different aspects to a profile. The most important is the conformance
requirements. And what that means for us tools developers, that's the most important. What
that means is if you are a producer of a spec and you say, I conform to this profile, I'm, I'm
meeting the minimum requirements. That's your promise to the consumer. So, you can say, I conform
to licensing, security, and AI and data. But I don't conform to the new services profile. And
that, that, and that's, you know, of course carried over in the, in the, in the data itself. It's
also a namespace. And this is where the simplification comes in, is that you can kind of filter the
spec on what you care about by using these namespace. Technically as well. So, there is a
technical namespace that goes along with all the classes and properties. And you can filter on
that. And within my code, I also use that to help me with some of the verification code that, that's
there. And it's also the way we organize within SPDX. We have meetings that are organized by
profiles. So, people of like-mind and like concerns get together and actually develop the
spec. So, let's talk a little bit about some of the other structural changes. In SPDX2, we,
everything was around a document and that was a file basically. And we had a mechanism for
reliably linking documents together because you may get many types of S-bombs for many vendors. You
may want to bring them together. You may want to compare them. And you may want to link them
together. So, we had a mechanism to do that. In 3.0, we still have the ability, this I got to
make this really clear because there's this rumor that SPDX documents are dead and 3.0 is not true.
They're still there. And you can use them the same way that you've always used them. But you
can also link directly from the elements. And an element is a package or a file or you know,
something, you know, a unit of something you care about in SPDX. So, now you can go directly. And
so, you can put these things out on a network without having to worry about the files that
contain them. And so, think about like the World Wide Web, you know, where you have like files and
images that are linked together in HTML. You can do that in SPDX documents in the future. So,
it's a very, very flexible, powerful mechanism we're introducing. Relationships have changed. In
SPDX 2, they were a property of the element. So, you have an element like a package and you say,
it has a relationship to another element like a file and that would be a property. There's a
problem with that when you go to this distributed environment because you have to have, you have
to know about this in advance. You can't introduce a relationship after the fact because it's a
property in the element itself. So, we moved the relationship outside. So, now you have a separate
object which is the element that does a relationship from one element to the other. And we've put a
bunch of properties in there that in a way kind of simplifies the relationships. So, rather than
having hundreds of relationships, we can have dozens of relationships and a few properties within
the relationships to take care of it. How am I doing on time, by the way?
You are at, yeah, 17 minutes.
Oh, perfect. Okay. Other, I want to make sure I go through these changes because I think this may
be the most interesting part to you guys. The other, there's a few other changes. There's a
better model for what we call entities. This is the person organization. In SPDX 2.x, they were
just strings and you'd have to parse the string to figure out whether it's a person or an organization.
We now have kind of like a whole object hierarchy that describes what these things are. It makes it
a little bit easier for parsing. We renamed and removed a lot of confusing properties. Those of
you who have built tooling for SPDX 2 will love this because people complained about these properties
all the time. And we either renamed them to make them clear or just got rid of them. And, you know,
for example, files analyzed. A lot of people don't like files analyzed. Functionality is still there,
but it's just a lot clearer how to actually do those use cases. We've added some additional
uses, useful classes and properties. So, for example, we elevated package URL from an external
identifier to be in a property on package because a lot of people are using that directly for
identifying the package metadata. And then we have some additional profile specific classes and
properties, of course. And on this, I know you're not going to be able to type this in. Hopefully,
you'll get a copy of these slides. There is a Google Doc that I put together. It's a living
document, which means it's open for comment from any of you. You find something missing. Please
comment on it. But this is kind of a guide to all the detailed changes. And I was writing that as
I was doing this, and I know there's some folks that have done the same thing and contributed to
this document that describes all the migration. It'll turn into a migration guide once we do the
full release, but right now it's more of a living document. So, kind of stepping back at these changes,
you know, what's kind of the big picture of this? You know, it'll be a lot more flexible with the
profiles. There'll be a new relationship structure in addition to relationships. So, you need to
annotations independent as well, so you can do more incremental changes to the S-bomb without
having to go back and create a whole new big S-bomb. And then simpler profiles, simpler snippets,
more use cases. And then, again, see the migration document for that. So, now I'm going to switch
over to my personal experience. I was involved in writing this back, and now I'm going to tell you
how fun it was to actually implement it. So, the Java libraries, first to give you context,
I wanted to give you an overview of what the current SPDX 2.x library's architecture looks like.
You know, it's what you'd expect. There's a model set of classes, and that match exactly the SPDX
2.x model. The only change is really is I had to rename some of the things that conflicted with
the Java language. So, Java doesn't like you to call a class package, for example. And then,
there's a set of utility classes that has some useful functions like being able to do a comparison
of license, little things like that. And because in the very first iteration of this, I started this
like 10 years ago as a pretty printer, it was very monolithic, and I got a lot of feedback that,
hey, you know, I don't want to have all these RDF library things in there, if all I want to do
is generate JSON. So, we introduced a storage interface that lets you create a lot of different
model stores. The model stores can represent a very specific serialization of a file, or it can
also represent like a database or a triple store if you're into RDF, or the most common is just an
N memory store for it. So, this allows you to separate these out into separate jar files. It
does add a complexity because there's a storage interface in between that has to adhere so that
we can separate these out into different things, but I think it does make it a lot cleaner.
So, a couple of breaking changes that I noticed right off. I think one of the ones I did not
expect, this change to the namespaces actually caused a change to the storage interface because I
was just using the property names, and I knew that I could always map the property to the full
URI of what the properties were, or the full string with the namespace because we had a clean
mapping. I can't count on that anymore. So, I had to add one extra parameter, which means, oh my
goodness, now I got to change all these different libraries to use that. Of course, I put in a
compatibility library that made it a little bit easier, but that was a breaking change to all
those things that are implementing the storage object, at the storage model below. The model
itself created some breaking changes as you'd expect after going through what the changes are.
There is, what I did is I took all of the spdx2.x code and moved that over to a compatibility
library so that it's all still there. It is though in a different package in Java, so there is a
small change to the imports, but it should work pretty much as is. The relationship and annotation
structures that definitely impacted the Java code because it moves it out of properties and makes
them a little bit more independent. I came up with a trick to help manage the consumers of my
libraries, keep them from having breaking changes. I'll come to that in a couple minutes. That might
be an interesting tip for some of you. The external document ref structure really changed. That was
probably one of the more significant changes. We talked about the agents, the snippet simplifications,
and then moving these properties to relationships.
Sure.
That layer will direct you to the compatible layer or to the new
model layer. It basically minimizes the impact to the users of my library. That's this spdx
model factory is what does the switching there. This is the little trick that I came up with
for relationships. We used to have these as properties and now we moved them over to separate
independent relationships. You can imagine what this will do to all the users of the library. It's
like, oh, this isn't just like a change of coding or a change of names. I got to restructure my code.
I came up with a way to make it look like a property inside the class. I have a special
class that says, okay, this is a relationship, but it looks like a property. If you're interested
in that technique, let me know. I can show you the code. It really wasn't that hard. It's a very
generalized class that I can use for just about any kind of property. I think it's called
relationship property or something like that. That makes it a little bit easier.
The other thing I was focused on is reducing the errors. You remember in the, how am I doing?
I'll see you in five minutes. Thank you. I saw you getting ready. You remember in the specs,
we did a lot of things to reduce the translation errors down to the actual schema files. We're
taking that further with the coding as well. We're from the OWL Shackle file. I'm generating the
Java code. The Java library code, so now you got from the markdown file all the way through to the
actual Java code, traceable, reproducible code to make sure it's all done right. I can't tell you
how many errors I have personally made or I mistyped something or I didn't read it right,
and it got implemented wrong in the Java library. I think the errors that I make now will be much
bigger because it'll be in the code generator. It'll happen to everything. Sorry. That's a
little bit of a joke, but no. It'll get rid of all those little errors. We'll also be generating,
as I mentioned before, the schema files for those of you who would rather see things in JSON schema
or XML schema. I also generate the verification code from the Shackle OWL files. If you're into the RDF,
it complies with the Shackle OWL. Those are some of the techniques for reducing the errors.
I think this is my last slide, the new architecture. One thing I didn't mention is this copy manager
in between. It's a little bit of a detail, but it's kind of an important one, is that if you're
migrating, if you've got two different SPDX documents with two different versions and you're
referencing to each other, that copy manager will let you copy it over to the new version. That
kind of does the upgrades. It'll also copy between the different types of model stores. It'll let
you convert between tag value and JSON, whatever. Anyway, I think I might have a minute or two
for questions. I got three minutes for questions. Did I go so fast? I lose all of you on that?
That felt like I was speed presenting. Yes.
I recognize that you get a lot of work about the specs so that you can see them easier. That's great work.
But also, I would say we need to have a lot of different kind of examples. You know, you write your library,
then you want to test it, and then you find out whether you really understood the specs the right way.
Yes. Yes. And therefore, more examples of different types.
Yes. I don't think I can repeat all of that, but I think the basic comment, and I think it's a really good one,
is in addition to the spec, we need to have examples, you know, so that we can work off of those examples.
And we do have an examples repo in SPDX today. We plan on, we're going to organize that in the future for 3.0 by profile.
So you'll be, if you're interested in security, you can go down and look at, like, the security examples, be able to use those.
Excellent point. Thank you. Yes.
Now is current code ready to convert a file from SPDX2 to 3.0?
Yeah, that's a really good question. So the question was, do we have code today that'll let you convert from 2 to 3?
The Java code is not ready to be used yet, unfortunately. It compiles, but not quite ready. Yes, Dolph.
Yeah, we have a project that can do that with the 3DUSRC of SPDX3.
So Dolph mentioned there is, is that in the presentation later today?
Yeah. Okay. So we hear about a tool that can do that later today. It's not the Java library. So it's coming though. It's not quite ready yet. Yes.
SPDX3 light coming?
SPDX3 light is coming. And that will be, that's one of our profiles. It's got, it's unique in that it skinnies it down, you know, rather than adding things to it, which other ones do. Yeah. Sorry. Yes.
Talk about some of the relationship of SPDX to RDF. I was wondering if you'd come up against any requirements, things that RDF doesn't support, stuff that you feel like you need to push back up into RDF.
Ah, that's a good question. The question is, is there anything that we ran into in the RDF world that we, that we couldn't satisfy by using like, like the RDF, maybe Shackle Owl specification?
I'd have to think about, I have a feeling we have, but I can't think of an example right now. You know, I, I, I, I, let me think about it and then give back to you later. Yeah. Thank you. Yes.
What's the view about converting SPDX3 to Cyclone DX and back and forth?
Oh, yes.
Because they've obviously got things like AI in their model 5. Right.
5, etc. You've got one. Right.
And security. So we're looking at compatibility because there's not people to be, what people to be flexible?
Yes, we do. And I'm, I'm with you on that. So the question is, what about converting between Cyclone DX and SPDX? We actually had an effort going on in SPDX2 where we had people from Cyclone DX, myself included on the SPDX side collaborating and, and we were doing two things.
We were, we were writing libraries to convert, you know, so kind of really testing it hands on. And, and then we were also working on the SPAC where we were like in 2.3.
I actually put a number of things in per request of the Cyclone DX team to make it easier to convert. So we're doing both of those. Unfortunately, that collaboration stopped.
I am looking for somebody from the Cyclone DX team to work with to do that in 3.0. So if you're on the Cyclone DX team or if any of you in the room are in Cyclone DX and are interested in, in collaborating with SPDX and make it easier for all of our users, let me know.
I'd be happy to work with you. Thank you.
Yeah.
Okay. So I, I'm not sure I completely understand the, oh, time's up.
Answer him.
Yeah.
Why don't you go ahead and shut me down and you can go ahead and close the screen and take that over. Yeah.
Yeah.
Just.
So, sorry. So the, the decision about.
These are committee that decide about the changes.
Oh, how, okay. Like the governance of how the SPAC has made itself. Yeah. So we do have a formal governance process. We have a technical, we have kind of like a steering committee and then we have different work groups.
The real work gets done in the profile work group. Most of it's in the core. There are team leads that are nominated and, you know, the steering committee, this whole process that does that.
And then the way that we really try hard to make all the decisions consensus space and it's, it's based on contributions too.
So if somebody says, hey, I want to do this, but they don't contribute anything. Yeah, we don't really listen.
If they say, hey, I want to do this and here's a poll request. Here's the spec. Here's some tasks. You know, here's what you do to the schema to make it.
Then it's like, oh yeah, come on in, you know, we'll work on it.
And sometimes there's differences of opinion. We try to work together very rarely.
The team leads will have to make a call and we try to do it based on the majority, but you know, it's rare when we do that.
We think very carefully before we do that. Yeah. All right. Max, thank you.