Okay, so thanks for bearing with me. I'm Sara Petti, and I'm going to talk to you today about updating open data standards, based on the journey that brought us from version 1 of the Frictionless specifications to version 2, which is currently ongoing. Briefly about myself: I'm the International Network Lead at the Open Knowledge Foundation and also the Frictionless Data Community Manager. I love the digital commons and I'm based in Bologna, Italy. I've left here some ways you can contact me: via email, on X (formerly Twitter), and on GitHub as well.

Before we start, for those of you who might not know the Frictionless Data project and the Frictionless standards, I wanted to give you a quick introduction to the Frictionless Data Package, which is the core Frictionless specification. The Data Package is basically a standard for packaging your data. It's very simple and very easy: you package your data together with a descriptor containing your metadata and a schema describing your data. I've put a link there; if you ever want to explore all the Frictionless specifications, you can just go to that website.

The Data Package was released in 2016 in its version 1. Some years have passed since then, and it has gained a lot of traction in research communities and academia, it has often been mentioned in the open data guidelines of governments and public administrations, and it's often used by data wranglers as well. So we started to think about what made Frictionless successful, why this standard was so widely adopted, and these are some of the things we came up with.

The first thing is that the Frictionless specifications were not developed alone in a room; they were really the outcome of more than 10 years of iteration with communities of practice and stakeholders, and of full engagement with issues around interoperability, data analysis, and data publication. As you've seen from my earlier slide, the specifications are very simple, and I think that's also one of the keys to their success: because they are so simple, they disrupt whatever existing infrastructure is already there as little as possible. When thinking about the Frictionless specifications, we always had CSV in mind as an example. CSV is a standard for tabular data, and we think the key to why it is so widely adopted is that it is so simple that everybody can use it. It's maybe not the most adaptable to specific use cases, but it's still usable by almost everyone, and so it's one of the most used tabular formats right now.

The simplicity of the Frictionless standards also means they are extensible and customizable by design. They are designed for tabular data, but we have a lot of people in the community who use them for other kinds of data as well. The metadata standards are machine-usable, because of course we have to bear in mind that data must be FAIR, but we also keep in mind that there are humans who might want to work with that data, so the metadata standards are human-editable too. Another thing that was very important for us was not to reinvent the wheel, so we tried to reuse existing standards and existing data formats as much as possible. And last but not least, we tried to build something that was as language-, technology- and infrastructure-agnostic as possible.
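To make that concrete, here is a minimal, illustrative `datapackage.json` descriptor of the kind the specification describes; the dataset name, file path, and field names are invented for this example. The data file sits next to the descriptor, which carries the metadata and the table schema:

```json
{
  "name": "example-dataset",
  "title": "Example Dataset",
  "resources": [
    {
      "name": "measurements",
      "path": "measurements.csv",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "recorded_at", "type": "date"},
          {"name": "value", "type": "number"}
        ]
      }
    }
  ]
}
```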
Once that was done, we started thinking about the adoption of the standards, and one thing that became clear quite quickly was that a standard alone is sometimes not enough: you also need a technical implementation of that standard. It's funny, because I was talking yesterday with someone from the Frictionless community who was telling me exactly this, that it's so great that we have built libraries on top of the standards that you can use to perform a number of operations on your data, for example validating it or extracting it, and that these exist in a number of programming languages. I work at the Open Knowledge Foundation, where the core Frictionless team sits, and we developed for example the Python framework, which is the first link that you see there, but the community that uses Frictionless has also developed libraries in other programming languages that perform some of the same functions, so we have Frictionless R for example, and Frictionless JavaScript. Together these form what we call the Frictionless universe, and there's a website that I definitely encourage you to go and have a look at if you're interested.

So okay, it's all very nice, everybody adopted the standard and it gained traction, so why did we need to update it? Well, since 2016 issues had of course started to accumulate in the GitHub repository. Last year, with the core Frictionless team, we started having conversations with the community and went through all these issues: we triaged them and identified the ones that were most requested, the ones where there was more discussion ongoing, and the ones that made more sense given the new requirements that had come up over the years, and so we decided to start a draft roadmap for version 2. The second part was: okay, now that we've decided to update these standards, how do we coordinate this update? That was probably the part that took up most of my time as community manager, and here I've tried to summarise the key elements of this update and the things that it was important for us to take into consideration when coordinating it.

The first thing is, of course, don't do it alone. Right from the beginning it was very clear to us that we had to bring in people from as many backgrounds as possible, because as I said before, the Frictionless Data standards are very simple and adaptable to many different use cases, but if you want to build something that simple you need to hear from a lot of people and keep a lot of use cases in mind, because they can help you build a common data model that fits the needs of everyone, or at least helps you find some minimal common ground. So when we started our Frictionless Data specifications working group, we brought in people from research institutes and universities in different academic fields, but also libraries, open data cooperatives, and engineers as well.

The other thing is: be clear. The first thing the working group asked us was, okay, very nice, you want to do this, but please let's define the overarching goals of this project, let's have a roadmap, and let's have it somewhere that is easy to find.
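To give a flavour of what the libraries mentioned above do, here is a minimal sketch using the frictionless Python framework (assuming `pip install frictionless`); the file names are invented, and exact return shapes vary a bit between library versions:

```python
# Minimal sketch using the frictionless Python framework (pip install frictionless).
# File names below are illustrative.
from frictionless import validate, extract

# Validate a data package: checks the data against the descriptor and its schema
report = validate("datapackage.json")
print(report.valid)   # True when every resource matches its declared schema

# Extract the rows of a tabular resource as plain Python dictionaries
rows = extract("measurements.csv")
print(rows)
```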
On being clear, for us it was quite easy: we have a project website, frictionlessdata.io, and there we published a page announcing the specs update, detailing the goals and the deliverables, and from there we also linked to the roadmap, which is actually on GitHub, because that's where the technical discussion with the community is happening, in all the issues that you see there.

The third thing, which at the beginning was a bit taken for granted but actually needed some thinking as well, was to decide how to decide. We sat down with the working group and everybody said, yeah, okay, we'll do this by consensus. But then we realised that this needed some definition too, because not everyone understood what consensus really meant: does everyone need to participate in every discussion, even about parts of the specs that are not really important to them? So we decided that a PR can be merged into the specs only if two thirds of the working group has participated in the discussion and has a favourable opinion about it, and we understand consensus as reaching a solution that everybody can live with. This is all in the announcement blog post, if you want to go and have a look.

So, just to give you a view of where we are now: we had 36 issues that were part of our first roadmap. 10 out of 36 are already closed. Of the remaining 26 open issues, 11 already have a first PR proposal, and 23 of those 26 already have an ongoing working group discussion. What we also decided to add, as information for the working group but also for the broader community, is a public live tracker, kept on GitHub as an issue. You can go there, and we update it on a weekly basis, so there is a place where people can monitor progress. Our aim for June 2024 is of course the release of the Frictionless specs version 2, but we would also like to release a small Python metadata mapper and some integrations with external systems like CKAN and Zenodo.

To conclude, I just wanted to mention that this update is made possible thanks to the generous support of the NLnet Fund, through NGI Zero Entrust. There are a lot of fantastic funding opportunities out there that might be useful for you as well; they fund a lot of open source projects, so I encourage you to go and have a look. And then I wanted to thank you for listening to me today. I've left a bunch of links there that might be useful. The first one, frictionlessdata.io, which I mentioned a couple of times, is the project website where everything is linked from, so if you want to find, for example, all the GitHub repositories, you will find them there, as well as the different project pages. We have a community chat on Slack, but if you prefer to use an open protocol, you can access it via Matrix as well. I've also left the website of the Open Knowledge Foundation and the Twitter handles of the Frictionless Data project and the Open Knowledge Foundation. Thanks.

APPLAUSE

[Moderator] We have time for one question.

[Audience member, partly inaudible] Yes, thank you, a very short question. How do you agree on a common data model? Asking as someone who has repeatedly failed at that. Thank you.

So the question was how to agree on a data model, because that's very, very difficult to agree upon.
I think for us the key, and it's of course something very difficult to do, was basically to strip away all the layers of complication and the specifics of particular types of data. What we did was collect all the kinds of data that we wanted to support, try to understand what they had in common, and start from there. Of course, the result is very simple and adaptable, but it is also something that is focused on tabular data. It is extensible to other kinds of data as well, but you do have to have a data type in mind. I don't know if that answers your question.

[Moderator] Thanks again to Sara.

APPLAUSE
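As a purely illustrative sketch of that extensibility: the descriptor below keeps the small common core (name, path, schema) and layers hypothetical domain-specific properties on top. `unit` and `spatial-coverage` are not defined by the specification; they stand in for the kind of custom properties the specs allow alongside the standard ones.

```json
{
  "name": "sensor-readings",
  "resources": [
    {
      "name": "readings",
      "path": "readings.csv",
      "spatial-coverage": "Bologna, Italy",
      "schema": {
        "fields": [
          {"name": "station", "type": "string"},
          {"name": "timestamp", "type": "datetime"},
          {"name": "temperature", "type": "number", "unit": "celsius"}
        ]
      }
    }
  ]
}
```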