[00:00.000 --> 00:12.000]  I'm Daniel from the Free Software Foundation Europe, and I'm sitting over there, and I also work
[00:12.000 --> 00:19.000]  on other projects like TEDective, which we want to be an open-source solution to make
[00:19.000 --> 00:25.000]  European public tendering data or public procurement data explorable for people who don't know
[00:25.000 --> 00:31.000]  that much about procurement data. So I want to do a couple of things in this talk.
[00:31.000 --> 00:39.000]  First, I want to describe why public procurement data is interesting, why we should take a look at it,
[00:39.000 --> 00:48.000]  and I want to discuss some problems of how this data in EU context is currently accessible.
[00:48.000 --> 00:54.000]  And then I want to show you our project for alleviating some of these problems, TEDective.
[00:54.000 --> 00:59.000]  And then I want to show you how you can actually contribute to the project with your company.
[00:59.000 --> 01:07.000]  It's still very much in the early stages, just getting going, and we love the opportunity to show this now
[01:07.000 --> 01:16.000]  so you can actually contribute even in the earlier phase of the project.
[01:16.000 --> 01:24.000]  So what's TED? TED's in the name, so what's TED? TED stands for Tenders Electronic Daily,
[01:24.000 --> 01:32.000]  and it's basically a data set that's published by the EU Publications Office,
[01:32.000 --> 01:39.000]  and they've been publishing this data for a long time. Since 2015,
[01:39.000 --> 01:49.000]  actually, they've been providing this freely on the internet, and it's data about basically who buys what from whom,
[01:49.000 --> 01:54.000]  like which public institutions in the EU buy what for what price from which organization.
[01:54.000 --> 02:00.000]  So it's really data about the relationship between business and government.
[02:00.000 --> 02:06.000]  And if, for example, your local school or some ministry in your country in the EU
[02:06.000 --> 02:11.000]  wants to buy something that's above a certain threshold (these are defined in EU legislation,
[02:11.000 --> 02:15.000]  you can look them up in the link here, I will upload the slides afterwards),
[02:15.000 --> 02:25.000]  it needs to go into TED, and it will be in this data set. There's at least 670 billion per year in value
[02:25.000 --> 02:35.000]  that's kind of encapsulated in this data, and there's more than 700,000 notices that they publish each year.
[02:35.000 --> 02:43.000]  They describe this entire process of public procurement in the EU.
[02:43.000 --> 02:51.000]  It's very great that some of you want to join.
[02:51.000 --> 02:58.000]  So you put things, well great, you publish it, so what's the problem with that?
[02:58.000 --> 03:06.000]  I mean the way this data is made accessible is via this UI. One funny thing is
[03:06.000 --> 03:11.000]  this button for statistics mode, I still haven't found out what that does,
[03:11.000 --> 03:16.000]  like what that changes, maybe somebody from the EU can illuminate,
[03:16.000 --> 03:21.000]  but basically you have to really know what you're searching for in the first place
[03:21.000 --> 03:23.000]  in order to be able to use this kind of interface.
[03:23.000 --> 03:29.000]  And there's also a lot of other problems with accessing this data.
[03:29.000 --> 03:33.000]  For example, you can't really search by organization, which would be interesting.
[03:33.000 --> 03:38.000]  I mean it's about the relationship between government and business in all of the money terms.
[03:38.000 --> 03:43.000]  So why is there no option to search for organizations that I'm interested in?
[03:43.000 --> 03:49.000]  I can only really do a full text search over these huge XML files, which are really complex.
[03:49.000 --> 03:53.000]  And I can do some other stuff, but there's no typo tolerance, for example,
[03:53.000 --> 03:58.000]  none of the really nice search features that we are used to.
[03:58.000 --> 04:05.000]  And most importantly, there's no ability at all to readily visualize the results that I get.
[04:05.000 --> 04:12.000]  Like if I type something in here, in a search mask, I get back a list of HTML,
[04:12.000 --> 04:18.000]  basically just an HTML list of notices, then I need to understand what's a notice
[04:18.000 --> 04:20.000]  or the different types of notices that I'm interested in.
[04:20.000 --> 04:23.000]  So it's really hard.
[04:23.000 --> 04:28.000]  So that makes the case, right, because accessibility is really bad with this data.
[04:28.000 --> 04:30.000]  So why is TEDective needed?
[04:30.000 --> 04:34.000]  In the past, there have been a number of attempts to look at this data
[04:34.000 --> 04:40.000]  and transform it into a more manageable or readily analyzable format.
[04:40.000 --> 04:46.000]  And we weren't really able to identify a single, freely available solution
[04:46.000 --> 04:51.000]  that was published under a free software license that allows you to explore this data
[04:51.000 --> 04:55.000]  even if you don't have domain expertise or data science skills.
[04:55.000 --> 05:01.000]  And you kind of need both now to be able to make some sense of this data.
[05:01.000 --> 05:03.000]  And we thought this would be interesting.
[05:03.000 --> 05:06.000]  So why isn't this more readily available?
[05:06.000 --> 05:12.000]  So we applied to last year's EU Datathon with this idea, basically, to make this data more accessible.
[05:12.000 --> 05:14.000]  And this is what we told them.
[05:14.000 --> 05:19.000]  So let's say we have a public servant
[05:19.000 --> 05:25.000]  that wants to find out who buys what from whom within their state.
[05:25.000 --> 05:28.000]  Who buys from Microsoft in Germany,
[05:28.000 --> 05:34.000]  and how much they spend on proprietary software from this company.
[05:34.000 --> 05:40.000]  And yes, maybe make the case of how much they could save if they used free software instead.
[05:40.000 --> 05:48.000]  Or let's say you're a journalist who wants to investigate recent purchases made from Microsoft
[05:48.000 --> 05:50.000]  or by some authority.
[05:50.000 --> 05:55.000]  You could do that now with the current interface, but it would be very, very difficult.
[05:55.000 --> 05:58.000]  And you'd have to jump over a lot of hurdles to get there.
[05:58.000 --> 06:03.000]  So we want TEDective to be an application that you can use
[06:03.000 --> 06:07.000]  which lowers the barrier of entry to analyzing this data.
[06:07.000 --> 06:15.000]  So we thought, let's present the Publications Office with a proof of concept built with free software.
[06:15.000 --> 06:17.000]  And keeping it very simple.
[06:17.000 --> 06:21.000]  So we built something roughly with this architecture.
[06:21.000 --> 06:23.000]  So you have this XML file.
[06:23.000 --> 06:28.000]  And this was very quickly built just for this Datathon.
[06:28.000 --> 06:30.000]  So I'll go through it quickly.
[06:30.000 --> 06:32.000]  So we had this XML file.
[06:32.000 --> 06:36.000]  I transformed it to JSON for whatever reason, which was a very bad idea.
[06:36.000 --> 06:41.000]  And I parsed it in Python and put it in some ad hoc schema in Postgres.
[06:41.000 --> 06:46.000]  And then I used the Neo4j ETL tool to put it to a Neo4j database.
[06:46.000 --> 06:53.000]  The data I was interested in was relational data, as it shows the relationship between business and government.
[06:53.000 --> 06:55.000]  And then I used NeoDash to visualize that.
[06:55.000 --> 07:06.000]  And that actually already gave people at the Publications Office some chance to see what might be possible if you open up this data.
[07:06.000 --> 07:12.000]  So I'll show you the little demo of how that looked.
[07:12.000 --> 07:14.000]  So basically this is just an overview.
[07:14.000 --> 07:17.000]  I parsed data for roughly three years or two and a half years.
[07:17.000 --> 07:21.000]  This shows you the activity per country.
[07:21.000 --> 07:26.000]  This is just some general overviews, like roughly a million tenders.
[07:26.000 --> 07:30.000]  And then it's not optimized yet.
[07:30.000 --> 07:35.000]  You basically search for Microsoft Germany and then you have this graph.
[07:35.000 --> 07:40.000]  You have a geographical distribution of commercial activity that's related to Microsoft.
[07:40.000 --> 07:50.000]  And you get this nice graph of relationships between Microsoft Germany here in the center as an entity.
[07:50.000 --> 07:53.000]  And then the yellow or red ones are tenders.
[07:53.000 --> 08:00.000]  So here they sold something to some institution of German government in this case here.
[08:00.000 --> 08:05.000]  Mostly because Microsoft Germany mostly sells to German government.
[08:05.000 --> 08:10.000]  And the red ones are tenders above one million euro.
[08:10.000 --> 08:21.000]  And that gave you a very quick overview of the commercial activity and the relationship between government entities and business entities.
[08:21.000 --> 08:27.000]  You can also get more information here.
[08:27.000 --> 08:35.000]  You can actually go to the TED website to see the notice that this is based on.
[08:35.000 --> 08:41.000]  I'm searching for a short question.
[08:41.000 --> 08:46.000]  You search now for Microsoft, usually they work with like these server providers.
[08:46.000 --> 08:57.000]  Can we get back to the challenges that we face that you can overcome?
[08:57.000 --> 09:05.000]  So here I do the same with the Polish border authority.
[09:05.000 --> 09:09.000]  Here it's more like who this entity buys from over the past two and a half years.
[09:09.000 --> 09:15.000]  You can see what kind of like fence and weapon and ammunition stuff they bought.
[09:15.000 --> 09:22.000]  I'll go through this because this is actually another problem that I'll talk about towards the end of the talk.
[09:22.000 --> 09:24.000]  It's deduplication.
[09:24.000 --> 09:30.000]  So in TED data, as it's published in these XML files, there's no deduplication of entities at all.
[09:30.000 --> 09:35.000]  So you can have Microsoft Deutschland GmbH, Microsoft Deutschland, just Microsoft, whatever that is.
[09:35.000 --> 09:40.000]  And like you can see here, Microsoft Ireland, there are all these different variants.
[09:40.000 --> 09:43.000]  So I made a very naive deduplication attempt.
[09:43.000 --> 09:48.000]  I also put that data in a Neo4j graph, but there's much more to be done on that front.
[09:48.000 --> 09:51.000]  And it's a very interesting problem, I think.
[09:51.000 --> 09:56.000]  Also because you need to think about it from a policy side as well.
[09:56.000 --> 10:00.000]  Like is Microsoft Deutschland a different entity from Microsoft Ireland?
[10:00.000 --> 10:04.000]  And if yes, what does that mean for my data analysis?
[10:04.000 --> 10:05.000]  Should I analyze them together?
[10:05.000 --> 10:07.000]  Because they're really operating as one entity.
[10:07.000 --> 10:12.000]  So they're interesting questions connected to this that are not only technical.
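To make the deduplication problem concrete, here is a minimal sketch of what a naive name-based approach could look like. It is not the code used in the demo: the function names and the short list of legal-form suffixes are assumptions chosen for illustration, and the sample names simply echo the variants mentioned in the talk.

```python
import re
import unicodedata
from collections import defaultdict

# Legal-form tokens to drop before comparing names.
# This list is an illustrative assumption and far from exhaustive.
LEGAL_FORMS = {"gmbh", "ltd", "limited", "inc", "ag", "sa", "bv"}


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and drop legal-form tokens."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    tokens = re.findall(r"[a-z0-9]+", ascii_name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def group_by_normalized_name(names):
    """Group raw names that collapse to the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


names = [
    "Microsoft Deutschland GmbH",
    "MICROSOFT DEUTSCHLAND",
    "Microsoft Ireland Ltd.",
    "Microsoft",
]
groups = group_by_normalized_name(names)
for key, variants in groups.items():
    print(key, "<-", variants)
```

Note that this sketch keeps the German and Irish subsidiaries apart from each other and from the bare "Microsoft" entry. Whether they should be merged is exactly the policy question raised above, and string normalization alone cannot answer it.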
[10:12.000 --> 10:24.000]  So let's go back to my...
[10:24.000 --> 10:29.000]  So that was obviously limited in scope, because it was really ad hoc.
[10:29.000 --> 10:34.000]  It was quickly made, and there were lots of problems with how I parsed the data and with this deduplication.
[10:34.000 --> 10:40.000]  So now we're at the stage where there's actually a lot of interest in the FSFE doing this.
[10:40.000 --> 10:48.000]  I've heard from a lot of people that they would be interested in analyzing this data
[10:48.000 --> 10:51.000]  and being able to explore this data.
[10:51.000 --> 10:57.000]  So what's next and what's already implemented?
[10:57.000 --> 11:04.000]  So there's the Open Contracting Data Standard, which is something that actually came after
[11:04.000 --> 11:10.000]  TED was first published; I told you already TED was first published in 2015.
[11:10.000 --> 11:14.000]  I think the OCDS started being developed around 2018, 2019, something like that.
[11:14.000 --> 11:21.000]  And if you now build any kind of public procurement platform, you use this data standard.
[11:21.000 --> 11:23.000]  Because it's just a very nice way.
[11:23.000 --> 11:29.000]  People have put a lot of thought into how can we display this entire process of public procurement?
[11:29.000 --> 11:33.000]  How can we put this neatly into a data structure?
[11:33.000 --> 11:39.000]  And so now we're building TEDective with this data structure at its core.
[11:39.000 --> 11:52.000]  And the first task will be to parse this TED XML jungle into this nicely specified OCDS.
[11:52.000 --> 11:58.000]  So I built a relational database that roughly captures OCDS.
[11:58.000 --> 12:05.000]  You see a lot of JSONB because some things I didn't model as many-to-many or many-to-one relations,
[12:05.000 --> 12:09.000]  but JSONB for now makes it much, much easier.
[12:09.000 --> 12:14.000]  Otherwise this table would not have been presentable.
[12:14.000 --> 12:17.000]  And now, this is the graph systems room after all.
[12:17.000 --> 12:22.000]  So the next question, because I think analyzing this data, analyzing public procurement data,
[12:22.000 --> 12:25.000]  analyzing these relationships between business and government,
[12:25.000 --> 12:32.000]  probably really lends itself to being captured in a graph database.
[12:32.000 --> 12:37.000]  So this is really the core of OCDS that's interesting,
[12:37.000 --> 12:43.000]  and that would be interesting to model in a graph database like Neo4j.
[12:43.000 --> 12:45.000]  You have this tender.
[12:45.000 --> 12:50.000]  A tender is basically when a company says, or rather
[12:50.000 --> 12:55.000]  a public entity says, we want to buy X in Y amount.
[12:55.000 --> 12:59.000]  And then another organization can apply for that.
[12:59.000 --> 13:03.000]  They're usually something commercial.
[13:03.000 --> 13:11.000]  They say, look, we can fulfil this tender, we apply for this tender.
[13:11.000 --> 13:14.000]  And that's interesting data, you know,
[13:14.000 --> 13:17.000]  so who applies for which tender and which regions and stuff like that.
[13:17.000 --> 13:23.000]  And then there's awards. That's basically who gets the contract after all.
[13:23.000 --> 13:27.000]  And so that would be a very simple place to start with the graph database,
[13:27.000 --> 13:33.000]  to just have all the TED data going back to 2015 parsed into OCDS,
[13:33.000 --> 13:38.000]  and then take this subset of what's really central and put it into the graph database
[13:38.000 --> 13:46.000]  and really start exploring this visually and that's what we want to do.
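The core model described here (a buyer issues a tender, organizations bid on it, and an award goes to a supplier) can be sketched as a tiny in-memory graph. This is only an illustration in plain Python rather than Neo4j: the node ids, the relationship names such as `ISSUES` and `AWARDED_TO`, and the sample data are all hypothetical and are not taken from OCDS or from the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str  # id of the node the relationship starts at
    kind: str    # relationship type (hypothetical names)
    target: str  # id of the node the relationship points to


# Hypothetical example data: two buyers, two vendors, two tenders.
edges = [
    Edge("org:ministry-a", "ISSUES", "tender:1"),
    Edge("org:vendor-x", "BIDS_ON", "tender:1"),
    Edge("org:vendor-y", "BIDS_ON", "tender:1"),
    Edge("tender:1", "HAS_AWARD", "award:1"),
    Edge("award:1", "AWARDED_TO", "org:vendor-x"),
    Edge("org:school-b", "ISSUES", "tender:2"),
    Edge("org:vendor-y", "BIDS_ON", "tender:2"),
    Edge("tender:2", "HAS_AWARD", "award:2"),
    Edge("award:2", "AWARDED_TO", "org:vendor-y"),
]


def build_index(edges):
    """Index outgoing edges by (source node, relationship type)."""
    index = defaultdict(list)
    for edge in edges:
        index[(edge.source, edge.kind)].append(edge.target)
    return index


def suppliers_of(buyer, index):
    """Follow buyer -> tender -> award -> supplier and collect the suppliers."""
    suppliers = set()
    for tender in index[(buyer, "ISSUES")]:
        for award in index[(tender, "HAS_AWARD")]:
            suppliers.update(index[(award, "AWARDED_TO")])
    return suppliers


index = build_index(edges)
print(suppliers_of("org:ministry-a", index))
```

In a graph database the same three-hop traversal would be a single pattern match, which is the main reason this subset of the data is a natural first candidate for the graph.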
[13:46.000 --> 13:51.000]  And part of it is already done, so I'm currently working,
[13:51.000 --> 13:57.000]  we are currently working on parsing this data, this XML.
[13:57.000 --> 14:00.000]  We use the lxml library for that, which is really nice.
[14:00.000 --> 14:05.000]  And I parse this into a relational database,
[14:05.000 --> 14:10.000]  and I specify the OCDS data schema with SQLModel,
[14:10.000 --> 14:13.000]  which is a really cool library.
[14:13.000 --> 14:17.000]  It basically gives you Pydantic models and SQLAlchemy models in one entity.
[14:17.000 --> 14:21.000]  It's really cool. It's really nice to work with.
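As a rough illustration of the parsing step, the sketch below turns one XML notice into an OCDS-shaped dictionary. Several things here are assumptions: the XML layout is invented for the example and is much simpler than real TED notices, the `ocid` prefix is a placeholder, and the standard library's `xml.etree.ElementTree` stands in for lxml so the example runs without extra dependencies (lxml's `etree` module offers a similar `fromstring`/`find`/`findtext` interface). The output keys (`ocid`, `parties`, `buyer`, `tender`, `awards`) follow OCDS release field names.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified notice; real TED XML is far more complex.
SAMPLE_NOTICE = """\
<notice id="2023-000001">
  <buyer><name>Example Ministry</name></buyer>
  <title>Office software licences</title>
  <award>
    <supplier><name>Example Vendor GmbH</name></supplier>
    <value currency="EUR">1250000</value>
  </award>
</notice>
"""


def notice_to_release(xml_text: str) -> dict:
    """Map the simplified notice onto a minimal OCDS-like release."""
    root = ET.fromstring(xml_text)
    notice_id = root.get("id")
    buyer_name = root.findtext("buyer/name")
    supplier_name = root.findtext("award/supplier/name")
    value = root.find("award/value")
    return {
        # Placeholder prefix; a real ocid uses a registered publisher prefix.
        "ocid": f"ocds-placeholder-{notice_id}",
        "parties": [
            {"id": "buyer-1", "name": buyer_name, "roles": ["buyer"]},
            {"id": "supplier-1", "name": supplier_name, "roles": ["supplier"]},
        ],
        "buyer": {"id": "buyer-1", "name": buyer_name},
        "tender": {"id": notice_id, "title": root.findtext("title")},
        "awards": [
            {
                "id": f"{notice_id}-award-1",
                "suppliers": [{"id": "supplier-1", "name": supplier_name}],
                "value": {
                    "amount": float(value.text),
                    "currency": value.get("currency"),
                },
            }
        ],
    }


release = notice_to_release(SAMPLE_NOTICE)
print(release["tender"]["title"], release["awards"][0]["value"])
```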
[14:21.000 --> 14:27.000]  And then I want to create a CSV export to then be able to import that data into Neo4j,
[14:27.000 --> 14:32.000]  put FastAPI and some scaffolding around that,
[14:32.000 --> 14:37.000]  and then also build some UI, which we are currently researching, which framework to use,
[14:37.000 --> 14:42.000]  and I'm also here to find out which one would be the coolest one, so I'll stay here,
[14:42.000 --> 14:46.000]  because I think there will be some problems in Neo4j's data.
[14:46.000 --> 14:50.000]  Yeah, but there's also React Force Graph, and yeah, really like the nice UI
[14:50.000 --> 14:58.000]  that's specifically geared towards that use case of analyzing public procurement data.
[14:58.000 --> 15:02.000]  And yeah, have that backend backed by these two, the relational database
[15:02.000 --> 15:08.000]  and the Neo4j database, and have it choose, depending on the query, which data source to
[15:08.000 --> 15:12.000]  actually use.
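The planned CSV export could look roughly like the sketch below. It writes one file for nodes and one for relationships, using the header conventions of Neo4j's bulk import tool (`id:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The row data, the property name `name`, and the labels are hypothetical, and the example writes to in-memory buffers instead of files so it is easy to inspect.

```python
import csv
import io


def write_nodes(rows, out):
    """Write node rows with a Neo4j bulk-import style header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id:ID", "name", ":LABEL"])
    for row in rows:
        writer.writerow([row["id"], row["name"], row["label"]])


def write_relationships(rows, out):
    """Write relationship rows with a Neo4j bulk-import style header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for row in rows:
        writer.writerow([row["start"], row["end"], row["type"]])


# Hypothetical rows, as they might come out of the relational database.
nodes = [
    {"id": "org-1", "name": "Example Ministry", "label": "Organization"},
    {"id": "org-2", "name": "Example Vendor, Ltd.", "label": "Organization"},
    {"id": "tender-1", "name": "Office software licences", "label": "Tender"},
]
relationships = [
    {"start": "org-1", "end": "tender-1", "type": "ISSUES"},
    {"start": "org-2", "end": "tender-1", "type": "BIDS_ON"},
]

nodes_csv = io.StringIO()
rels_csv = io.StringIO()
write_nodes(nodes, nodes_csv)
write_relationships(relationships, rels_csv)
print(nodes_csv.getvalue())
print(rels_csv.getvalue())
```

Using the `csv` module rather than string joins matters for this data: organization names regularly contain commas and quotes, and the writer quotes those fields automatically.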
[15:12.000 --> 15:17.000]  I'll go through the rest really quickly, but this is for if you want to get on board.
[15:17.000 --> 15:20.000]  The documentation is still rough around the edges.
[15:20.000 --> 15:25.000]  I'll do my best in the next days and weeks to really make the project approachable
[15:25.000 --> 15:27.000]  to developers.
[15:27.000 --> 15:29.000]  The plan is interesting.
[15:29.000 --> 15:35.000]  I want to work with you and the CSV on this.
[15:35.000 --> 15:41.000]  So, some key characteristics that we want to really kind of put a focus on with
[15:41.000 --> 15:44.000]  TEDective is that it's, yeah, it must be free software.
[15:44.000 --> 15:46.000]  It's REUSE-compliant.
[15:46.000 --> 15:52.000]  That means that every file has a license header and a copyright header,
[15:52.000 --> 15:56.000]  so that it can really be easily used.
[15:56.000 --> 16:01.000]  And we want to make it for the people, so like a lot of my work in the next weeks
[16:01.000 --> 16:07.000]  will also include speaking to people who analyze procurement data
[16:07.000 --> 16:12.000]  and asking them what kind of queries they would, what kind of questions they would like to ask,
[16:12.000 --> 16:16.000]  because that's really important for the design of the system that you use.
[16:16.000 --> 16:21.000]  Ask people that are later going to use it, like, how could this be helpful?
[16:21.000 --> 16:25.000]  We have done some of that, but we will do way more of that,
[16:25.000 --> 16:30.000]  especially now because we start building the UI.
[16:30.000 --> 16:35.000]  And we want it to be interoperable, so everything that TEDective uses,
[16:35.000 --> 16:42.000]  all data that it uses will also be published under the CC BY 4.0 license,
[16:42.000 --> 16:47.000]  and there will be an OpenAPI interface, so that will be completely available.
[16:47.000 --> 16:53.000]  Obviously with some limits if it gets too crazy, but we'll think about that when the problem arises.
[16:53.000 --> 16:56.000]  And also, we fundamentally believe that linked data is more interesting,
[16:56.000 --> 17:01.000]  because once you have this data in the OCDS format, you can start linking it
[17:01.000 --> 17:04.000]  with other data sources, right, or if you have it already in a graph,
[17:04.000 --> 17:08.000]  you can start linking it with other data sources.
[17:08.000 --> 17:12.000]  Like, things that come to mind would be OpenCorporates data,
[17:12.000 --> 17:16.000]  where you can really enrich the data that you have on organizations
[17:16.000 --> 17:20.000]  with data that's in this public database of corporate entities.
[17:20.000 --> 17:24.000]  OpenSanctions would then allow you to flag people or companies
[17:24.000 --> 17:29.000]  that are on some sanctions list, and stuff like the Offshore Leaks Database
[17:29.000 --> 17:32.000]  would allow you to highlight links to offshore companies and stuff like that
[17:32.000 --> 17:36.000]  that's of interest for your analysis.
[17:36.000 --> 17:40.000]  So this would be a future possibility that I'm really excited about,
[17:40.000 --> 17:45.000]  but the first step is obviously to get this into a nice format,
[17:45.000 --> 17:49.000]  and then think about extending it.
[17:49.000 --> 17:55.000]  One of the challenges is cleaning this TED data, because some of it's quite old,
[17:55.000 --> 17:58.000]  like if you look at data that was published in 2015,
[17:58.000 --> 18:02.000]  and it's just, there's a lot of tables there,
[18:02.000 --> 18:06.000]  and there's these huge XML files, and they didn't do much validation
[18:06.000 --> 18:10.000]  on the forms that were used to input this data,
[18:10.000 --> 18:16.000]  so it's in some places very messy.
[18:16.000 --> 18:20.000]  And also OCDS helps a lot actually with getting started,
[18:20.000 --> 18:22.000]  because it's a very well-defined standard,
[18:22.000 --> 18:24.000]  and there's a mapping from TED to OCDS
[18:24.000 --> 18:27.000]  that some people have published, so it's pretty cool.
[18:27.000 --> 18:31.000]  And then the next big problem that we would like help with
[18:31.000 --> 18:36.000]  is deduplication of procurement entities,
[18:36.000 --> 18:41.000]  which I already kind of outlined in the beginning,
[18:41.000 --> 18:43.000]  and that's a very cool problem.
[18:43.000 --> 18:47.000]  So if you have a good idea about that, please contribute,
[18:47.000 --> 18:53.000]  because I think that's really central to TEDective being helpful.
[18:53.000 --> 18:55.000]  So how can you get involved?
[18:55.000 --> 18:58.000]  All the code is on our Git instance.
[18:58.000 --> 19:02.000]  At the moment, you can only really contribute PRs or issues
[19:02.000 --> 19:06.000]  if you make an account, and I'll get this free.
[19:06.000 --> 19:11.000]  It's just a couple of weeks, but that's for now if we,
[19:11.000 --> 19:14.000]  if somebody manages that,
[19:14.000 --> 19:17.000]  then we'll think about mirroring to GitHub,
[19:17.000 --> 19:19.000]  but let's try this first.
[19:19.000 --> 19:24.000]  Maybe there's a federation coming for the Git forges,
[19:24.000 --> 19:27.000]  not there yet, as I understand.
[19:27.000 --> 19:29.000]  There's also a website with the documentation,
[19:29.000 --> 19:31.000]  and then you can also write an e-mail to,
[19:31.000 --> 19:36.000]  this will always reach the maintainers.
[19:36.000 --> 19:38.000]  Yeah, and I'm looking forward to your questions.
[19:38.000 --> 19:40.000]  Thank you very much.
[19:40.000 --> 19:42.000]  Thank you.
[19:50.000 --> 19:56.000]  Regarding funding, did you try to contact the official European institutions
[19:56.000 --> 19:58.000]  so that you can have funding for this site,
[19:58.000 --> 20:02.000]  and so that it becomes like the default site for that in Europe?
[20:02.000 --> 20:04.000]  So I know that...
[20:04.000 --> 20:06.000]  Ah, yeah.
[20:06.000 --> 20:10.000]  So the question was whether we asked the Publications Office
[20:10.000 --> 20:12.000]  for funding for this.
[20:12.000 --> 20:15.000]  Not specifically yet.
[20:15.000 --> 20:18.000]  I know that they are working themselves on a huge reform
[20:18.000 --> 20:20.000]  of the entire ecosystem,
[20:20.000 --> 20:23.000]  so they do this, what they call eForms now,
[20:23.000 --> 20:27.000]  which is supposed to substitute what used to be TED,
[20:27.000 --> 20:30.000]  but eForms still isn't OCDS yet,
[20:30.000 --> 20:32.000]  there's discussions around that,
[20:32.000 --> 20:34.000]  but I don't fully understand all of them,
[20:34.000 --> 20:37.000]  and they're also rebuilding the TED website.
[20:37.000 --> 20:39.000]  We should get in contact with them.
[20:39.000 --> 20:43.000]  I have the contacts because we won this Datathon,
[20:43.000 --> 20:46.000]  and we have a technical contact there,
[20:46.000 --> 20:48.000]  and we should make use of it,
[20:48.000 --> 20:52.000]  but I was really keen to code the past couple of weeks,
[20:52.000 --> 20:56.000]  but this would certainly be very helpful to reach out to them.
[20:56.000 --> 20:59.000]  Absolutely, and this will happen.
[20:59.000 --> 21:04.000]  And we already got some funding because we won this Datathon.
[21:04.000 --> 21:07.000]  We'll use this.
[21:07.000 --> 21:11.000]  So the data that you're going to publish,
[21:11.000 --> 21:18.000]  is it still in the TED format or is it also OCDS?
[21:18.000 --> 21:21.000]  It will all be in OCDS format.
[21:21.000 --> 21:25.000]  Honestly, I don't think anything else makes sense.
[21:25.000 --> 21:30.000]  So it's just the old data that we will republish as OCDS.
[21:30.000 --> 21:33.000]  There's some place like opentender.eu,
[21:33.000 --> 21:35.000]  which was a component project,
[21:35.000 --> 21:40.000]  which also does this republishing of the TED data as OCDS,
[21:40.000 --> 21:44.000]  but it's not consistent in how regularly
[21:44.000 --> 21:47.000]  it updates its database.
[21:47.000 --> 21:53.000]  It doesn't seem very actively maintained.
[21:53.000 --> 21:55.000]  I got a question.
[21:55.000 --> 21:58.000]  When you look at these tenders and the companies involved,
[21:58.000 --> 22:01.000]  are you also able to extract what the actual tender is about?
[22:01.000 --> 22:03.000]  So is there an underlying structure?
[22:03.000 --> 22:06.000]  This is about, I don't know, classroom furniture,
[22:06.000 --> 22:08.000]  and this is about military equipment
[22:08.000 --> 22:10.000]  so that you can kind of categorize
[22:10.000 --> 22:15.000]  both by item or by contracted product?
[22:15.000 --> 22:16.000]  Yes.
[22:16.000 --> 22:18.000]  So shall I repeat the question?
[22:18.000 --> 22:19.000]  Yes.
[22:19.000 --> 22:21.000]  So the question was whether there's also data
[22:21.000 --> 22:24.000]  on what has been procured and details about
[22:24.000 --> 22:27.000]  what was being procured by a public institution.
[22:27.000 --> 22:28.000]  And the answer is yes.
[22:28.000 --> 22:32.000]  There's usually a title that's fairly descriptive.
[22:32.000 --> 22:33.000]  And a description.
[22:33.000 --> 22:38.000]  Sometimes in English, sometimes in another language.
[22:38.000 --> 22:40.000]  And then there's CPV codes,
[22:40.000 --> 22:43.000]  which stands for Common Procurement Vocabulary,
[22:43.000 --> 22:48.000]  that specifies what kind of category this procurement is in.
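CPV codes are hierarchical: an eight-digit code followed by a check digit, where the first two digits identify the broad division. A minimal sketch of using that to bucket notices by category could look like this. The notice list is hypothetical sample data and the function name is an assumption; only the code structure itself comes from the CPV scheme.

```python
from collections import Counter


def cpv_division(cpv_code: str) -> str:
    """Return the two-digit division prefix of a CPV code like '39160000-1'."""
    digits = cpv_code.split("-")[0]
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"not a CPV code: {cpv_code!r}")
    return digits[:2]


# Hypothetical notices; titles are invented for the example.
notices = [
    {"title": "Classroom chairs and desks", "cpv": "39160000-1"},
    {"title": "Office furniture", "cpv": "39100000-3"},
    {"title": "Office software licences", "cpv": "48000000-8"},
]

counts = Counter(cpv_division(n["cpv"]) for n in notices)
print(counts)
```

Grouping on longer prefixes (three, four, or five digits) gives progressively finer categories within the same division.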
[22:48.000 --> 22:52.000]  But some stuff is excluded by this legislation.
[22:52.000 --> 22:55.000]  For example, like military equipment.
[22:55.000 --> 22:57.000]  It's not published in TED.
[22:57.000 --> 22:58.000]  It's not open.
[22:58.000 --> 23:00.000]  That's why we can't talk about open procurement
[23:00.000 --> 23:02.000]  in the EU context yet, because there's still
[23:02.000 --> 23:09.000]  lots of sensitive data that's not being included in that.
[23:09.000 --> 23:11.000]  Do you plan to host it publicly?
[23:11.000 --> 23:12.000]  Yes.
[23:12.000 --> 23:13.000]  We plan to host it publicly.
[23:13.000 --> 23:16.000]  Yes, absolutely.
[23:16.000 --> 23:18.000]  It's just that at the moment the API is down
[23:18.000 --> 23:20.000]  because I've refactored so many things.
[23:20.000 --> 23:25.000]  But it will be up again.
[23:25.000 --> 23:27.000]  Of course it will be publicly available,
[23:27.000 --> 23:29.000]  but if everything crashes,
[23:29.000 --> 23:30.000]  because there's so much interest in it,
[23:30.000 --> 23:33.000]  then we'll think about limiting it somehow.
[23:33.000 --> 23:35.000]  But there's a sister from there.
[23:35.000 --> 23:38.000]  Exactly, yes.
[23:38.000 --> 23:39.000]  So we'll see.
[23:39.000 --> 23:45.000]  If there's really that much interest in it.
[23:45.000 --> 23:58.000]  So what was the biggest challenge in cleaning the data?
[23:58.000 --> 24:03.000]  So I would say one is just finding,
[24:03.000 --> 24:05.000]  if there is an English translation available,
[24:05.000 --> 24:07.000]  finding that for the specific field,
[24:07.000 --> 24:10.000]  because it's really not laid out well in this XML.
[24:10.000 --> 24:14.000]  Like, if a translation exists,
[24:14.000 --> 24:16.000]  where is it in the XML tree?
[24:16.000 --> 24:21.000]  What does it apply to?
[24:21.000 --> 24:28.000]  Another one was languages that I didn't know
[24:28.000 --> 24:32.000]  the alphabet of, which were hard to parse.
[24:32.000 --> 24:34.000]  Yes, and just in general company names,
[24:34.000 --> 24:37.000]  because they didn't have for a long time,
[24:37.000 --> 24:42.000]  I mean, any validation on what you could put in there,
[24:42.000 --> 24:44.000]  which makes it really hard.
[24:44.000 --> 24:47.000]  And it would have been very easy to implement upstream,
[24:47.000 --> 25:13.000]  but now it's because of the sounds.
[25:13.000 --> 25:18.000]  Thank you.