[00:00.000 --> 00:12.000] I'm Daniel from the two-star platform in Europe, and I'm sitting over there, and I also work [00:12.000 --> 00:19.000] with other projects like Detective, which we want to be an open-source solution to make [00:19.000 --> 00:25.000] European public tendering data or public procurement data explorable for people who don't know [00:25.000 --> 00:31.000] that much about the procurement data. So I want to do a couple of things in this talk. [00:31.000 --> 00:39.000] First, I want to describe why public procurement data is interesting, why we should take a look at it, [00:39.000 --> 00:48.000] and I want to discuss some problems of how this data in EU context is currently accessible. [00:48.000 --> 00:54.000] And then I want to show you our project of alleviating some of these problems with Detective. [00:54.000 --> 00:59.000] And then I want to show you how you can actually contribute to the project with your company. [00:59.000 --> 01:07.000] Still very much in the early stages, just getting going, and we love the opportunity to show this now [01:07.000 --> 01:16.000] so we can actually contribute even in the earlier phase of the project. [01:16.000 --> 01:24.000] So what's TET? TET's in the name, and what's TET? So TET stands for Tenders European Daily, [01:24.000 --> 01:32.000] and it's basically a data set that's published by the EU Publications Office, [01:32.000 --> 01:39.000] and they've published this data for a long time. They've been publishing this for a long time since 2015, [01:39.000 --> 01:49.000] actually, they've been providing this freely on the internet, and it's data about basically who buys what from whom, [01:49.000 --> 01:54.000] like which public institutions in the EU buy what for what price from which organization. [01:54.000 --> 02:00.000] So it's really data about the relationship between business and government. [02:00.000 --> 02:06.000] And if, so for example your local school or some ministry in your country in the EU [02:06.000 --> 02:11.000] wants to buy something that's of a certain threshold, they're defined in the EU legislation, [02:11.000 --> 02:15.000] you can look them up in the link here, I will upload the slides upwards. [02:15.000 --> 02:25.000] It needs to go into TET, and it will be in this data set, and there's at least 670 billion per year in value [02:25.000 --> 02:35.000] that's kind of encapsulated in this data, and there's more than 700,000 notices that they publish each year. [02:35.000 --> 02:43.000] They've described this entire process of public procurement in the EU. [02:43.000 --> 02:51.000] It's very great that some of you want to join. [02:51.000 --> 02:58.000] So you put things, well great, you publish it, so what's the problem with that? [02:58.000 --> 03:06.000] I mean the way this data is made accessible is via this UI, one funny thing is, [03:06.000 --> 03:11.000] one funny thing is, this button for statistics mode, I still haven't found out what that does, [03:11.000 --> 03:16.000] like what that changes, maybe somebody from the EU can illuminate, [03:16.000 --> 03:21.000] but basically you have to really know what you're searching for in the first place [03:21.000 --> 03:23.000] in order to be able to use this kind of interface. [03:23.000 --> 03:29.000] And there's also a lot of other problems with accessing this data. [03:29.000 --> 03:33.000] For example, you can't really search by organization, which would be interesting. [03:33.000 --> 03:38.000] I mean it's about the relationship between government and business in all of the money terms. [03:38.000 --> 03:43.000] So why is there no option to search for organizations that I'm interested in? [03:43.000 --> 03:49.000] I can only really do a full text search over these huge XML files, which are really complex. [03:49.000 --> 03:53.000] And I can do some other stuff, but there's no type of tolerance, for example, [03:53.000 --> 03:58.000] none of the really nice search features that we can use to. [03:58.000 --> 04:05.000] And most importantly, there's no ability at all to readily visualize the results that I get. [04:05.000 --> 04:12.000] Like if I type something in here, in a search mask, I get back a list of HTML, [04:12.000 --> 04:18.000] basically just an HTML list of notices, then I need to understand what's a notice [04:18.000 --> 04:20.000] or the different types of notices that I'm interested in. [04:20.000 --> 04:23.000] So it's really hard. [04:23.000 --> 04:28.000] So it makes the test right, because accessibility is really bad with this data. [04:28.000 --> 04:30.000] So why is detective needed? [04:30.000 --> 04:34.000] In the past, there have been a number of attempts to look at this data [04:34.000 --> 04:40.000] and transform it into a more manageable or readily analyzable format. [04:40.000 --> 04:46.000] And we weren't really able to identify a single, freely available solution [04:46.000 --> 04:51.000] that was published under a free software license that allows you to explore this data [04:51.000 --> 04:55.000] even if you don't have domain expertise or data science. [04:55.000 --> 05:01.000] And you kind of need both now to be able to make some sense of this data. [05:01.000 --> 05:03.000] And we thought this would be interesting. [05:03.000 --> 05:06.000] So why isn't this more readily available? [05:06.000 --> 05:12.000] So we applied to last year's EU Datathon with this idea, basically, to make this data more accessible. [05:12.000 --> 05:14.000] And this is what we told them. [05:14.000 --> 05:19.000] So we have any type of, let's say we have a public servant [05:19.000 --> 05:25.000] that wants to find out who buys what from, like, within their state. [05:25.000 --> 05:28.000] Who buys from Microsoft, in Germany. [05:28.000 --> 05:34.000] And how much they spend on free software from this company. [05:34.000 --> 05:40.000] And yes, maybe make the case of how much they can save if they use free software instead. [05:40.000 --> 05:48.000] Or let's say you're a journalist who wants to investigate recent purchases made by Microsoft. [05:48.000 --> 05:50.000] Or authority. [05:50.000 --> 05:55.000] You could do that now with a patent to face, but it would be very, very difficult. [05:55.000 --> 05:58.000] And you'd have to jump a lot of hurdles to get there. [05:58.000 --> 06:03.000] So we want to take it as to be an application that you use [06:03.000 --> 06:07.000] which lowers the barrier of entry to analyze. [06:07.000 --> 06:15.000] So we thought let's present the publications of this concept with free software. [06:15.000 --> 06:17.000] And keeping it very simple. [06:17.000 --> 06:21.000] So we built something roughly with this architecture. [06:21.000 --> 06:23.000] So you have this XML file. [06:23.000 --> 06:28.000] And this was very quickly built just for this Datathon. [06:28.000 --> 06:30.000] So I'll go through it quickly. [06:30.000 --> 06:32.000] So we had this XML file. [06:32.000 --> 06:36.000] I transformed it to JSON for whatever reason, which was a very bad idea. [06:36.000 --> 06:41.000] And I parsed it in Python and put it in some ad hoc schema in Postgres. [06:41.000 --> 06:46.000] And then I used the Neo4j ETL tool to put it to a Neo4j database. [06:46.000 --> 06:53.000] The data I was interested in was relational data between, and it shows the relationship between business and government. [06:53.000 --> 06:55.000] And then I used Neo-dash to visualize that. [06:55.000 --> 07:06.000] And that actually already gave people at PUD some chance to see what might be possible with you if you open up this data. [07:06.000 --> 07:12.000] So I'll show you the little demo of how that looked. [07:12.000 --> 07:14.000] So basically this is just an overview. [07:14.000 --> 07:17.000] I parsed data for roughly three years or two and a half years. [07:17.000 --> 07:21.000] This shows you the activity per country. [07:21.000 --> 07:26.000] This is just some general overviews, like roughly a million tenders. [07:26.000 --> 07:30.000] And then it's not optimized yet. [07:30.000 --> 07:35.000] You basically search for Microsoft Germany and then you have this graph. [07:35.000 --> 07:40.000] You have a geographical distribution of commercial activity that's related to Microsoft. [07:40.000 --> 07:50.000] And you get this nice graph of relationships between Microsoft Germany here in the center as an entity. [07:50.000 --> 07:53.000] And then the yellow or red ones are tenders. [07:53.000 --> 08:00.000] So here they sold something to some institution of German government in this case here. [08:00.000 --> 08:05.000] Mostly because Microsoft Germany mostly sells to German government. [08:05.000 --> 08:10.000] And the red ones are tenders above one million euro. [08:10.000 --> 08:21.000] And that gave you a very quick overview of the commercial activity and the relationship between government entities and business entities. [08:21.000 --> 08:27.000] I do the same with you get more information here. [08:27.000 --> 08:35.000] You can actually go to the TED website to see the notice that analyzed this. [08:35.000 --> 08:41.000] I'm searching for a short question. [08:41.000 --> 08:46.000] You search now for Microsoft, usually they work with like these server providers. [08:46.000 --> 08:57.000] Can we get back to the challenges that we face that you can overcome? [08:57.000 --> 09:05.000] So here I do the same with the Polish order authority. [09:05.000 --> 09:09.000] Here it's more like who there's an entity buys from over the past two and a half years. [09:09.000 --> 09:15.000] You can see what kind of like fence and weapon and ammunition stuff they bought. [09:15.000 --> 09:22.000] I'll get through this because this is actually another problem that I'll talk about towards the end of the talk. [09:22.000 --> 09:24.000] It's deduplication. [09:24.000 --> 09:30.000] So in TED data, as it's published in these Excel files, there's no deduplication of entities at all. [09:30.000 --> 09:35.000] So you can have Microsoft Deutschland, DMPH, Microsoft Deutschland, just Microsoft, whatever that is. [09:35.000 --> 09:40.000] And like you can see here, Microsoft Ireland, like there's all these different. [09:40.000 --> 09:43.000] So I did some very naive deduplication attempt. [09:43.000 --> 09:48.000] I also put that data in a new project graph, but there's much more to be done on that front. [09:48.000 --> 09:51.000] And it's a very interesting problem, I think. [09:51.000 --> 09:56.000] Also because you need to think about it from a policy side as well. [09:56.000 --> 10:00.000] Like is Microsoft Deutschland a different entity from Microsoft Ireland? [10:00.000 --> 10:04.000] And if yes, what does that mean for my data analysis? [10:04.000 --> 10:05.000] Should I analyze them together? [10:05.000 --> 10:07.000] Because they're really operating as one entity. [10:07.000 --> 10:12.000] So they're interesting questions connected to this that are not only technical. [10:12.000 --> 10:24.000] So let's go back to my... [10:24.000 --> 10:29.000] So that was obviously limited in scope, because it was really ad hoc. [10:29.000 --> 10:34.000] It was quickly made, and there were lots of problems with how I parked the stage up for this deduplication. [10:34.000 --> 10:40.000] So now we're at the stage where there's actually a lot of interest in the FST doing this. [10:40.000 --> 10:48.000] I've heard from a lot of people that they would be interested in analyzing this data [10:48.000 --> 10:51.000] and being able to explore this data. [10:51.000 --> 10:57.000] So what's next and what's already implemented? [10:57.000 --> 11:04.000] So there's the open contracting data standard, which is something that actually came after. [11:04.000 --> 11:10.000] TET was first published, so I told you already TET was first published in 2015. [11:10.000 --> 11:14.000] I think the OCDS started being developed around 2018, 2019, something like that. [11:14.000 --> 11:21.000] And if you now build any kind of public procurement platform, you use this data standard. [11:21.000 --> 11:23.000] Because it's just a very nice way. [11:23.000 --> 11:29.000] People have put a lot of thought into how can we display this entire process of public procurement? [11:29.000 --> 11:33.000] How can we put this neatly into a data structure? [11:33.000 --> 11:39.000] And so now we're building TET with this data structure at its core. [11:39.000 --> 11:52.000] And the first task will be to parse this TET XNL jungle into this nicely specified OCDS. [11:52.000 --> 11:58.000] So I built a relational database that roughly captures OCDS. [11:58.000 --> 12:05.000] You see a lot of JCP because some things I didn't do for many to many or many to one, [12:05.000 --> 12:09.000] but JCP for now makes it much, much easier. [12:09.000 --> 12:14.000] Otherwise this table would not have been presentable. [12:14.000 --> 12:17.000] And now, this is the graph system after all. [12:17.000 --> 12:22.000] The next question, because I think analyzing this data, analyzing public procurement data, [12:22.000 --> 12:25.000] analyzing these relationships between business and government, [12:25.000 --> 12:32.000] is probably really lends itself to being encapsulated in the graph database. [12:32.000 --> 12:37.000] So this is really the core of OCDS that's interesting, [12:37.000 --> 12:43.000] and that would be interesting to model in a graph database like Neo4j. [12:43.000 --> 12:45.000] You have this tender. [12:45.000 --> 12:50.000] A tender is basically a company says, like we thought, [12:50.000 --> 12:55.000] like a public entity says we want to buy X or Y amount. [12:55.000 --> 12:59.000] And then an organization, another organization can apply for that. [12:59.000 --> 13:03.000] They're usually like something commercial. [13:03.000 --> 13:11.000] They say, look, we can furnish this tender, like we apply for this tender. [13:11.000 --> 13:14.000] And that's interesting data, you know, [13:14.000 --> 13:17.000] so who applies for which tender and which regions and stuff like that. [13:17.000 --> 13:23.000] And then there's awards. That's basically who gets the contract after all. [13:23.000 --> 13:27.000] And so that would be a very simple place to start with the graph database, [13:27.000 --> 13:33.000] to just have this, have all the test data going back from F15 Parcet into OCDS, [13:33.000 --> 13:38.000] and then take this subset of what's really central and put it into the graph database [13:38.000 --> 13:46.000] and really start exploring this visually and that's what we want to do. [13:46.000 --> 13:51.000] And part of it is already done, so I'm currently working, [13:51.000 --> 13:57.000] we are currently working on parsing this data, this XML. [13:57.000 --> 14:00.000] We use LXML library for that, which is really nice. [14:00.000 --> 14:05.000] And I've parsed this into a relational database, [14:05.000 --> 14:10.000] and I specify the OCDS data schema with SQL model, [14:10.000 --> 14:13.000] which is really cool for the library. [14:13.000 --> 14:17.000] It basically gives you identity models and SQL Openly models in one entity. [14:17.000 --> 14:21.000] It's really cool. It's really nice to work with. [14:21.000 --> 14:27.000] And then I want to create like a CSV export to be then able to input that data in Neo4j, [14:27.000 --> 14:32.000] put fast API, and scaffolding around that, [14:32.000 --> 14:37.000] and then also build some UI, which we are currently researching, which framework to use, [14:37.000 --> 14:42.000] and I'm also here to find out which one would be the coolest one, so I'll stay here, [14:42.000 --> 14:46.000] because I think there will be some problems in Neo4j's data. [14:46.000 --> 14:50.000] Yeah, but there's also React Force Graph, and yeah, really like the nice UI [14:50.000 --> 14:58.000] that's specifically geared towards that use case of analyzing public procurement data. [14:58.000 --> 15:02.000] And yet, I had that back and back by these two, like the relational database [15:02.000 --> 15:08.000] and the Neo4j database that choose, depending on the query which data sources you [15:08.000 --> 15:12.000] actually use. [15:12.000 --> 15:17.000] I'll go through the rest really quickly, but this is, if you want to get on-boarders, [15:17.000 --> 15:20.000] the documentation is still up around the edges. [15:20.000 --> 15:25.000] I'll do my best in the next days and weeks to really make the project approachable [15:25.000 --> 15:27.000] to the developers. [15:27.000 --> 15:29.000] The plan is interesting. [15:29.000 --> 15:35.000] I want to work with you and the CSV on this. [15:35.000 --> 15:41.000] So, some key characteristics that we want to really kind of put a focus on with [15:41.000 --> 15:44.000] Detective is that it's, yeah, it must be free software. [15:44.000 --> 15:46.000] It's reuse-compliant. [15:46.000 --> 15:52.000] It means that every file has the license header and the copyright header, [15:52.000 --> 15:56.000] so that it can really be easily used. [15:56.000 --> 16:01.000] And we want to make it for the people, so like a lot of my work in the next weeks [16:01.000 --> 16:07.000] will also include speaking to people who analyze programming data [16:07.000 --> 16:12.000] and ask them what kind of queries they would, what kind of questions they would like to ask, [16:12.000 --> 16:16.000] because that's really important for the design of the system that you use. [16:16.000 --> 16:21.000] Ask people that are later going to use it, like, how could this be helpful? [16:21.000 --> 16:25.000] We have done some of that, but we will do way more of that, [16:25.000 --> 16:30.000] especially now because we start building the UI. [16:30.000 --> 16:35.000] And we want it to be interoperable, so everything that Detective uses, [16:35.000 --> 16:42.000] every data that it uses will be also published under the CC5 4.0 license, [16:42.000 --> 16:47.000] and there will be open API interface, so that will be completely available. [16:47.000 --> 16:53.000] Obviously, some limits gets too crazy, but we'll think about that when the problem arrives. [16:53.000 --> 16:56.000] And also, we fundamentally believe that link data is more interesting, [16:56.000 --> 17:01.000] because once you have this data in the OCS format, you can start linking it [17:01.000 --> 17:04.000] with other data sources, right, or if you haven't already graphed it, [17:04.000 --> 17:08.000] you can start linking it with other data sources. [17:08.000 --> 17:12.000] Like, things that come to mind would be open corporate data, [17:12.000 --> 17:16.000] where you can really enrich the data that you have in organizations [17:16.000 --> 17:20.000] with data that's in this public database of corporate entities. [17:20.000 --> 17:24.000] Open sanctions would then allow you to flag people or companies [17:24.000 --> 17:29.000] that are on some sanction list, and stuff like the offshore leaks database [17:29.000 --> 17:32.000] would allow you to highlight things to offshore companies and stuff like that. [17:32.000 --> 17:36.000] That's of interest for your analysis. [17:36.000 --> 17:40.000] So this would be a future possibility that I'm really excited about, [17:40.000 --> 17:45.000] but the first step is obviously to get this into a nice format, [17:45.000 --> 17:49.000] and then think about extending it. [17:49.000 --> 17:55.000] Some of the challenges is between this step data, because some of it's quite old, [17:55.000 --> 17:58.000] like if you look at data that was published in 2015, [17:58.000 --> 18:02.000] and it's just, there's a lot of tables there, [18:02.000 --> 18:06.000] and there's these huge XML files that didn't currently do much validation [18:06.000 --> 18:10.000] on the forums that were used to take input this data, [18:10.000 --> 18:16.000] so it's in some places very messy. [18:16.000 --> 18:20.000] And also the S helps a lot actually with starting the session, [18:20.000 --> 18:22.000] because it's a very well-defined standard, [18:22.000 --> 18:24.000] and there's people like the mapping from S to S, [18:24.000 --> 18:27.000] and some people have published, so it's pretty cool. [18:27.000 --> 18:31.000] And then the next big problem that we would be helped with [18:31.000 --> 18:36.000] is duplication of problem entities, [18:36.000 --> 18:41.000] which are already kind of online-inning, [18:41.000 --> 18:43.000] and they are very cool. [18:43.000 --> 18:47.000] So we do have a good idea of that as they contribute, [18:47.000 --> 18:53.000] because I think that's really central to taking it being helpful. [18:53.000 --> 18:55.000] So how can you get involved? [18:55.000 --> 18:58.000] All the code is on our get instance. [18:58.000 --> 19:02.000] At the moment, you can only really contribute PR issues [19:02.000 --> 19:06.000] if you make an account, and I'll get this free. [19:06.000 --> 19:11.000] It's just a couple of weeks, but that's for now if we, [19:11.000 --> 19:14.000] if somebody manages that, [19:14.000 --> 19:17.000] then we'll think about mirroring GitHub, [19:17.000 --> 19:19.000] but let's try this first. [19:19.000 --> 19:24.000] Maybe there's a federation coming for the Git forges, [19:24.000 --> 19:27.000] not there yet, as I understand. [19:27.000 --> 19:29.000] There's also websites with the documentation, [19:29.000 --> 19:31.000] and then you can also write an e-mail to, [19:31.000 --> 19:36.000] this will reach always the maintainers. [19:36.000 --> 19:38.000] Yeah, and I'm looking forward to your question. [19:38.000 --> 19:40.000] Thank you very much. [19:40.000 --> 19:42.000] Thank you. [19:50.000 --> 19:56.000] Regarding funding, did you try to contact the official European institutions [19:56.000 --> 19:58.000] so that you can have funding for this slide, [19:58.000 --> 20:02.000] and so that it becomes like the default slide for that in Europe? [20:02.000 --> 20:04.000] So I know that... [20:04.000 --> 20:06.000] Ah, yeah. [20:06.000 --> 20:10.000] So the question was whether we asked the Publications Office [20:10.000 --> 20:12.000] for funding for this. [20:12.000 --> 20:15.000] Not specifically yet. [20:15.000 --> 20:18.000] I know that they are working themselves on a huge reform [20:18.000 --> 20:20.000] of the entire ecosystem, [20:20.000 --> 20:23.000] so they do this, what they call e-forms now, [20:23.000 --> 20:27.000] which is supposed to substitute what used to be TED, [20:27.000 --> 20:30.000] but e-forms still isn't most yet, [20:30.000 --> 20:32.000] there's discussions around that, [20:32.000 --> 20:34.000] but I don't fully understand all the time, [20:34.000 --> 20:37.000] and they're also rebuilding the TED website. [20:37.000 --> 20:39.000] We should get the compact for them. [20:39.000 --> 20:43.000] I have the compacts because we want this data fund, [20:43.000 --> 20:46.000] and we have the technical contact there, [20:46.000 --> 20:48.000] and we should make use of it, [20:48.000 --> 20:52.000] but I was really that keen to code the past couple of weeks, [20:52.000 --> 20:56.000] but this would certainly be very helpful to reach out to them. [20:56.000 --> 20:59.000] Absolutely, and this will happen. [20:59.000 --> 21:04.000] And we already got some funding because we want this data fund. [21:04.000 --> 21:07.000] We'll use this. [21:07.000 --> 21:11.000] So the data that's currently produced for publishers [21:11.000 --> 21:18.000] is it still some TED or is it also called NOS-DS? [21:18.000 --> 21:21.000] It will be all NOS-DS format. [21:21.000 --> 21:25.000] Honestly, I don't think anything else makes sense. [21:25.000 --> 21:30.000] So it's just a whole data that we will republish as NOS-DS. [21:30.000 --> 21:33.000] There's some place like OpenTenator.q, [21:33.000 --> 21:35.000] which was a component project, [21:35.000 --> 21:40.000] which also does this republishing of the NOS-DS as NOS-DS, [21:40.000 --> 21:44.000] but it's not consistent in how it's regularly [21:44.000 --> 21:47.000] and how it updates its database. [21:47.000 --> 21:53.000] It doesn't seem very actively connected. [21:53.000 --> 21:55.000] I got a question. [21:55.000 --> 21:58.000] When you look at these centers and companies involved, [21:58.000 --> 22:01.000] are you also able to extract what the ActionTender is about? [22:01.000 --> 22:03.000] So is there an underlying structure? [22:03.000 --> 22:06.000] This is about, I don't know, classroom furniture, [22:06.000 --> 22:08.000] and this is about military equipment [22:08.000 --> 22:10.000] so that you kind of can coordinate [22:10.000 --> 22:15.000] both by item or by contract product? [22:15.000 --> 22:16.000] Yes. [22:16.000 --> 22:18.000] So shall I repeat the question? [22:18.000 --> 22:19.000] Yes. [22:19.000 --> 22:21.000] So the question was whether there's also data [22:21.000 --> 22:24.000] on what has been procured and details about [22:24.000 --> 22:27.000] what was being procured by a public institution. [22:27.000 --> 22:28.000] And the answer is yes. [22:28.000 --> 22:32.000] There's usually a title that's fairly descriptive. [22:32.000 --> 22:33.000] And a description. [22:33.000 --> 22:38.000] Sometimes an usage, sometimes another usage. [22:38.000 --> 22:40.000] And then there's CPV codes, [22:40.000 --> 22:43.000] which is more like a common procurement vocabulary [22:43.000 --> 22:48.000] that specifies what kind of category this procurement is in. [22:48.000 --> 22:52.000] But some stuff is excluded by this legislation. [22:52.000 --> 22:55.000] For example, like military equipment. [22:55.000 --> 22:57.000] It's not published in the state. [22:57.000 --> 22:58.000] It's not open. [22:58.000 --> 23:00.000] That's why we can't talk about open procurement [23:00.000 --> 23:02.000] in good context yet, because there's still [23:02.000 --> 23:09.000] lots of sensitive data that's not being included in that. [23:09.000 --> 23:11.000] Do you plan to host it publicly? [23:11.000 --> 23:12.000] Yes. [23:12.000 --> 23:13.000] We plan to host it publicly. [23:13.000 --> 23:16.000] Yes, absolutely. [23:16.000 --> 23:18.000] It's just at the moment that the API is down [23:18.000 --> 23:20.000] because I've retracted so many things. [23:20.000 --> 23:25.000] But it will be off again. [23:25.000 --> 23:27.000] Of course it will be publicly available, [23:27.000 --> 23:29.000] but if everything crashes, [23:29.000 --> 23:30.000] because there's so much interest in it, [23:30.000 --> 23:33.000] then we'll think about limiting it somehow. [23:33.000 --> 23:35.000] But there's a sister from there. [23:35.000 --> 23:38.000] Exactly, yes. [23:38.000 --> 23:39.000] So we'll see. [23:39.000 --> 23:45.000] There's really that much interest in it. [23:45.000 --> 23:58.000] So what was the biggest challenge in cleaning the data? [23:58.000 --> 24:03.000] So I would say one is just finding, [24:03.000 --> 24:05.000] if there isn't English translation available, [24:05.000 --> 24:07.000] finding that for the specific, [24:07.000 --> 24:10.000] because we really lay out layout in this text well. [24:10.000 --> 24:14.000] Whereas if a translation exists, [24:14.000 --> 24:16.000] where is it next in our time? [24:16.000 --> 24:21.000] What does it apply to? [24:21.000 --> 24:28.000] Another one was languages that I didn't know [24:28.000 --> 24:32.000] the alphabet of for the hard to parse. [24:32.000 --> 24:34.000] Yes, I just generalize company names [24:34.000 --> 24:37.000] that they didn't have for a long time. [24:37.000 --> 24:42.000] I mean any validation on what you could put in there, [24:42.000 --> 24:44.000] which makes it really hard. [24:44.000 --> 24:47.000] And it would have been very easy to implement upstream, [24:47.000 --> 25:13.000] but now it's because of the sounds. [25:13.000 --> 25:18.000] Thank you.