Hello, everybody. I'm Anne, and I'm a data engineer and software craftswoman. I currently work for the French Ministry of Higher Education and Research, which matters for this presentation, because I work on the French Open Science Monitor.

First of all, I have to say that in France we have no CRIS system, the kind of system that references all the publications and works of each university, or at the national level. Back in 2018, policymakers decided to create a national plan for open science, to promote open science in France. And the first question was: how open is French science? To answer that question, we developed a sovereign tool to measure open science. A further requirement was not to use any proprietary database, so everything had to be open and transparent.

We started with publications, which was the first part of the project. As the graph on the right shows, proprietary bibliographic databases are fairly complete: they carry a lot of metadata, but they miss part of the publications, because the vendors decide which publications enter their databases. The open bibliographic databases, on the other hand, are more extensive, but they miss some metadata, or sometimes the quality of the metadata is not perfect. So we decided to build our own tool on top of them.

First we collected the metadata available on the web, with Crossref specifically, and we aggregated other sources like OpenAPC, PubMed, and web crawling to complete the metadata. From PubMed, Crossref, and HAL we just collect and extract metadata; the element highlighted here is the tool that we made. On top of that metadata we had to build country detection: our affiliation matcher, which detects the country of an affiliation, just to draw the perimeter of the French publications.

Here is one of the examples. A first, naive approach would be to say: if I detect "France" in the affiliation, the publication is French. But then we hit a problem with the Hôtel-Dieu de France, which is a hospital in Beirut. So, a wrong result. We then improved our affiliation matcher using public institution databases such as ROR. That was our first challenge. And we kept consolidating the metadata by adding the open access status from Unpaywall: for each publication we collected, we used its DOI to attach its open status.
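To make the affiliation-matching step concrete, here is a minimal Python sketch of the idea. This is not the actual matcher (that one is open source on our GitHub); it assumes a tiny in-memory extract of ROR records, and the names `ROR_RECORDS` and `detect_country` are purely illustrative.

```python
# Toy sketch of affiliation -> country detection, NOT the real matcher.
# Assumes a tiny in-memory extract of ROR records; the real tool
# matches against the full registry.

ROR_RECORDS = [
    # (institution name, ISO country code) -- illustrative sample
    ("Hôtel-Dieu de France", "LB"),
    ("Université Paris-Saclay", "FR"),
]

COUNTRY_KEYWORDS = {"france": "FR", "lebanon": "LB"}

def detect_country(affiliation: str) -> str | None:
    text = affiliation.lower()
    # 1. Try known institution names first: an institution match must
    #    override a naive keyword hit, which is exactly what fixes the
    #    "Hôtel-Dieu de France" false positive.
    for name, country in ROR_RECORDS:
        if name.lower() in text:
            return country
    # 2. Fall back to country-name keywords.
    for keyword, country in COUNTRY_KEYWORDS.items():
        if keyword in text:
            return country
    return None

print(detect_country("Hôtel-Dieu de France, Beirut"))  # LB, not FR
print(detect_country("CNRS, Paris, France"))           # FR
```

The design point is simply that institution-level matches take precedence over country-keyword matches; that ordering is what turns the Beirut hospital from a false "France" hit into a correct one.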
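The Unpaywall enrichment step can be sketched the same way. At scale the pipeline works from bulk data, but the idea shows through the public Unpaywall REST API, which takes a DOI plus your email address and returns, among other fields, `is_oa` and `oa_status`. A minimal sketch, assuming the `requests` library and a hypothetical DOI:

```python
# Sketch of the open-access enrichment step via the public Unpaywall
# API. The real pipeline processes data in bulk; this shows the idea
# for a single DOI. The email parameter is required by the API.
import requests

def open_access_status(doi: str, email: str) -> dict:
    resp = requests.get(
        f"https://api.unpaywall.org/v2/{doi}",
        params={"email": email},
        timeout=30,
    )
    resp.raise_for_status()
    record = resp.json()
    return {
        "doi": doi,
        "is_oa": record["is_oa"],          # open or closed
        "oa_status": record["oa_status"],  # gold / green / hybrid / bronze / closed
    }

# Example call (the DOI is hypothetical):
# print(open_access_status("10.1234/example.doi", "you@example.org"))
```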
After that, we redid the classification: not only whether a publication is open, but whether it is green, bronze, or hybrid; there are different levels of openness, depending on the journal and on the APCs. So we developed a new open access type classification. We also added a classifier, a tool that assigns each publication to a discipline, so that we know which scientific field it belongs to.

With all this metadata we were able to build indicators. Our first problem was to measure open science in France, and here are the results according to this methodology; let me show you the website. Based on all the computed metadata, we know that 67% of French publications are open. And thanks to the categorization by discipline, we have other graphs, like the openness by discipline: mathematics is the most open scientific field in France. You can go to the website, where there are more results.

With all these results, we asked the universities in France whether they were interested in having their own results. The only condition was that they send us their own perimeter, that is, their own list of the publications of their university. We were then able to send them exactly the same graphs, adapted to the smaller perimeter. We now have more or less 200 local variations of this monitor: mostly university perimeters, but some labs too.

After that, we did a second round, to detect and measure the openness of data and code in French publications. So we had to invent another methodology. First, we needed to collect the data. Starting from the whole set of French publications, we tried to download all the PDFs freely available on the internet; for those there was almost no problem, as long as the PDFs were still available. For the closed ones, we found an agreement with our partners in the project, who gave us tokens to be able to download them.

Once again, we then needed to consolidate the metadata. We first chose GROBID, which you probably know: a tool that takes a PDF as input and gives you a structured TEI XML document as output, with the authors, paragraphs, keywords, affiliations, and so on.

After that, we used two tools developed for the project, called DataStet and Softcite. These tools detect mentions of data or software in the PDF: first the mention itself, and then, in a second layer, they qualify the mention, whether it is a mention of usage, of production, or of sharing. Both tools were developed by Patrice Lopez from ScienceMiner, the person behind GROBID too. And once again, we were able to build indicators about the share of publications in France that use data.
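For the GROBID step, here is a minimal sketch of how a PDF goes in and TEI XML comes out, assuming a GROBID server running locally on its default port (the PDF path is illustrative):

```python
# Minimal sketch: send a PDF to a locally running GROBID service and
# get structured TEI XML back. Assumes GROBID is up on its default
# port (8070); the PDF path is illustrative.
import requests

def pdf_to_tei(pdf_path: str, grobid_url: str = "http://localhost:8070") -> str:
    with open(pdf_path, "rb") as pdf:
        resp = requests.post(
            f"{grobid_url}/api/processFulltextDocument",
            files={"input": pdf},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.text  # TEI XML: authors, affiliations, paragraphs, ...

# tei = pdf_to_tei("publication.pdf")
```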
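DataStet and Softcite are machine-learning tools, so the following is only a toy illustration of the two-layer idea: detect a data or software mention, then qualify it as usage, production, or sharing. The keyword rules below are invented for the example; the real tools do not work this way.

```python
# Toy illustration of the two-layer mention analysis: detect a
# data/software mention in a sentence, then qualify it. The real
# DataStet/Softcite tools use ML models, not these keyword rules.
import re

CUES = {
    "sharing": re.compile(r"\b(available at|deposited|shared|zenodo|github)\b", re.I),
    "production": re.compile(r"\b(we collected|we generated|we produced|new dataset)\b", re.I),
    "usage": re.compile(r"\b(we used|based on|obtained from|dataset|software)\b", re.I),
}

def qualify_mention(sentence: str) -> str | None:
    """Return 'sharing', 'production', 'usage', or None (no mention)."""
    # Check the most specific category first: sharing implies production,
    # which implies usage, mirroring the funnel reading of the indicators.
    for category in ("sharing", "production", "usage"):
        if CUES[category].search(sentence):
            return category
    return None

print(qualify_mention("The data we collected are deposited on Zenodo."))  # sharing
print(qualify_mention("We used the dataset from a previous survey."))     # usage
```

Read this way, the funnel indicators that come next simply divide the resulting counts: publications with any data mention, then among them those with a production mention, then among them those with a sharing mention.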
The way to read this graphic: among the whole set of French publications in 2021, 60% mentioned the use of data. You really need all the words to make sense of these graphs. And this one shows which share of the publications say they produced data, among those that mention using data. It's a funnel approach: first the publications that use data, then among them the ones that produced data, then among them the ones that shared data. So, once again reading this graphic, in France, 22% of the publications that mention data also mention sharing their own data.

As a project about open science, we had to be fully open. All the code is open, there is a link to our GitHub here; the whole data set is open; we published our methodology; and even this talk is open.

This project started in 2018, and since then the open world has really moved. OpenAlex released data about the world's research under CC0. If you don't know it, you should have a look: the whole set of publications across the world, plus institutions, funders, and authors, everything in CC0. It's quite early, and not everything is perfect, but the good point is that it exists now, and it's improving from day to day. Building on that, the people from COKI in Australia developed the Open Access Dashboard, a website with an open access rate for each country in the world. It's not perfect for France, because detecting affiliations is always difficult, but it's a good approximation. And finally, last year, CWTS released the Leiden Ranking, but in an Open Edition: everything there is open, both the data they use, since they build on OpenAlex, and their own methodology. So yes, that's it. Thanks for listening.

Questions?

Thank you very much for this; I use it a lot to monitor open science outputs for Wikimedia purposes. I have a few questions, and maybe you have already answered some of them. The first is about timing: how many years after the publication year do you consider the data reliable? It sometimes takes years for authors to deposit. The second question was about OpenAlex, but you already answered it. The third question: what's the best way for people to build on this data? For example, do you send new records to HAL? And the final question was about GROBID: do you contribute upstream?

Thank you, I'll try to remember all your questions. So, OK, GROBID: we don't contribute to GROBID directly, but the two other tools were developed, not from scratch, but from previous work by the same person, Patrice Lopez. We paid him for DataStet and Softcite, which were adapted and improved for this project. But everything is open source, so if you need them, you can use them. The second one was about HAL records: if a publication is missing from HAL, that's not in the perimeter of this project. But when a university or a laboratory asks us for a monitor of their own science, they send us their list of publications.
And in return, because we have the data, we send them the full list of their publications with the metadata that we computed. That metadata includes the HAL ID, when we matched one, so if needed they have a way to find which part of their publications is in HAL and which part is not. But you're right, it might be an interesting indicator to add.

And I probably forgot one of your questions. The delay, yes, you're right, the delay is always a question. There is a delay because we grab data from multiple services, and it takes time for the truth to propagate between those platforms. Plus, there is the delay of when we decide to collect the data, because the whole process is quite long. We collect the data four times a year, but we publish a new version of the data only once a year, and that will happen at the end of February this year. The data displayed here were collected in 2022, which is the reason why we only display publications up to 2021. So in 2023 we published the monitor based on data collected in 2022, and only publications up to 2021 are taken into consideration. So yes, it's two years.

And that's it. Sorry, I spoke too long. Yes, we've run out of time. Thank you.