All right, so I'll start a little bit earlier than the actual 2 p.m. slot. I'm wearing this t-shirt because I'm one of the devroom managers here in this room, and I'm taking over a talk slot that has been cancelled. We were supposed to hear a talk by Maria Arias de Reina, who unfortunately couldn't make it today. She was going to talk about making data flow the right way, presenting a tool called Cauto, which implements data workflows with a low-code, no-code approach. This is what it looks like. Of course, I can't talk about this tool because I don't know it, but it actually looks pretty cool. We are very sorry Maria could not be here, and we hope we can host her next time.

So I will speak about a research project called RICardo, in the digital humanities, which I've done with Béatrice Dedinger from the Sciences Po Centre d'histoire. She is a historian, and I am Paul Girard, from a company called OuestWare, a small agency specialized in developing data-oriented web applications; we work a lot with researchers. Today I'm here to talk about how a collaboration between a researcher, Béatrice, and a data engineer, myself, can be fostered by using data control systems. By data control systems I mean making sure we take care of the data we use in the research, through documentation, version control and quality control.

The research is about the history of trade. Together with Béatrice, we built a database of trade flows between nations, or rather between geopolitical entities, in the world in the 19th century. The main data tells us how much money, in different currencies, was exchanged between different geopolitical entities in the 19th century. We know imports and exports, and we know them with a bilateral view, which means that we know the trade from France to the UK, for instance, and the reverse too, from two different sources. That makes it quite a nightmare to deal with, but still.

These are the main publications we have already achieved. We started in 2016-17 by releasing a web exploration tool, which I'll show you, along with a paper about how we built this database. Then, in 2021, we released a new database called GeoPolHist, a dataset about geopolitical entities. I'm using this weird term because we have countries, of course, but we also have entities that are parts of countries, we have trade with entities that were colonies at the time, and all kinds of weird political statuses.
Because of that, we built this GeoPolHist database, where we try to track which geopolitical entities were controlled by which other ones over time. And recently we released a new version of the trade database, adding 230,000 new trade flows. This release of new data happened because Béatrice discovered new archives, new archival books about trade, and this massive update needed a tool to make sure we could release data that is cleaned and structured the way we want, without having to deal with all those kinds of issues manually. I will speak about that a little bit later.

This is what the main website looks like. It's a web application where you can explore the trade of the world in the 19th century. We have different kinds of visualizations; I will not go through all of them because I don't want to focus too much on this today. If you have questions about it, we can come back to it later. We also have the GeoPolHist website, which lets you explore the political evolution of the sovereignty links between the different entities. I'll show you what it looks like a little bit afterwards, I think.

Just to be totally honest, this slide deck is something I already presented at another conference. There I spoke about the visual data exploration tools we made, about frictionless data integration, which is the main point I want to speak about today, point two, and, as a third point, about how we can analyze heterogeneous data in the long run, like one century of data. I will try to convince you that data integration is a very nice and important way to foster the long-lasting collaboration we have had between Béatrice, the historian, and me, the data engineer. About collaboration itself, I just put a link to a talk I gave a few years ago on that specific subject.

So, visual data exploration. I will go quickly over this part to focus more on the second one. Our main objective here is to propose a set of interactive data visualizations on the web that allow researchers, and people exploring the data in general, to change points of view on it: looking, for instance, at the total trade of the world, then focusing on one specific country, then on one specific currency, and being able to have all these different ways of looking at the data in the same tool. We also like to offer visual documentation: visualization is a very important tool to spot issues, surprises or errors in the data, and to unfold the complexity of the whole phenomenon. This, for instance, is the world view. We are able to retrace the world trade over a century, but as you can see, there is more than one curve.
So we have different ways to calculate it from the data. We can, for instance, rely on researchers who made re-estimations of this total trade by correcting the sources and that kind of thing; that is one way to do it, the green curve in this visualization. We can also sum all the flows we have in our data: that is the yellow curve. And the red one sums only the totals that were reported in the archive books. And it's not the same thing: if you sum what we have, or if you take the sums that were computed at the time, you don't get the same results. Welcome to the nightmare of dealing with archive data.

In this visualization, for instance, we are focusing on one country, Denmark, and we can follow the trade of this specific country in the long run. We can also, here with the Germany Zollverein, depict not only the total trade but also the bilateral trade between Germany and its trade partners over time.

GeoPolHist, now: when we talk about the Germany Zollverein, what are we talking about? We are talking about a geopolitical entity whose composition changed over time, as you can see here, and on the bottom line you can see which other geopolitical entities were part of Germany through time. Because sometimes we have trade with only Saxony, or Waldeck, and we eventually want to know whether those entities are part of another one.

So, frictionless data integration. We are using the data package format from Frictionless Data, from the Open Knowledge Foundation. There is actually a talk by Evgeny from the Frictionless team later today, in the online part of our room, about the new tools around data packages, and I'm very interested in that, but here I will talk about what I've done myself.

In this project, the main thing we do is version the data. We simply put the data, as CSVs, into a version control system, Git; here it's on GitHub. You can then track, just the same way we do with code, who changed which data, when, and why. Here, for instance, it's Béatrice who corrected a typo in a flow number, putting the comma in the right place, and we have the commit message. This is very important for keeping track of what's going on, because we have hundreds and even thousands of files like that, and also for finding where issues come from, if that happens. So, data packages.
So a data package is a formalism, basically a JSON file, in which you describe the data you have, adding documentation. The first interest of using a data package is to document your dataset, to make it easier for other people to understand what you intended, which is very important for publication at the end, for open science. Here we have the title of the project, the license, the contributors, which are also very important to have. Then we describe resources. A resource can be seen as a data table: here, for instance, we have RICentities, and for each resource, which is a CSV here, we describe the fields of the table. So we know that the RICname field is a string, that it is unique, and that it is required. It is very much like a relational database schema, the same spirit, but in a JSON format.

The first reason to do this, as I said, is documentation. The second reason is to control your data, to drive data validation. If you have a data package described like that, you can then use a Python library, called frictionless, which will check every data line you have against the schema you wrote. And if a line does not conform, it provides you with reports, with errors: for instance, here, I have a foreign-key error because the modified currency year is not known in the table that is supposed to contain it. So it's a very nice way, when we get new data, to check where we stand with respect to what we want to achieve at the end, which is data that respects the data package formalism we wrote.

So that's very nice, and it's very cool for data engineers. But as I said, our goal with Béatrice is to work together. When she enters new data into the system, she is the one who has, as a historian, to take decisions on how to interpret the data that was in the archives. I can't: that's not my job, it's not my responsibility, and I don't have the skills to do it. So we needed something to allow her to correct and update the data that comes in, into the data package format, and she can't use command-line tools, Python scripts and that kind of stuff. We were missing a tool here to let humanist researchers, in this case, but people in general, interact with the data flow through something other than a too technical interface.

So we developed a specific web application that helps Béatrice integrate new data, using the data package as a validation system. All of this is done in JavaScript; there is also a JavaScript library for data packages.
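To make this concrete, here is a minimal, hypothetical sketch of what such a descriptor and its validation can look like with the frictionless Python library. The resource and field names are simplified stand-ins, not the actual RICardo schema:

```python
# Minimal sketch: a tiny data package with two resources and a foreign key,
# validated with the frictionless Python library. All names and values are
# illustrative, not the real RICardo files.
import json
from frictionless import validate

# The controlled vocabulary (foreign-key target)
with open("entities.csv", "w") as f:
    f.write("name,type\nFrance,GPH entity\nSri Lanka (Ceylon),GPH entity\n")

# One valid flow and one whose partner is missing from the vocabulary
with open("flows.csv", "w") as f:
    f.write("year,reporting,partner,value\n"
            "1890,France,Sri Lanka (Ceylon),1200.5\n"
            "1890,France,Ile de Ceylan,300.0\n")

descriptor = {
    "name": "trade-flows-example",
    "title": "Bilateral trade flows (illustrative subset)",
    "contributors": [{"title": "Example Contributor"}],
    "resources": [
        {
            "name": "entities",
            "path": "entities.csv",
            "schema": {
                "fields": [
                    {"name": "name", "type": "string",
                     "constraints": {"required": True, "unique": True}},
                    {"name": "type", "type": "string"},
                ],
                "primaryKey": ["name"],
            },
        },
        {
            "name": "flows",
            "path": "flows.csv",
            "schema": {
                "fields": [
                    {"name": "year", "type": "integer"},
                    {"name": "reporting", "type": "string"},
                    {"name": "partner", "type": "string"},
                    {"name": "value", "type": "number"},
                ],
                # every partner must exist in the entities table
                "foreignKeys": [
                    {"fields": ["partner"],
                     "reference": {"resource": "entities", "fields": ["name"]}}
                ],
            },
        },
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)

# Check every row of every CSV against the declared schema
report = validate("datapackage.json")
print(report.valid)  # False: "Ile de Ceylan" violates the foreign key
# Note: error property names vary slightly between frictionless versions
print(report.flatten(["rowNumber", "fieldName", "type", "message"]))
```

The foreign-key declaration is what produces errors like the one on "Ile de Ceylan" in this toy example, or the currency error mentioned above, whenever a value does not exist in the referenced table.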
So basically, these are the steps. The idea is that Béatrice uploads a data spreadsheet, a new data file, the transcription of a new archival book she found. The tool first checks the spreadsheet format: do we have all the columns we want, and so on. If that is correct, it then goes through all the data points, checking all the errors and grouping them, in order to propose a correction interface where she can fix all those issues through a form. We tried to develop something that makes this process, which is tedious and long, as easy and as fast as we could for Béatrice to go through. At the very end, the tool commits the data into a Git repository and pushes it to the GitHub repository. All of that is done in a serverless web application, which means I didn't have to introduce the Git command line to Béatrice either: the tool does that for her.

This is what it looks like. It's a React web application. Here we have the schema validation summary, where we see, for the different fields that have errors, the kind of error we get. Then we have the error overview, which says how many rows have an issue: for instance here, in the source column, we have two different invalid values, one of which impacts 169 rows. The idea is to correct each of these groups, the group of one row and the group of 169 rows, with only one edit. Once we have all those errors, the workflow with this tool is to go through the error groups one by one. The web application generates a form with inputs to help Béatrice decide. In this example, we have a value for a partner, a trade partner, so a geopolitical entity. Here it's in French, "Ile de Ceylan", while we use English-based vocabularies, so we need to decide what "Ile de Ceylan" is in the vocabulary used in the rest of the data. This is where we have a search input, where Béatrice can search for Ceylon, which is called "Sri Lanka (Ceylon)" in our vocabulary. Once she has chosen that, the tool corrects this column and puts the data in the right place, to make sure "Ile de Ceylan" will be translated to "Sri Lanka (Ceylon)". At the end, once she has gone through the whole process, we have a summary here showing all the corrections she made: in the first line here, for instance, a year was misspelled; further down a source name was changed; and so on, everything is fixed.
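The grouping idea behind that interface can be shown with a tiny stand-alone sketch. Plain Python structures stand in here for the validation report and for the corrections confirmed through the form; the row numbers and values are invented:

```python
# Toy sketch of the "group the errors, decide once" logic of the correction UI.
from collections import defaultdict

# (row number, field, offending value), as a validation step might report them
errors = [
    (12, "partner", "Ile de Ceylan"),
    (57, "partner", "Ile de Ceylan"),
    (88, "source", "Tableau general du commerce, 1838"),
]

# Group identical problems so each decision has to be taken only once
groups = defaultdict(list)
for row, field, value in errors:
    groups[(field, value)].append(row)

for (field, value), rows in groups.items():
    print(f"{field}: invalid value {value!r} affects {len(rows)} row(s)")

# One decision per group, e.g. chosen through the web form
fixes = {("partner", "Ile de Ceylan"): "Sri Lanka (Ceylon)"}

def apply_fixes(records, fixes):
    """Apply each confirmed replacement to every affected row."""
    for record in records:
        for (field, bad_value), good_value in fixes.items():
            if record.get(field) == bad_value:
                record[field] = good_value
    return records

rows = [{"partner": "Ile de Ceylan", "value": "300.0"},
        {"partner": "France", "value": "1200.5"}]
print(apply_fixes(rows, fixes))
```

In the real application the decision is of course taken through the generated form, not a hard-coded dictionary.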
So once all the errors have been corrected through the data form I just showed you, she can move on to the last step, which is to publish this new data, now valid, and we know it is valid because we can control it with the data package, into the GitHub repository. This is how the React web application prepares the data (I can go into the details of what that means later) and makes it possible for Béatrice to take the right decisions to adapt the raw data to the data package we work with.

I have a little bit more time. This would be the analysis part, but maybe I can first try to demo the tool live a little. The very important thing is that it is a serverless web application. Here it runs on localhost on my laptop, but it is actually hosted on GitHub directly. Why médialab? A lot of this work was done at the Sciences Po médialab, my previous employer, so kudos to them too, because they contributed a lot to this work. So this is the tool: it is hosted on github.io because it doesn't need any server. The whole login process with GitHub is done through a personal access token, a specific key that you produce once in your GitHub account and then use as a login mechanism. Once I am logged in, I can fetch the data from GitHub, to make sure I have the latest version of the data before adding new things.
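A quick aside on the plumbing here: since the application is entirely client-side, both fetching the current data and publishing files go through GitHub's HTTP API, authenticated with that personal access token, so no local git is needed. Below is a rough sketch of the idea in Python, purely for illustration: the real tool does this in JavaScript from the browser, and the repository and paths here are made up.

```python
# Hypothetical sketch: read and update a CSV in a GitHub repository through
# the REST "contents" API, using only a personal access token (no git CLI).
import base64
import requests

TOKEN = "ghp_..."                         # personal access token, created once on GitHub
REPO = "example-org/trade-data"           # made-up repository
PATH = "data/flows/new_archive_book.csv"  # made-up file path
API = f"https://api.github.com/repos/{REPO}/contents/{PATH}"
HEADERS = {"Authorization": f"token {TOKEN}"}

# Fetch the current version of the file, if it exists
response = requests.get(API, headers=HEADERS)
sha = response.json().get("sha") if response.status_code == 200 else None

# Publish the corrected file: content is sent base64-encoded,
# and updating an existing file requires its current blob sha
with open("corrected.csv", "rb") as f:
    content = base64.b64encode(f.read()).decode()

payload = {"message": "Add corrected flows from a new archive book",
           "content": content}
if sha:
    payload["sha"] = sha

requests.put(API, json=payload, headers=HEADERS).raise_for_status()
```

Something along these lines is presumably what the final publish step of the workflow amounts to.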
So we understand [21:11.960 --> 21:19.080] how it tweets as a number. As soon as I confirm the fix, I move to the next page. So that we can [21:19.080 --> 21:24.440] try to make that process as seamless as possible. So here I have a source. So this is the foreign [21:24.440 --> 21:31.640] key. So in the data table, the source is actually, it's a key that is referring, which is referring [21:31.640 --> 21:37.160] to the source table. And say like, so here basically foreign key source violation. So it means that [21:37.160 --> 21:45.320] this sort doesn't exist in our system. So here I have two choice. Or I was supposed to, okay, [21:45.320 --> 21:55.800] normally I should, I should, oh yeah, sorry. So normally, okay, so whatever. So I can search for [21:55.800 --> 22:04.120] it and find it. Or, and in which case it will, the edit will be only replacing the key. Or I can [22:04.120 --> 22:09.000] create a new item. And here you can see that here I'm creating a new source because sometimes the [22:09.000 --> 22:13.640] source doesn't exist yet. So I have to go through all the, you see this form is much, much longer [22:13.640 --> 22:19.320] because here I'm creating a new line into the source table. I will not do that because it's too [22:19.320 --> 22:26.360] long. I will just give me something, please. And that will make it, okay. And so on and so forth. [22:26.360 --> 22:34.600] Again, we have an issue with the, sorry. This example is a little bit up. Okay. Here it's a [22:34.600 --> 22:41.080] Trinidad and Tobago. It's a geographical line time. I don't know because it's a A. Trinidad and [22:41.080 --> 22:52.840] Tobago, not A. Okay. And we're good. Australia with a lot of E at the end is not correct. [22:54.440 --> 23:05.640] It's Australia. Yep. Sorry, it's very long. Yeah, whatever. [23:05.640 --> 23:15.160] A dollar. And let's say it's a scrap. Don't do that, right? But, okay. [23:22.120 --> 23:29.000] Okay. So you see, that's the important point is like, so we are based on the data, in the data [23:29.000 --> 23:35.880] package. So we are using for NKIS specification and so on. But actually, we had to add specific [23:35.880 --> 23:40.760] forms for our case. So the application here is not generic. You can just put a new data [23:40.760 --> 23:45.800] package of yourself with your data and it will not work because we had actually for UX and UI [23:45.800 --> 23:53.800] reasons to make specific cases like that where the forms are not exactly as the data package [23:53.800 --> 23:59.400] described it. It was too complex to make it very generic. But actually, with more work, [23:59.400 --> 24:04.520] that could be achieved maybe at some point. And actually, the talk from Evgeny this afternoon [24:04.520 --> 24:11.880] will talk a little bit about that kind of stuff. So here we are. I'll stop the demo here. [24:14.280 --> 24:21.320] Just to finish, why do we do all that kind of stuff? Because we want to analyze trade in the [24:21.320 --> 24:26.760] long run. We have lots of trade values as you can see. A lot of trading entities, very too much. [24:26.760 --> 24:32.520] And then at the end, we try to, or this is a visual documentation where we depict the kind [24:32.520 --> 24:40.040] of different source we use in the database. And at the end, we try to do something like [24:40.040 --> 24:53.000] that where actually here we have the trade of the world in 1890. So each nodes here, circles, [24:53.000 --> 24:59.480] are geopolitical entities. 
Just to finish: why do we do all this? Because we want to analyze trade in the long run. We have lots of trade values, as you can see, and a lot of trading entities, maybe too many. This is a visual documentation where we depict the different kinds of sources we use in the database. And at the end, we try to do something like this: here we have the trade of the world in 1890. Each node, each circle, is a geopolitical entity, and the links between them are the trade of that year. It's total trade; we could choose imports or exports, here I just summed them up. The important part is that the color is based on the type of geopolitical entity. The orange, kind of yellow ones are what we call GPH entities: geopolitical entities we know, mainly countries but not always. The green ones are colonial areas, not a specific country which is a colony, but something like "French Africa": we know it's colonial, we just don't know exactly which entity. And here, for instance, we have "European Russia", which is a part of Russia, the European part of it; this is what we find in the archive books, and we can't really decide what it means exactly. So we have this gap between very heterogeneous data, very difficult to interpret, and the attempt to still do quantification and analysis, like this network, on top of this very complex and rich dataset.
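To give an idea of what that quantification looks like in practice, here is a hedged sketch of assembling such a yearly trade network with the networkx library, reusing the illustrative column names from the earlier sketch (they are assumptions, not the exact RICardo ones); the actual visualization is produced by the web application:

```python
# Hedged sketch: build the 1890 trade network from a flows table with networkx.
import csv
import networkx as nx

G = nx.Graph()

with open("flows.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["year"] != "1890" or not row["value"]:
            continue
        reporting, partner = row["reporting"], row["partner"]
        value = float(row["value"])
        # sum imports and exports between two entities into one weighted edge
        if G.has_edge(reporting, partner):
            G[reporting][partner]["weight"] += value
        else:
            G.add_edge(reporting, partner, weight=value)

# Color nodes by the type of geopolitical entity (GPH entity, colonial area, ...)
entity_types = {"France": "GPH entity"}  # e.g. loaded from the entities table
for node in G.nodes:
    G.nodes[node]["type"] = entity_types.get(node, "unknown")

print(G.number_of_nodes(), "entities and", G.number_of_edges(), "trade links in 1890")
```

From there, nx.write_gexf can export the graph to GEXF, the format read by tools like Gephi.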
Would there be any benefit to you committing the data set into GitHub at the top of this process after you upload it so that you can compare? [28:45.400 --> 29:12.840] Yeah. So the question is like, would it be beneficial or possible to actually commit the data before checking it and put it into GitHub? So yes and no. The reason why we don't do that is because the first one, because I need batteries to take the decision of documenting the raw data to make it compatible with all the nice visualization I showed you. [29:12.840 --> 29:31.880] And she needs to take those decisions. She needs to do it. So that's why we put this data into the GitHub after she has done this work of data creation. We could actually host the data as a raw file first and then do that later on the kind of stuff. [29:31.880 --> 29:46.840] Still, we still need a web interface that lets Beatrice, the historian, take the decision. So no. Any more? Yeah? [29:46.840 --> 30:11.800] Yes. So this tool I'm using here is actually brand new. It's a Gefi, but on the web. We are working on this with my company, Westware, and we are very close to release it. It's basically the same thing as Gefi, but lighter version and a web version. [30:11.800 --> 30:18.760] It's already there, but you shouldn't go because it's not live yet.