[00:00.000 --> 00:15.440]  Hello. Thank you very much for having us here. I'm Daniele Guido, together with my colleague
[00:15.440 --> 00:23.240]  Elisabeth Gerard. We are coming from the University of Luxembourg from the Center of Contemporary
[00:23.240 --> 00:29.840]  Digital History, where we're running this new journal, this new idea of journal, together
[00:29.840 --> 00:36.920]  with a publisher, a well-known publisher in the open access publication, which is The
[00:36.920 --> 00:46.400]  Groiter. So the idea is the journal of digitalhistory.org, and then the idea is how to bring reproducible
[00:46.400 --> 00:53.400]  papers in the humanities, and in digital history in our specific case. And then that's
[00:53.400 --> 00:58.600]  why we join forces with them. So it's a joint venture with them directly, so the team is
[00:58.600 --> 01:04.920]  relatively small compared to other projects, and then we have two perspectives that we
[01:04.920 --> 01:13.560]  decided to put together. On our side, we understand that academic publishing is a bit too traditional,
[01:13.560 --> 01:21.240]  especially in history. And then our researchers, they currently work on Jupyter Notebook to
[01:21.240 --> 01:29.240]  run their own experiment and so on. So the idea was can we pass from experiment on Jupyter
[01:29.240 --> 01:34.080]  Notebook to actual publication also in our domain. And on the other side, they wanted
[01:34.080 --> 01:39.000]  to test out this hypothesis, because they really want to engage with new publication
[01:39.000 --> 01:46.080]  practices, and this joint venture would just a good match.
[01:46.080 --> 01:54.560]  Then, well, reproducible papers in digital history means a lot of things, because first
[01:54.560 --> 02:03.320]  of all, we have now massive digitization process of primary sources and sacred literature
[02:03.320 --> 02:11.000]  and also new digital material, like the Twitter archive that we've been seeing before. On
[02:11.000 --> 02:15.520]  one side, the details of the code are crucial, so sharing the data set is one thing, but
[02:15.520 --> 02:21.560]  the other thing is really how this data has been created, so where the condition of production
[02:21.560 --> 02:29.040]  of this data. So this is very important for us as historians, not for me, for my colleague.
[02:29.040 --> 02:36.120]  And then interpretation, so how the data set has been built, which were the limits. All
[02:36.120 --> 02:42.800]  this question needs to be addressed in a different way. And then at the same time, we have, of
[02:42.800 --> 02:48.320]  course, new standards, not only in digital history, but also the famous fair principle,
[02:48.320 --> 02:57.480]  so findable, accessible, interoperable, and reusable data. And we need to meet this criteria
[02:57.480 --> 03:06.240]  also with our journal. And this idea of Mullen is the one of a braided narrative. So it advocates
[03:06.240 --> 03:14.120]  for bringing together two things. One is the narrative, so the argumentation of our publication.
[03:14.120 --> 03:19.600]  The other one is the interpretation of data, and say that they can be done in a narrative
[03:19.600 --> 03:29.760]  way. This is where we put these so-called multilayers together. So this one is like every article
[03:29.760 --> 03:35.320]  published in our journal has a fingerprint, sort of identity, where this level, like the
[03:35.320 --> 03:42.920]  narrative, the hermeneutic level, and the data layer are together. So this is the representation
[03:42.920 --> 03:49.640]  of one Jupyter notebook, which is normally linear, cell by cell. We just distort it,
[03:49.640 --> 03:55.920]  we put it in a circle, and here you can test it out. So this was also a tool, it is also
[03:55.920 --> 04:03.240]  a tool for our authors, which we own them a lot, because they are our primary tester,
[04:03.240 --> 04:11.880]  and it is still an experimental journal. And you could tweak with data, you can change
[04:11.880 --> 04:19.200]  with the content, and you see how the fingerprint is changing. This was just an experiment at
[04:19.200 --> 04:27.520]  the beginning, but then it really becomes integrated, it is down there, integrated into the main
[04:27.520 --> 04:36.200]  interface of the journal. And we saw that indeed they were very different. They were
[04:36.200 --> 04:43.600]  very different, and we can see also the code style of every Jupyter notebook, how the author
[04:43.600 --> 04:50.920]  decided to narrate the arguments. So I will go quickly, sorry. And then this is like the basic
[04:50.920 --> 04:56.760]  layer, so the narrative layer that looks like an MV viewer with steroids in the sense that we
[04:56.760 --> 05:03.640]  have figures, we have tables, we have bibliography with Zotero and Psy2C. And then above all,
[05:03.880 --> 05:12.040]  it is a very thin layer on top of the Jupyter notebook, because we use the usual output of the
[05:12.040 --> 05:20.640]  notebook, so this is very, yeah, an augmented MV viewer. And then we have, as it is a braided
[05:20.640 --> 05:26.240]  narrative, we decided to have this metaphor of this level one on top of the other. So this is a
[05:26.240 --> 05:31.920]  sort of animation, on the left you see the full hermeneutic layer, and on the other side you
[05:31.960 --> 05:39.280]  can see how it slides through the, like behind the narrative layer. And the data layer is for the
[05:39.280 --> 05:48.440]  moment, the part on top, right top, which we use MyBinder, fantastic service to publish online
[05:48.440 --> 05:56.360]  your notebooks. And we wanted this article not only to be show off of the data set, but also
[05:56.960 --> 06:03.760]  a small history lab, so that people could just click on the button and get to the data and understand
[06:03.760 --> 06:10.640]  how the data has been composed. The good thing is that we decided to keep this MyBinder as this
[06:10.640 --> 06:18.760]  source of truth. So the article that you see published is exactly the same copy, with just a
[06:18.760 --> 06:24.840]  different way of interacting with a different layer. So this is how it looks like on MyBinder,
[06:24.840 --> 06:32.800]  so it's a classical Jupyter notebook, and for every notebook we have a GitHub repo where we
[06:32.800 --> 06:40.440]  store all the requirements and all the images in the data set. We have to put together the
[06:40.440 --> 06:49.920]  fair metadata, but still, so it's under construction. Then what does it mean having Jupyter
[06:49.920 --> 06:56.800]  notebooks for publishing? We see that in the literature there are a lot of critics who shouldn't
[06:56.800 --> 07:03.320]  use Jupyter notebooks because it's too complex, it's impossible to replicate and so on and so forth.
[07:03.320 --> 07:11.760]  But then for us, it was really the simplest solution. So at the same time, to publish with
[07:11.760 --> 07:17.160]  Jupyter, we had to make our pipeline a bit more complex than usual. So we have a first review
[07:17.240 --> 07:23.280]  directly on the abstract, where we start communicating with the authors, understanding
[07:23.280 --> 07:30.000]  their needs, creating a writing environment for them that can be replicated with Python,
[07:30.000 --> 07:36.080]  sorry, with Docker containers for Python and Air. And then there's the first technical review,
[07:36.080 --> 07:40.960]  she's in charge of the first technical review, which is the most complicated one because there's
[07:41.440 --> 07:48.800]  a lot of checks. We saw some projects already, we needed to have checks. And then we have a lot
[07:48.800 --> 07:53.520]  of other open source software that enters this pipeline, like for the preview of the notebook
[07:53.520 --> 08:00.080]  with the GitHub app, MB Viewer, we have MyBinder, and this is just for the first technical review
[08:00.080 --> 08:05.880]  because then the article is being sent to the reviewer for the double banana review. So before
[08:05.960 --> 08:12.840]  even reviewing, we had to do this huge job because they have to review also the data and the pertinence
[08:12.840 --> 08:20.160]  of the dataset. And then finally, there is one important thing, it's English editing. So how
[08:20.160 --> 08:26.560]  to edit something that which is already being run, so without running itself. So this could be a
[08:26.560 --> 08:32.200]  tool for translators, tool for correctors that they're not into the Jupyter world. So how to do
[08:32.200 --> 08:38.840]  that? We have Jupyter text, we're still testing some plug-in to see if this could work without
[08:38.840 --> 08:44.840]  touching the final output. And then the final technical review, so after all this has been
[08:44.840 --> 08:50.440]  shipped, we have a DOI. So the article is now published and needs to be indexing and there is
[08:50.440 --> 08:56.840]  the problem of long-term archiving, which is a big problem for many reasons. First of all,
[08:56.920 --> 09:06.600]  like the libraries that get deprecated, also API that disappeared. So how to really reproduce
[09:06.600 --> 09:11.800]  this in the future? And then finally, the dataset needs to be included into, we have
[09:11.800 --> 09:18.360]  Dataverse, but we're looking for Zenodo in order to match the fair metadata. And time is up,
[09:18.360 --> 09:25.160]  I have a question for you, of course. Thank you very much, first of all. And then if you have
[09:25.160 --> 09:30.280]  want to contact us, just collaborate or work together on Jupyter publication,
[09:30.920 --> 09:36.840]  JDH admin at uni.lu. And then the question is, how can we actually collaborate on something
[09:36.840 --> 09:44.680]  which is a notebook that requires quite a threshold of expertise, not only for the researcher,
[09:44.680 --> 09:51.000]  but for the people that are around, and how to maintain all this and how to make this history
[09:51.000 --> 09:54.040]  love living for more than one year. Thank you.
[10:21.080 --> 10:30.600]  Yeah, well, I repeat the question. So he asked me if the double blind review, how can we keep it
[10:30.600 --> 10:38.600]  actually a real natural double blind? So she anonymized the data on GitHub. So we have
[10:38.600 --> 10:43.320]  specific repositories that have been created after the communication with the authors,
[10:43.320 --> 10:50.120]  where we only have the code without the names, but then you still have the bibliography,
[10:50.200 --> 10:56.120]  so it's easy to, it's a very small word, one of the digital history. But this is the way to
[10:56.120 --> 11:02.600]  maintain double blind. And then we're going to send the review where both the MyBinder and the
[11:02.600 --> 11:11.960]  version of the article on our website with a hidden URL. So this is the only thing that we can do.
[11:12.760 --> 11:19.080]  For sure, the double blind, we have the problem that we cannot really use the
[11:19.080 --> 11:24.520]  pull request directly on the GitHub repository. So in fact, there is some replication between
[11:24.520 --> 11:31.240]  the GitHub repository. Sometimes after with the peer review, there is some
[11:32.280 --> 11:39.160]  requisite that he come back to a technical review because there is a revision. So there
[11:39.160 --> 11:44.920]  is this question about how we re-synchronize the notebook together. There is some authors that
[11:48.920 --> 11:57.000]  they have good enough with GitHub, but to review a notebook with the output with the metadata
[11:57.000 --> 12:00.840]  to track what has been changed. That's why, yes, this it was,
[12:01.560 --> 12:09.560]  but yes, the questions that you have, we are testing with review and be or not also to maybe
[12:10.680 --> 12:18.040]  use some markdown or just Python script to produce several output in order to not
[12:19.080 --> 12:24.120]  sometimes touch about this metadata that they are inside the notebook.
[12:24.120 --> 12:31.960]  And there was another question, but I don't know if we have time. Yes. Yes. Yes. Please.
[12:31.960 --> 12:33.960]  Last one. Last one.
[12:33.960 --> 12:37.960]  Of the SS brightness of your data sets.
[12:37.960 --> 12:42.360]  Sorry, how to assess? Of the SS brightness of your data sets.
[12:42.360 --> 12:49.160]  Yeah, that's the very big, big, big question. So the idea behind the braided narrative is then you
[12:49.160 --> 12:56.280]  tell the story around the data on one side and on the other side you keep the data like with the
[12:56.280 --> 13:05.400]  Zenodo metadata coherent or probably with what Paul showed us before with Ricardo. So having
[13:06.520 --> 13:12.280]  like an external check on the metadata and on the data set itself. At the same time,
[13:12.280 --> 13:18.360]  the initial, the first technical review is the one where we assess actually the data. So if the
[13:18.360 --> 13:24.760]  data set are complete, coherent, we don't judge them because then we know that there are conditions
[13:24.760 --> 13:30.280]  of production. That needs to be, we try to make this as more explicit as possible.
[13:30.280 --> 13:32.280]  That's.
[13:33.240 --> 13:38.920]  Yes, exactly. And this, like that's why the long-term maintenance. So now we only have
[13:38.920 --> 13:46.360]  nine articles, but we have 28 in the pipeline in the coming year. So it's really now it's
[13:46.360 --> 13:51.960]  getting us up speed and we have more and more interaction with others which makes things more
[13:51.960 --> 13:53.960]  complicated. Thank you.