[00:00.000 --> 00:15.440] Hello. Thank you very much for having us here. I'm Daniele Guido, together with my colleague [00:15.440 --> 00:23.240] Elisabeth Guérard. We are coming from the University of Luxembourg, from the Centre for Contemporary [00:23.240 --> 00:29.840] and Digital History, where we're running this new journal, this new idea of a journal, together [00:29.840 --> 00:36.920] with a publisher, a well-known publisher in open access publication, which is De [00:36.920 --> 00:46.400] Gruyter. So the journal is journalofdigitalhistory.org, and the idea is how to bring reproducible [00:46.400 --> 00:53.400] papers into the humanities, and into digital history in our specific case. And that's [00:53.400 --> 00:58.600] why we joined forces with them. So it's a joint venture with them directly, so the team is [00:58.600 --> 01:04.920] relatively small compared to other projects, and then we have two perspectives that we [01:04.920 --> 01:13.560] decided to put together. On our side, we understand that academic publishing is a bit too traditional, [01:13.560 --> 01:21.240] especially in history. And then our researchers currently work on Jupyter Notebook to [01:21.240 --> 01:29.240] run their own experiments and so on. So the idea was: can we pass from experiments on Jupyter [01:29.240 --> 01:34.080] Notebook to actual publication, also in our domain? And on the other side, they wanted [01:34.080 --> 01:39.000] to test out this hypothesis, because they really want to engage with new publication [01:39.000 --> 01:46.080] practices, and this joint venture was just a good match. [01:46.080 --> 01:54.560] Then, well, reproducible papers in digital history means a lot of things, because first [01:54.560 --> 02:03.320] of all, we now have a massive digitization process of primary sources and secondary literature, [02:03.320 --> 02:11.000] and also new digital material, like the Twitter archive that we've been seeing before. 
On [02:11.000 --> 02:15.520] one side, the details of the code are crucial, so sharing the data set is one thing, but [02:15.520 --> 02:21.560] the other thing is really how this data has been created, so what the conditions of production [02:21.560 --> 02:29.040] of this data were. So this is very important for us as historians; not for me, for my colleague. [02:29.040 --> 02:36.120] And then interpretation, so how the data set has been built, which were the limits. All [02:36.120 --> 02:42.800] these questions need to be addressed in a different way. And then at the same time, we have, of [02:42.800 --> 02:48.320] course, new standards, not only in digital history, but also the famous FAIR principles, [02:48.320 --> 02:57.480] so findable, accessible, interoperable, and reusable data. And we need to meet these criteria [02:57.480 --> 03:06.240] also with our journal. And this idea, from Mullen, is the one of a braided narrative. So it advocates [03:06.240 --> 03:14.120] for bringing together two things. One is the narrative, so the argumentation of our publication. [03:14.120 --> 03:19.600] The other one is the interpretation of data, and it says that both can be done in a narrative [03:19.600 --> 03:29.760] way. This is where we put these so-called multiple layers together. So every article [03:29.760 --> 03:35.320] published in our journal has a fingerprint, a sort of identity, where these levels, the [03:35.320 --> 03:42.920] narrative, the hermeneutic level, and the data layer, are together. So this is the representation [03:42.920 --> 03:49.640] of one Jupyter notebook, which is normally linear, cell by cell. We just distort it, [03:49.640 --> 03:55.920] we put it in a circle, and here you can test it out. So this was also a tool, it is also [03:55.920 --> 04:03.240] a tool for our authors, to whom we owe a lot, because they are our primary testers, [04:03.240 --> 04:11.880] and it is still an experimental journal. 
And you can tweak the data, you can change [04:11.880 --> 04:19.200] the content, and you see how the fingerprint is changing. This was just an experiment at [04:19.200 --> 04:27.520] the beginning, but then it really became integrated, it is down there, integrated into the main [04:27.520 --> 04:36.200] interface of the journal. And we saw that indeed they were very different. They were [04:36.200 --> 04:43.600] very different, and we can see also the code style of every Jupyter notebook, how the author [04:43.600 --> 04:50.920] decided to narrate the arguments. So I will go quickly, sorry. And then this is the basic [04:50.920 --> 04:56.760] layer, so the narrative layer, which looks like an nbviewer on steroids, in the sense that we [04:56.760 --> 05:03.640] have figures, we have tables, we have a bibliography with Zotero and Cite2c. And then above all, [05:03.880 --> 05:12.040] it is a very thin layer on top of the Jupyter notebook, because we use the usual output of the [05:12.040 --> 05:20.640] notebook, so this is very much, yeah, an augmented nbviewer. And then, as it is a braided [05:20.640 --> 05:26.240] narrative, we decided to have this metaphor of these levels, one on top of the other. So this is a [05:26.240 --> 05:31.920] sort of animation: on the left you see the full hermeneutic layer, and on the other side you [05:31.960 --> 05:39.280] can see how it slides behind the narrative layer. And the data layer is, for the [05:39.280 --> 05:48.440] moment, the part at the top right, for which we use MyBinder, a fantastic service to publish your notebooks [05:48.440 --> 05:56.360] online. And we wanted this article not only to be a showcase of the data set, but also [05:56.960 --> 06:03.760] a small history lab, so that people could just click on the button and get to the data and understand [06:03.760 --> 06:10.640] how the data has been composed. 
The good thing is that we decided to keep this MyBinder as the [06:10.640 --> 06:18.760] source of truth. So the article that you see published is exactly the same copy, with just a [06:18.760 --> 06:24.840] different way of interacting with the different layers. So this is how it looks on MyBinder, [06:24.840 --> 06:32.800] so it's a classical Jupyter notebook, and for every notebook we have a GitHub repo where we [06:32.800 --> 06:40.440] store all the requirements and all the images and the data set. We still have to put together the [06:40.440 --> 06:49.920] FAIR metadata, so it's under construction. Then what does it mean, having Jupyter [06:49.920 --> 06:56.800] notebooks for publishing? We see that in the literature there is a lot of criticism saying you shouldn't [06:56.800 --> 07:03.320] use Jupyter notebooks because it's too complex, it's impossible to replicate, and so on and so forth. [07:03.320 --> 07:11.760] But then for us, it was really the simplest solution. So at the same time, to publish with [07:11.760 --> 07:17.160] Jupyter, we had to make our pipeline a bit more complex than usual. So we have a first review [07:17.240 --> 07:23.280] directly on the abstract, where we start communicating with the authors, understanding [07:23.280 --> 07:30.000] their needs, creating a writing environment for them that can be replicated with Python, [07:30.000 --> 07:36.080] sorry, with Docker containers for Python and R. And then there's the first technical review, [07:36.080 --> 07:40.960] she's in charge of the first technical review, which is the most complicated one because there are [07:41.440 --> 07:48.800] a lot of checks. We saw some projects already; we needed to have checks. 
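The talk does not list which checks the first technical review runs, so the following is only a sketch of the kind of check such a review could include, not the journal's procedure. It assumes two hypothetical rules: code cells should have been executed in order from top to bottom, and no cell should carry a stored error output. It relies only on the `execution_count`, `outputs`, `output_type` and `ename` fields of the notebook JSON format; the function name `review_notebook` is invented for the example.

```python
def review_notebook(notebook):
    # Collect human-readable problems instead of stopping at the first one,
    # so the whole list can be sent back to the author in one message.
    problems = []
    last_count = 0
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            problems.append(f"cell {index}: never executed")
        elif count <= last_count:
            problems.append(f"cell {index}: executed out of order")
        else:
            last_count = count
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                name = output.get("ename", "unknown error")
                problems.append(f"cell {index}: error output ({name})")
    return problems


submitted = {
    "cells": [
        {"cell_type": "markdown", "source": "Introduction"},
        {"cell_type": "code", "execution_count": 1, "outputs": [], "source": "x = 1"},
        {"cell_type": "code", "execution_count": 3, "source": "int('a')",
         "outputs": [{"output_type": "error", "ename": "ValueError"}]},
        {"cell_type": "code", "execution_count": 2, "outputs": [], "source": "y = 2"},
        {"cell_type": "code", "execution_count": None, "outputs": [], "source": "z = 3"},
    ]
}

for problem in review_notebook(submitted):
    print(problem)
```

A notebook that was restarted and run from top to bottom without errors returns an empty list, which is the state one would want before a copy goes to the reviewers.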
And then we have a lot [07:48.800 --> 07:53.520] of other open source software that enters this pipeline, like for the preview of the notebook [07:53.520 --> 08:00.080] with the GitHub app, nbviewer, we have MyBinder, and this is just for the first technical review, [08:00.080 --> 08:05.880] because then the article is sent to the reviewers for the double-blind review. So before [08:05.960 --> 08:12.840] even reviewing, we had to do this huge job, because they have to review also the data and the pertinence [08:12.840 --> 08:20.160] of the dataset. And then finally, there is one important thing, it's English editing. So how [08:20.160 --> 08:26.560] to edit something which has already been run, so without running it again. So this could be a [08:26.560 --> 08:32.200] tool for translators, a tool for proofreaders who are not into the Jupyter world. So how to do [08:32.200 --> 08:38.840] that? We have Jupytext, we're still testing some plug-ins to see if this could work without [08:38.840 --> 08:44.840] touching the final output. And then the final technical review, so after all this has been [08:44.840 --> 08:50.440] shipped, we have a DOI. So the article is now published and needs to be indexed, and there is [08:50.440 --> 08:56.840] the problem of long-term archiving, which is a big problem for many reasons. First of all, [08:56.920 --> 09:06.600] the libraries that get deprecated, also APIs that disappear. So how to really reproduce [09:06.600 --> 09:11.800] this in the future? And then finally, the dataset needs to be included into, we have [09:11.800 --> 09:18.360] Dataverse, but we're looking at Zenodo in order to match the FAIR metadata. And time is up, [09:18.360 --> 09:25.160] I have a question for you, of course. Thank you very much, first of all. And then if you [09:25.160 --> 09:30.280] want to contact us, just to collaborate or work together on Jupyter publication, [09:30.920 --> 09:36.840] JDH admin at uni.lu. 
And then the question is, how can we actually collaborate on something [09:36.840 --> 09:44.680] which is a notebook that requires quite a threshold of expertise, not only for the researcher, [09:44.680 --> 09:51.000] but for the people that are around, and how to maintain all this and how to keep this history [09:51.000 --> 09:54.040] lab alive for more than one year. Thank you. [10:21.080 --> 10:30.600] Yeah, well, I repeat the question. So he asked me, for the double-blind review, how can we keep it [10:30.600 --> 10:38.600] actually a real double blind? So she anonymizes the data on GitHub. So we have [10:38.600 --> 10:43.320] specific repositories that have been created after the communication with the authors, [10:43.320 --> 10:50.120] where we only have the code without the names, but then you still have the bibliography, [10:50.200 --> 10:56.120] so it's easy to, it's a very small world, the one of digital history. But this is the way to [10:56.120 --> 11:02.600] maintain double blind. And then we send the reviewers both the MyBinder and the [11:02.600 --> 11:11.960] version of the article on our website with a hidden URL. So this is the only thing that we can do. [11:12.760 --> 11:19.080] For sure, with the double blind, we have the problem that we cannot really use the [11:19.080 --> 11:24.520] pull requests directly on the GitHub repository. So in fact, there is some replication between [11:24.520 --> 11:31.240] the GitHub repositories. Sometimes, after the peer review, there are some [11:32.280 --> 11:39.160] requests that come back to a technical review because there is a revision. So there [11:39.160 --> 11:44.920] is this question about how we re-synchronize the notebooks together. There are some authors who [11:48.920 --> 11:57.000] are good enough with GitHub, but it is hard to review a notebook with the outputs and the metadata, [11:57.000 --> 12:00.840] to track what has been changed. 
That's why, yes, [12:01.560 --> 12:09.560] for the questions that you have, we are testing with ReviewNB, or also maybe [12:10.680 --> 12:18.040] using some Markdown or just Python scripts to produce several outputs, in order not to [12:19.080 --> 12:24.120] touch this metadata that is inside the notebook. [12:24.120 --> 12:31.960] And there was another question, but I don't know if we have time. Yes. Yes. Yes. Please. [12:31.960 --> 12:33.960] Last one. Last one. [12:33.960 --> 12:37.960] How do you assess the FAIRness of your data sets? [12:37.960 --> 12:42.360] Sorry, how to assess? The FAIRness of your data sets. [12:42.360 --> 12:49.160] Yeah, that's the very big, big, big question. So the idea behind the braided narrative is that you [12:49.160 --> 12:56.280] tell the story around the data on one side, and on the other side you keep the data coherent with the [12:56.280 --> 13:05.400] Zenodo metadata, or probably with what Paul showed us before with Ricardo. So having [13:06.520 --> 13:12.280] an external check on the metadata and on the data set itself. At the same time, [13:12.280 --> 13:18.360] the first technical review is the one where we actually assess the data. So whether the [13:18.360 --> 13:24.760] data sets are complete, coherent; we don't judge them, because we know that there are conditions [13:24.760 --> 13:30.280] of production. We try to make this as explicit as possible. [13:30.280 --> 13:32.280] That's. [13:33.240 --> 13:38.920] Yes, exactly. And that's why the long-term maintenance. So now we only have [13:38.920 --> 13:46.360] nine articles, but we have 28 in the pipeline in the coming year. So now it's really [13:46.360 --> 13:51.960] getting up to speed, and we have more and more interaction with others, which makes things more [13:51.960 --> 13:53.960] complicated. Thank you.
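On the point raised in the discussion about re-synchronizing notebooks after a revision, where outputs and metadata make it hard to track what has been changed, one possible approach is to compare only the cell sources. The sketch below is an illustration under that assumption, not the tool the journal uses. It relies on Python's standard `difflib` module, and the helper names `source_lines` and `source_diff` are invented for the example.

```python
import difflib


def source_lines(notebook):
    # Keep only the cell type and the text of each cell, so that outputs,
    # execution counts and metadata never show up in the comparison.
    lines = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        lines.append(f"# --- {cell.get('cell_type', 'unknown')} cell ---")
        lines.extend(text.splitlines())
    return lines


def source_diff(old, new):
    # A unified diff over the extracted sources; an empty list means
    # the two versions differ at most in outputs or metadata.
    return list(difflib.unified_diff(
        source_lines(old), source_lines(new),
        fromfile="submitted", tofile="revised", lineterm="",
    ))


submitted_version = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "We count the letters."},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "42\n"}],
     "source": "print(len(letters))"},
]}

# Same sources, but re-executed: only the execution count and metadata changed.
rerun_version = {"cells": [
    {"cell_type": "markdown", "metadata": {"collapsed": False},
     "source": "We count the letters."},
    {"cell_type": "code", "metadata": {}, "execution_count": 7,
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "42\n"}],
     "source": "print(len(letters))"},
]}

# The text of the first cell was edited by the author.
revised_version = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "We count the surviving letters."},
    rerun_version["cells"][1],
]}

print(source_diff(submitted_version, rerun_version))
for line in source_diff(submitted_version, revised_version):
    print(line)
```

A re-run of the same notebook produces an empty diff, while an edit to the text shows up as ordinary added and removed lines, which is closer to what an editor or proofreader outside the Jupyter world would expect to see.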