[00:00.000 --> 00:12.400] Hello, FOSDEM. My name is Alejandro and today I am going to talk about Papis, a simple, [00:12.400 --> 00:17.400] powerful and extendable command line bibliography manager that I have been developing during [00:17.400 --> 00:23.320] the last 7 years. I will be explaining some of the main considerations of the project [00:23.400 --> 00:30.840] and demoing some of its basic use cases. First of all, let me introduce myself. I work currently [00:30.840 --> 00:37.320] as a physicist at the Technical University of Vienna in Austria. We develop massively parallel [00:37.320 --> 00:43.320] algorithms in order to calculate properties of molecules and solids from a theoretical point of [00:43.320 --> 00:49.640] view. You can find me on Mastodon or around the web. Don't hesitate to contact me. [00:52.520 --> 01:01.160] So, what is Papis? Papis started as a simple bibliography manager built around the command [01:01.160 --> 01:10.120] line. It should make it possible to manage papers or books at scale or for small curated libraries. [01:11.080 --> 01:17.480] It is therefore important to implement a simple data model and use an approachable programming [01:17.480 --> 01:24.360] language, such as Python, so that users can interact easily with Papis' many features. [01:26.280 --> 01:34.040] In addition, Python also encourages contributions from researchers in the academic world, since [01:34.360 --> 01:44.360] nowadays many researchers are exposed to this language. Papis strives to be and build a community, [01:45.480 --> 01:49.720] and various plugins have appeared thanks to the community. [01:52.360 --> 01:59.480] There are plugins for the major text editors, such as Neovim and Emacs, and partial support [01:59.480 --> 02:07.720] exists for VS Code and Vim. Additionally, lately we have been working on the web application for [02:07.720 --> 02:12.200] Papis, and I will be showing some of its features in this talk.
[02:15.000 --> 02:21.880] But you may be asking yourself, why Papis? We think that it should be possible and simple [02:21.880 --> 02:28.200] to perform complex tasks on a whole library. This is made possible through a rich command [02:28.520 --> 02:36.120] line interface. You can add papers from a DOI or from a variety of websites supported by Papis. [02:36.840 --> 02:41.240] You can explore sources like Crossref from the command line, [02:41.960 --> 02:46.040] or download information about the citations of a publication, [02:46.760 --> 02:51.240] or check which publications cite the current publication. [02:51.720 --> 02:56.680] You can take notes that play well with tools like Vim or Emacs Org mode. [02:56.680 --> 03:02.200] You can version control your documents and export to the most common formats. [03:03.480 --> 03:10.280] You can spend countless hours curating and improving your library's notes, metadata, [03:10.280 --> 03:17.800] and PDF documents without fear of losing your data to an API change or the end-of-life of Papis, [03:17.880 --> 03:23.000] since your data is stored in a very simple but flexible format. [03:26.280 --> 03:31.080] I want to emphasize the fact that one of the main goals of Papis [03:31.080 --> 03:35.000] is enabling the user to be independent of Papis itself. [03:36.040 --> 03:42.040] A researcher, academic or not, spends an enormous amount of time searching, [03:42.600 --> 03:45.640] reading and annotating publications. [03:46.760 --> 03:53.960] For us Papis maintainers, it is important that a person comfortable with any scripting language [03:53.960 --> 04:01.240] should be able to retrieve the totality of Papis data by writing a script in an afternoon. [04:03.320 --> 04:09.080] In order to accomplish this, an extremely simple library structure was chosen. [04:09.560 --> 04:15.000] The library structure relies on having one folder per library document.
[04:15.000 --> 04:20.440] This means, for instance, in the case of the shown publication of Turing, [04:20.440 --> 04:26.440] the folder includes a YAML file containing the metadata information of the publication [04:26.440 --> 04:31.640] and an additional PDF file with the publication itself. [04:32.200 --> 04:39.000] In this example library, we would have an additional document under the folder 1-document, [04:39.000 --> 04:42.200] where we find two PDF files in this case. [04:43.560 --> 04:51.000] A document in a Papis library is any folder containing a YAML file named info.yaml. [04:53.000 --> 04:59.000] The contents of the YAML file are in principle up to the user [04:59.160 --> 05:02.600] to determine. [05:02.600 --> 05:07.320] However, in practice, there are some conventions used in Papis. [05:09.240 --> 05:17.720] Inside the info.yaml file, the key files contains a list of related files in the document's directory. [05:18.680 --> 05:25.320] These files might be PDF files or any other kind of files relevant to the document. [05:26.200 --> 05:28.360] In the case of the Turing publication, [05:28.360 --> 05:35.480] files therefore lists a single PDF document, paper.pdf. [05:36.840 --> 05:44.040] The key ref is used for exporting BibTeX files and is the reference of the document [05:44.040 --> 05:47.560] when using bibliographic tools outside of Papis. [05:48.360 --> 05:54.120] The YAML key type is also used for BibTeX exporting and is [05:55.080 --> 06:00.840] the type of document, whether a book, an article, a monograph, etc. [06:01.960 --> 06:05.000] There is also built-in support for tags, [06:05.000 --> 06:09.160] which may be added as a list of space-separated keywords. [06:10.280 --> 06:14.440] We chose the YAML format due to its ease of writing and reading, [06:14.440 --> 06:20.280] and because most programming languages are provided with libraries that can read these files.
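Because a document is simply a folder holding an info.yaml file, a library can be scanned with nothing more than a directory walk and a text search. The following is a minimal sketch in Python using only the standard library. The helper names, the folder names and the file contents of the demo library are made up for illustration, and a real script would more likely parse the YAML with a proper library instead of searching the raw text.

```python
import os
import tempfile


def find_documents(library_dir):
    """Yield every folder below library_dir that contains an info.yaml file."""
    for folder, _subdirs, files in os.walk(library_dir):
        if "info.yaml" in files:
            yield folder


def grep_documents(library_dir, needle):
    """Return the document folders whose info.yaml mentions needle."""
    hits = []
    for folder in find_documents(library_dir):
        with open(os.path.join(folder, "info.yaml"), encoding="utf-8") as fh:
            if needle in fh.read():
                hits.append(folder)
    return sorted(hits)


def write_demo_library(root):
    """Create two made-up documents so the sketch can be run anywhere."""
    demo = {
        "turing": "title: Example paper\nauthor: Turing\nfiles:\n- paper.pdf\n",
        "other": "title: Another example\nauthor: Someone Else\nfiles:\n- a.pdf\n- b.pdf\n",
    }
    for name, info in demo.items():
        os.makedirs(os.path.join(root, name))
        path = os.path.join(root, name, "info.yaml")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(info)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_demo_library(root)
        # Print the folder name of every document that mentions "Turing".
        for folder in grep_documents(root, "Turing"):
            print(os.path.basename(folder))
```

The search is a plain case-sensitive substring match on the whole file, which is roughly what a find piped into grep would do.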
[06:21.240 --> 06:25.080] Of course, given the simplicity of the library model, [06:25.080 --> 06:30.360] it is possible to write a crude finder with just the Unix grep and find commands. [06:32.360 --> 06:38.200] All functionalities in Papis can be customized through a configuration file in the INI format. [06:40.200 --> 06:44.680] Papis can define multiple libraries through the configuration file, [06:44.680 --> 06:49.480] and all Papis settings can be independently configured for each library. [06:50.840 --> 06:54.680] You can define default settings under the settings section, [06:55.400 --> 06:57.560] which will be common to all libraries. [06:58.760 --> 07:03.080] A library is simply defined as a section with a dir key, [07:03.080 --> 07:07.960] which contains the path to the library directory containing all documents. [07:09.480 --> 07:14.920] You can then customize this library, in this case a library named Papis, [07:14.920 --> 07:19.000] and set the default opener tool to the PDF viewer Evince. [07:20.280 --> 07:25.960] If you happen to want an additional library of books holding mostly EPUB formatted books, [07:27.080 --> 07:30.120] you could define the opener to be Calibre instead. [07:31.560 --> 07:36.280] You can read about all the configuration settings in the documentation page, [07:36.280 --> 07:40.520] where you will see a description of their function and their default values. [07:42.120 --> 07:48.680] With this introduction, let us take a look now at a common workflow to add an article from a journal page. [07:51.000 --> 07:54.040] Here is a common view of an article in a browser. [07:54.760 --> 08:00.760] We can see lots of information, and the easiest way of adding this article to Papis [08:01.320 --> 08:05.240] will be by locating the DOI of the article in the page. [08:07.080 --> 08:12.040] In this case, we locate the DOI in the URL of the article, [08:12.040 --> 08:16.280] and we copy it to our clipboard to paste it in the terminal.
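The configuration layout described earlier (a settings section for defaults, plus one section per library holding a dir key) can be read with Python's standard configparser module. This is only a sketch of the idea and not how Papis itself loads its configuration. The library names, the paths and the key names opentool and editor are assumptions for illustration, since the talk does not spell them out; the Papis documentation lists the actual setting names.

```python
import configparser

# Made-up configuration text; section and key names are assumptions.
EXAMPLE_CONFIG = """
[settings]
opentool = xdg-open
editor = vim

[papers]
dir = ~/Documents/papers
opentool = evince

[books]
dir = ~/Documents/books
opentool = calibre
"""


def libraries(config_text):
    """Return {library name: settings}, with per-library values overriding defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    defaults = dict(parser["settings"]) if parser.has_section("settings") else {}
    libs = {}
    for name in parser.sections():
        # A library is any section, other than settings, that has a dir key.
        if name == "settings" or "dir" not in parser[name]:
            continue
        libs[name] = {**defaults, **dict(parser[name])}
    return libs


if __name__ == "__main__":
    for name, settings in libraries(EXAMPLE_CONFIG).items():
        print(name, settings["dir"], settings["opentool"])
```

Each library inherits the defaults from the settings section and overrides only what it sets itself, which mirrors the behaviour described in the talk.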
[08:16.840 --> 08:25.400] The command for adding a paper is papis add, and papis add comes with quite a few options. [08:26.120 --> 08:33.160] In general, when adding a document, Papis will try to download metadata from various sources and, [08:33.160 --> 08:39.240] if possible, download PDF documents, if they are freely and legally available. [08:39.320 --> 08:44.440] Here, we see that I am using the edit flag. [08:45.480 --> 08:51.880] This flag instructs the papis add command to open the editor with the info.yaml file [08:51.880 --> 08:54.200] before adding the document to the library. [08:56.120 --> 09:02.120] Similarly, the open flag instructs the command to open the attached files, if any, [09:02.120 --> 09:04.600] before adding the document to the library. [09:05.560 --> 09:08.920] We are also telling the command through the from flag [09:08.920 --> 09:13.080] to retrieve information exclusively from the DOI. [09:15.000 --> 09:18.920] We can also preset some metadata through the command line. [09:19.880 --> 09:24.840] In this case, we are adding the tags classics and DFT. [09:26.840 --> 09:28.680] Let's go ahead and run the command. [09:28.680 --> 09:37.160] Papis will now try to download metadata and a PDF file from online sources. [09:38.120 --> 09:43.720] In the current configuration, we are greeted with an interactive prompt to add, [09:43.720 --> 09:47.960] split or reject the metadata retrieved from Crossref. [09:48.840 --> 09:51.160] We choose to accept the metadata. [09:51.400 --> 09:57.400] The interactive session now shows us a retrieved PDF document and asks us [09:57.400 --> 10:00.440] if this is the document that belongs to the publication. [10:01.400 --> 10:08.200] At this point, we can inspect the document and we realize that we indeed want this PDF file, [10:08.200 --> 10:09.400] so we press Y.
[10:11.800 --> 10:18.360] Now, all the information is in place and we can see a preliminary version of the info file [10:19.160 --> 10:20.920] since we passed the edit flag. [10:22.920 --> 10:29.560] We can see that a lot of information could be retrieved: detailed author list information, [10:30.520 --> 10:38.360] volume, pages, among others, and our tags have found their way into the YAML file correctly. [10:40.760 --> 10:46.360] A confirmation prompt subsequently appears since we passed the confirm flag to the command. [10:48.600 --> 10:54.280] We agree to it and, therefore, the document gets added to the library. [10:56.360 --> 11:01.160] We can now fetch information about the publications cited in this article. [11:01.160 --> 11:07.240] The command for this is citations and we pass to it the fetch-citations flag, [11:07.800 --> 11:15.000] which first checks for information in our library and then heads to Crossref to retrieve [11:15.000 --> 11:19.800] relevant information about the references appearing in our newly added document. [11:21.560 --> 11:25.800] If we now open the directory where the document has been stored, [11:26.440 --> 11:32.520] we see that the PDF file has been correctly stored alongside the info.yaml file [11:32.520 --> 11:37.400] and the newly generated citations.yaml file. [11:45.240 --> 12:08.680] If we inspect the citations file, we see that it is in the format of a list of YAML documents, [12:08.680 --> 12:15.480] where every element, separated by three dashes, represents bibliographic information about the [12:15.480 --> 12:23.240] citations. This can be used for scripting, for browsing the citations, or for easily [12:23.240 --> 12:32.280] visualizing them through the web application. This demo will show how to leverage the Papis API [12:32.280 --> 12:39.320] in Python to write one of the simplest scripts you can write.
You can find more information [12:39.320 --> 12:43.480] in the documentation together with other more complex example scripts. [12:45.080 --> 12:50.520] First of all, let us add a bigger library to our demo library. For this, [12:50.520 --> 12:55.240] we need to edit the configuration file and add an additional library. [12:55.960 --> 13:03.240] After adding the library, we can list the directories with the list command, [13:04.600 --> 13:09.000] which shows us the interactive interface to select documents. [13:10.520 --> 13:16.520] Most Papis commands accept a query argument as an input. In this case, [13:16.520 --> 13:21.960] we can query for documents whose author includes the string Einstein. [13:22.360 --> 13:30.840] We can also use the all flag to apply a Papis action to all documents matching the query. [13:31.640 --> 13:35.560] In this case, listing the full paths for the folders. [13:38.040 --> 13:43.960] Other commands like open, edit, or update work in a similar fashion. [13:44.280 --> 13:52.120] Next, we will write a simple Python script to scan all the documents in the library [13:52.760 --> 13:59.720] and add the tag to the document whenever the substring this appears in the title of the document. [14:01.800 --> 14:08.680] To do this, we can use the Papis API submodule and we can obtain all documents in the current [14:08.680 --> 14:12.600] library with the function get_all_documents_in_lib. [14:15.240 --> 14:22.040] Next, we loop over all documents and we deal with the document as if it were a Python dictionary. [14:39.240 --> 14:51.480] The method save saves the document. [14:53.640 --> 14:58.840] I will comment out the save call since I don't want it to overwrite the library. [15:02.200 --> 15:06.840] Let's run the script and see that it works. And indeed, it works. [15:09.400 --> 15:12.440] The last demonstration will concern the web application.
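The tagging script from the demo is not visible in the transcript, so the following is only a sketch of its logic, not the script shown on screen. In the talk the documents come from get_all_documents_in_lib in the Papis API submodule and are written back with the save method; here plain dictionaries stand in for them so the example runs without Papis installed. The function name, the sample titles and the tag are hypothetical, and tags are assumed to be stored as a space-separated string, as mentioned earlier in the talk.

```python
def add_tag_when_title_matches(documents, substring, tag):
    """Add tag to every document whose title contains substring.

    Documents are treated like dictionaries, with tags kept as a
    space-separated string. Returns the documents that were changed.
    """
    changed = []
    for doc in documents:
        title = str(doc.get("title", ""))
        if substring not in title:
            continue
        tags = str(doc.get("tags", "")).split()
        if tag in tags:
            continue  # already tagged, nothing to do
        doc["tags"] = " ".join(tags + [tag])
        changed.append(doc)
        # With real Papis documents one would call doc.save() here;
        # it is left out, as in the talk, so nothing is written to disk.
    return changed


if __name__ == "__main__":
    # Made-up documents standing in for a Papis library.
    library = [
        {"title": "A note on this and that"},
        {"title": "Something else", "tags": "classics"},
    ]
    for doc in add_tag_when_title_matches(library, "this", "demo"):
        print(doc["title"], "->", doc["tags"])
```

Keeping the save call out of the loop until the output looks right is a cheap way to dry-run a change over a whole library.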
[15:13.080 --> 15:17.720] The web application is quite useful if you would like to self-host Papis [15:17.720 --> 15:19.880] or access it from a portable device. [15:22.280 --> 15:29.320] We can run the web application using the serve command, to which we can pass a port, 8888. [15:29.480 --> 15:40.280] Directing our browser to the URL localhost colon 8888, [15:40.920 --> 15:47.480] we see the starting page of the web application where we are presented with a simple query prompt. [15:50.440 --> 15:56.840] Other pages include listing all the documents in the library, listing all the tags, [15:57.480 --> 16:04.440] and browsing a different library. Let us again enter the author Einstein query into the prompt. [16:05.560 --> 16:10.360] The result page includes a handy timeline with the results of the query [16:11.080 --> 16:18.600] and a simple multi-line list of the results. In this timeline, we can see for instance directly [16:18.600 --> 16:25.240] the annus mirabilis of Einstein together with a couple of other publications further right. [16:25.640 --> 16:30.520] We could click on a title in the timeline and go to the respective document page. [16:32.520 --> 16:39.160] In the results for the document, we see a left block with some basic information and the PDF links. [16:40.120 --> 16:47.240] On the right hand side, we see the citation, references, and several external links for the document. [16:56.200 --> 17:12.280] Let us look for the first paper we added at the beginning of this presentation. [17:13.480 --> 17:23.240] It is worth noting that we can click on the tags of the documents to get the results for the given tags. [17:25.480 --> 17:43.000] If we click on the arrow, we will navigate into the document page. The red notifications advise us [17:43.000 --> 17:48.280] of small problems with the data in our document. However, I will not fix those now.
[17:49.240 --> 17:56.680] The document page is a multi-tab page where the first tab presents most of the information [17:56.680 --> 18:05.240] of the document in an HTML form fashion. Additionally, we have access to the raw info file [18:06.360 --> 18:14.520] where we can modify and overwrite its contents. We have added a BibTeX tab for LaTeX users. [18:14.680 --> 18:23.640] This document has a single file attached and we can preview it in the browser thanks to the library [18:23.640 --> 18:31.480] PDF.js. We can also download the document or open the document in a new window. [18:35.480 --> 18:40.600] In the next tab, we can visualize the citations file that we generated previously. [18:41.480 --> 18:45.240] This tab also has a timeline like the search results [18:45.240 --> 18:52.280] and the documents with the green reference indicate that these documents exist in our library and we [18:52.280 --> 19:04.760] can open them. Let us open this article page. [19:10.600 --> 19:34.760] For this article, we have also generated citations, but we can also use the Harvard ADS service. [19:41.560 --> 19:52.040] In the case of articles citing the current article, we have not generated this file [19:52.680 --> 19:56.920] and therefore we get an embedded page from ADS by default. [20:00.760 --> 20:04.760] In the last tab, we can edit the notes from the browser. [20:11.560 --> 20:23.560] Furthermore, clicking on the tags and library pages, we can see what these interfaces look like. [20:33.560 --> 20:40.040] Thank you very much for your attention. For further information, visit the project's page [20:40.040 --> 20:49.800] over at GitHub. Of course, Papis is only alive because of its community. I would like to thank [20:49.800 --> 20:56.840] all the users and contributors over the years. I would like to especially thank the co-maintainers [20:56.840 --> 21:04.840] of Papis, Alex Fikl and Julian Hauser, for their hard work in the last year.
I hope you [21:04.840 --> 21:12.200] enjoyed the presentation and I'll be answering your questions shortly. [21:22.440 --> 21:28.200] Fantastic. Thank you so much for that really actually quite interesting talk, Alejandro. [21:28.520 --> 21:36.120] I really quite felt inspired thinking, wow, can I run this with my own publications as a way of [21:36.120 --> 21:41.560] collating stuff and also sharing it with the world? It's always nice when you watch a talk [21:41.560 --> 21:48.520] and you immediately think, yes, I'm going to use this as well. So I have a few questions here. [21:49.320 --> 21:55.400] I think my first one, perhaps, might be, I historically have used Zotero. I had to think [21:55.400 --> 22:02.600] very carefully not to say no to that. Is it easy for me to migrate if I was inclined to migrate [22:02.600 --> 22:15.800] from Zotero or other plugins? Over the years, quite a lot of people have developed some plugins for [22:15.800 --> 22:24.360] the interface between Zotero and Papis. You can simply export and create Papis libraries, [22:25.240 --> 22:31.880] but I'm also aware of some people that actually use both. So they have a workflow to [22:32.600 --> 22:39.080] export this dynamically the whole time. So it is in principle compatible and there are [22:39.080 --> 22:42.840] a couple of projects that do this. This is coming from the community. [22:44.760 --> 22:49.160] Thanks. That's actually really appealing to me because whilst I was watching the way you [22:49.160 --> 22:54.360] added with the DOI, I thought that was really cool but there's still like seven command line [22:54.360 --> 23:01.720] flags and I'd really like the button that says add this to Zotero. Yeah. So the thing is, [23:02.840 --> 23:10.680] yeah, I should have maybe given some more examples of adding some documents. So in principle, it's [23:10.680 --> 23:23.560] also possible to add a document just by the URL. So there is some automatic recognition in Papis.
[23:23.560 --> 23:35.720] So most URLs are recognized and it could in this case even be an arXiv URL or [23:37.960 --> 23:48.440] within the HTML page. I noticed that there is a tool to use. [23:53.640 --> 23:59.720] Sorry. Your connection has gone just robot enough. [24:07.640 --> 24:13.560] Testing, testing. Do we still have you now? Yes, that is much better. Could you maybe just repeat [24:13.560 --> 24:19.400] your last two or three sentences because it was just a bit hard to hear. Sorry. So there are some, [24:20.360 --> 24:34.040] Zotero has implemented also quite a lot of metadata fetchers for many sources and there is also a [24:34.040 --> 24:44.200] project that tries to reuse these metadata fetchers from Zotero for their use in Papis and maybe [24:44.200 --> 24:50.600] also in the web application in Papis. So this might also happen in the future. But in general, [24:50.600 --> 24:59.400] it's much easier to add documents than what I showed in the video. Cool. Thanks. It's also [24:59.400 --> 25:04.760] really nice to hear how interoperable y'all are. So we have a couple more questions in the chat. [25:06.280 --> 25:12.120] So Paul says, does the YAML format follow bibliographic standards of any type? [25:14.600 --> 25:28.200] So we try to use most of the BibTeX keywords when they are applicable and in general the YAML format [25:28.200 --> 25:36.280] is really free for the user to use. So you might want to use a particular convention in your YAML [25:36.280 --> 25:45.400] files but the keywords are mostly motivated by BibTeX. That's the only one. [25:47.000 --> 25:50.840] It still sounds like there's some decent interoperability there which is really nice. [25:52.680 --> 25:59.560] Celia asks, who are the main users of Papis in terms of discipline, students, researchers, etc? [26:00.520 --> 26:11.720] Well, I know that's a good question.
A lot of biophysics and biosciences, so bioinformatics, [26:11.720 --> 26:17.880] I know quite a lot of people that use it. Physics, mathematics and computer science, [26:17.880 --> 26:25.080] I would say. These are the ones. But for instance, Julian is one co-maintainer and he's a philosopher. [26:25.880 --> 26:35.800] So it really helps. Of course, you have to be a little bit acquainted with the command line. [26:36.360 --> 26:42.760] Maybe hopefully through the web application in the future this will change. But it's in general [26:42.760 --> 26:48.760] people that really care about their libraries, the metadata that they have in their libraries, [26:48.840 --> 26:57.240] and they really want to have a very clear representation of their data. They don't want [26:57.240 --> 27:02.760] some upstream database somewhere stored. So they really want everything in plain text. [27:04.760 --> 27:10.040] Yes, I think you demonstrated so beautifully how accessible your own data is and it's [27:10.040 --> 27:17.400] surprisingly rare. Do you have any trainings for Papis so that people who maybe are a bit [27:17.400 --> 27:24.120] less confident could learn more about it? Sadly, not right now. Maybe that's something if enough [27:24.120 --> 27:30.680] people are interested. That's something that we could certainly look into. But we have the [27:30.680 --> 27:39.080] discussions in GitHub. So quite a lot of people ask questions there. So there are also frequently [27:39.080 --> 27:47.800] asked questions there. And we have also a Zulip chat. Also we are on Libera, but right now not [27:47.800 --> 27:55.480] so many people are there. And yeah, so just drop by and ask whatever you want. [27:57.000 --> 28:05.160] Super, thank you. So we have about 45 seconds left. That's time to squeeze in one last question. [28:05.160 --> 28:09.880] So we have one here from Paul. And we've got some love for the timeline, which I agree.
I was like, [28:09.880 --> 28:17.880] oh, I want that. Do you plan any other visualizations, like maybe publication networks from citation data? [28:19.240 --> 28:24.600] Yes, actually, yes, because I realized I really like these visualizations. [28:26.680 --> 28:33.000] I plan some like with the citation, some trees and stuff like this. But I would like to have [28:33.000 --> 28:38.120] more feedback from users to really know what's really sensible and useful. [28:40.360 --> 28:45.640] Thanks. That's a great point. So we have three seconds left. Thank you so much. [28:45.640 --> 28:50.600] Thank you. Thank you, all of you. All right, I think we're off the live stream. That was [28:51.400 --> 28:57.320] I'm so going to be going back and like setting my own Papis up with a web server. Thank you so much. [28:57.320 --> 29:02.040] Yeah, thank you. Thank you. Okay, I'm going to hop to the next talk.