[00:00.000 --> 00:12.400]  Hello, FOSDEM. My name is Alejandro and today I am going to talk about Papis, a simple,
[00:12.400 --> 00:17.400]  powerful and extendable command line bibliography manager that I have been developing during
[00:17.400 --> 00:23.320]  the last 7 years. I will be explaining some of the main considerations of the project
[00:23.400 --> 00:30.840]  and demoing some of its basic use cases. First of all, let me introduce myself. I work currently
[00:30.840 --> 00:37.320]  as a physicist at the Technical University of Vienna in Austria. We develop massively parallel
[00:37.320 --> 00:43.320]  algorithms in order to calculate properties of molecules and solids from a theoretical point of
[00:43.320 --> 00:49.640]  view. You can find me on Mastodon or around the web. Don't hesitate to contact me.
[00:52.520 --> 01:01.160]  So, what is Papis? Papis started as a simple bibliography manager built around the command
[01:01.160 --> 01:10.120]  line. It should make it possible to manage papers or books at scale or for small curated libraries.
[01:11.080 --> 01:17.480]  It is therefore important to implement a simple data model and use an approachable programming
[01:17.480 --> 01:24.360]  language, such as Python, so that users can interact easily with Papis' many features.
[01:26.280 --> 01:34.040]  In addition, Python also encourages contributions from researchers in the academic world, since
[01:34.360 --> 01:44.360]  nowadays many researchers are exposed to this language. Papis strives to be and build a community,
[01:45.480 --> 01:49.720]  and various plugins have appeared thanks to the community.
[01:52.360 --> 01:59.480]  There are plugins for the major text editors, such as NeoVim and Emacs, and partial support
[01:59.480 --> 02:07.720]  exists for VS Code and Vim. Additionally, lately we have been working on the web application for
[02:07.720 --> 02:12.200]  Papis, and I will be showing some of its features in this talk.
[02:15.000 --> 02:21.880]  But you are asking yourself, why Papis? We think that it should be possible and simple
[02:21.880 --> 02:28.200]  to perform complex tasks on a whole library. This is made possible through a rich command
[02:28.520 --> 02:36.120]  line interface. You can add papers from a DOI or from a variety of websites supported by Papis.
[02:36.840 --> 02:41.240]  You can explore sources like Crossref from the command line,
[02:41.960 --> 02:46.040]  or download information about the citations of a publication,
[02:46.760 --> 02:51.240]  or check which publications cite the current publication.
[02:51.720 --> 02:56.680]  You can take notes that play well with tools like Vim or Emacs org mode.
[02:56.680 --> 03:02.200]  You can version control your documents and export to the most common formats.
[03:03.480 --> 03:10.280]  You can spend countless hours curating and improving your library's notes, metadata,
[03:10.280 --> 03:17.800]  and PDF documents without fear of losing your data to an API change or end-of-life of Papis,
[03:17.880 --> 03:23.000]  since your data is stored in a very simple but flexible format.
[03:26.280 --> 03:31.080]  I want to emphasize the fact that one of the main goals of Papis
[03:31.080 --> 03:35.000]  is enabling the user to be independent of Papis itself.
[03:36.040 --> 03:42.040]  A researcher, academic or not, spends an enormous amount of time searching,
[03:42.600 --> 03:45.640]  reading and annotating publications.
[03:46.760 --> 03:53.960]  For us Papis maintainers, it is important that a person comfortable with any scripting language
[03:53.960 --> 04:01.240]  should be able to retrieve the totality of Papis data by writing a script in an afternoon.
[04:03.320 --> 04:09.080]  In order to accomplish this, an extremely simple library structure was chosen.
[04:09.560 --> 04:15.000]  The library structure relies on having one folder per library document.
[04:15.000 --> 04:20.440]  This means, for instance, in the case of the shown publication of Turing,
[04:20.440 --> 04:26.440]  the folder includes a YAML file containing the metadata information of the publication
[04:26.440 --> 04:31.640]  and an additional PDF file with the published publication itself.
[04:32.200 --> 04:39.000]  In this example library, we would have an additional document under the folder 1-document,
[04:39.000 --> 04:42.200]  where we find two PDF files in this case.
[04:43.560 --> 04:51.000]  A document in a Papis library is any folder containing a YAML file entitled info.yaml.
[04:53.000 --> 04:59.000]  The contents of the YAML file are in principle up to the user
[04:59.160 --> 05:02.600]  to determine.
[05:02.600 --> 05:07.320]  However, in practice, there are some conventions used in Papis.
[05:09.240 --> 05:17.720]  Inside the info.yaml file, the key files contains a list of related files in the document's directory.
[05:18.680 --> 05:25.320]  These files might be PDF files or any other kind of files relevant to the document.
[05:26.200 --> 05:28.360]  In the case of the Turing publication,
[05:28.360 --> 05:35.480]  files therefore lists a single PDF document, paper.pdf.
[05:36.840 --> 05:44.040]  The key ref is used for exporting BibTeX files and is the reference of the document
[05:44.040 --> 05:47.560]  when using bibliographic tools outside of Papis.
[05:48.360 --> 05:54.120]  The YAML key type is also used for BibTeX exporting and is
[05:55.080 --> 06:00.840]  the type of document, whether a book, an article, a monograph, etc.
[06:01.960 --> 06:05.000]  There is also built-in support for tags,
[06:05.000 --> 06:09.160]  which may be added as a list of space-separated keywords.
[06:10.280 --> 06:14.440]  We chose the YAML format due to its ease of writing, reading,
[06:14.440 --> 06:20.280]  and because most programming languages are provided with libraries that can read these files.
[06:21.240 --> 06:25.080]  Of course, given the simplicity of the library model,
[06:25.080 --> 06:30.360]  it is possible to write a crude finder with just the Unix grep and find commands.
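Editorial sketch (not part of the talk): the crude finder mentioned above could also be written in a few lines of Python using only the standard library. The function name `find_documents` and the throwaway demo library are hypothetical; only the one-folder-per-document layout and the `info.yaml` file name come from the talk. The search is a plain text match and does not parse the YAML.

```python
import os
import tempfile


def find_documents(library_dir, needle):
    """Return folders whose info.yaml contains `needle` (case-insensitive)."""
    needle = needle.lower()
    matches = []
    for root, _dirs, files in os.walk(library_dir):
        if "info.yaml" not in files:
            continue  # not a document folder
        with open(os.path.join(root, "info.yaml"), encoding="utf-8") as fh:
            if needle in fh.read().lower():
                matches.append(root)
    return sorted(matches)


if __name__ == "__main__":
    # Build a tiny throwaway library so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as lib:
        demo = [("turing", "On Computable Numbers"), ("other", "Some Notes")]
        for folder, title in demo:
            os.makedirs(os.path.join(lib, folder))
            info = os.path.join(lib, folder, "info.yaml")
            with open(info, "w", encoding="utf-8") as fh:
                fh.write(f"title: {title}\nfiles:\n- paper.pdf\n")
        for path in find_documents(lib, "computable"):
            print(os.path.basename(path))
```

Because each document is just a folder with a text file, nothing here depends on Papis being installed.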
[06:32.360 --> 06:38.200]  All functionalities in Papis can be customized through a configuration file in the INI format.
[06:40.200 --> 06:44.680]  Papis can define multiple libraries through the configuration file,
[06:44.680 --> 06:49.480]  and all Papis settings can be independently configured for each library.
[06:50.840 --> 06:54.680]  You can define default settings under the Settings section,
[06:55.400 --> 06:57.560]  which will be common to all libraries.
[06:58.760 --> 07:03.080]  A library is simply defined as a section with a dir key,
[07:03.080 --> 07:07.960]  which contains the path to the library directory containing all documents.
[07:09.480 --> 07:14.920]  You can then customize this library, in this case a library named Papis,
[07:14.920 --> 07:19.000]  and set the default opener tool to the PDF viewer Evince.
[07:20.280 --> 07:25.960]  If you happen to want an additional library of books holding mostly EPUB formatted books,
[07:27.080 --> 07:30.120]  you could define the opener to be Calibre instead.
[07:31.560 --> 07:36.280]  You can read about all the configuration settings in the Documentation page,
[07:36.280 --> 07:40.520]  where you will see a description of their function and their default values.
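Editorial sketch (not part of the talk): the layout described above, a shared settings section plus one section per library with a dir key, can be illustrated with Python's standard `configparser`. The key name `opentool`, the paths, and the helper `load_libraries` are assumptions made for illustration; consult the Papis documentation for the actual setting names. This does not claim to reproduce how Papis itself reads its configuration.

```python
import configparser

# Hypothetical configuration mirroring the talk: a shared [settings] section
# and two libraries, each identified by its `dir` key.
CONFIG_TEXT = """
[settings]
opentool = xdg-open

[papis]
dir = ~/Documents/papis
opentool = evince

[books]
dir = ~/Documents/books
opentool = calibre
"""


def load_libraries(text):
    """Return {library_name: settings}, with per-library values overriding [settings]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = dict(parser["settings"]) if parser.has_section("settings") else {}
    libraries = {}
    for name in parser.sections():
        if name == "settings" or "dir" not in parser[name]:
            continue  # only sections with a `dir` key count as libraries
        libraries[name] = {**defaults, **dict(parser[name])}
    return libraries


if __name__ == "__main__":
    for name, settings in load_libraries(CONFIG_TEXT).items():
        print(name, settings["dir"], settings["opentool"])
```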
[07:42.120 --> 07:48.680]  With this introduction, let us take a look now at a common workflow to add an article from a journal page.
[07:51.000 --> 07:54.040]  Here is a common view of an article in a browser.
[07:54.760 --> 08:00.760]  We can see lots of information, and the easiest way of adding this article to Papis
[08:01.320 --> 08:05.240]  will be by locating the DOI of the article in the page.
[08:07.080 --> 08:12.040]  In this case, we locate the DOI in the URL of the article,
[08:12.040 --> 08:16.280]  and we copy it to our clipboard to paste it in the terminal.
[08:16.840 --> 08:25.400]  The command for adding a paper is papis add, and papis add comes with quite a few options.
[08:26.120 --> 08:33.160]  In general, when adding a document, Papis will try to download metadata from various sources and,
[08:33.160 --> 08:39.240]  if possible, download PDF documents, if they are freely and legally available.
[08:39.320 --> 08:44.440]  Here, we see that I am using the edit flag.
[08:45.480 --> 08:51.880]  This flag instructs the Papis add command to open the editor with the info.yaml file
[08:51.880 --> 08:54.200]  before adding the document to the library.
[08:56.120 --> 09:02.120]  Similarly, the open flag instructs the command to open the attached files, if any,
[09:02.120 --> 09:04.600]  before adding the document to the library.
[09:05.560 --> 09:08.920]  We are also telling the command through the from flag
[09:08.920 --> 09:13.080]  to retrieve information exclusively from the DOI.
[09:15.000 --> 09:18.920]  We can also preset some metadata through the command line.
[09:19.880 --> 09:24.840]  In this case, we are adding the tags, classics and DFT.
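Editorial sketch (not part of the talk): for readers who want to script this step, the invocation described above can be assembled as an argument list in Python. The flag spellings (`--edit`, `--open`, `--confirm`, `--from doi`, `--set tags`) are written as they are named in the talk and may differ between Papis versions, so treat them as assumptions. The DOI below is a placeholder, not the one used in the demo, and `build_add_command` is a hypothetical helper.

```python
import shlex


def build_add_command(doi, tags):
    """Assemble the argument list for the `papis add` call described in the talk."""
    return [
        "papis", "add",
        "--edit",                       # open info.yaml in the editor before adding
        "--open",                       # open attached files before adding
        "--confirm",                    # ask for confirmation before adding
        "--from", "doi", doi,           # fetch metadata from the DOI only
        "--set", "tags", " ".join(tags),  # preset space-separated tags
    ]


if __name__ == "__main__":
    # "10.1000/example" is a placeholder DOI for illustration only.
    command = build_add_command("10.1000/example", ["classics", "DFT"])
    print(shlex.join(command))
    # To actually run it (requires Papis installed):
    # import subprocess; subprocess.run(command, check=True)
```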
[09:26.840 --> 09:28.680]  Let's go ahead and run the command.
[09:28.680 --> 09:37.160]  Papis will now try to download metadata and a PDF file from online sources.
[09:38.120 --> 09:43.720]  In the current configuration, we are greeted with an interactive prompt to add,
[09:43.720 --> 09:47.960]  split or reject the metadata retrieved from Crossref.
[09:48.840 --> 09:51.160]  We choose to accept the metadata.
[09:51.400 --> 09:57.400]  The interactive session now shows us a retrieved PDF document and asks us,
[09:57.400 --> 10:00.440]  if this is the document that belongs to the publication.
[10:01.400 --> 10:08.200]  At this point, we can inspect the document and we realize that we indeed want this PDF file,
[10:08.200 --> 10:09.400]  so we press Y.
[10:11.800 --> 10:18.360]  Now, all the information is in place and we can see a preliminary version of the info file
[10:19.160 --> 10:20.920]  since we passed the edit flag.
[10:22.920 --> 10:29.560]  We can see that a lot of information could be retrieved, detailed author list information,
[10:30.520 --> 10:38.360]  volume, pages, among others, and our tags have found their way into the YAML file correctly.
[10:40.760 --> 10:46.360]  A confirmation prompt subsequently appears since we passed the confirm flag to the command.
[10:48.600 --> 10:54.280]  We agree to it and, therefore, the document gets added to the library.
[10:56.360 --> 11:01.160]  We can now fetch information about the publication cited in this article.
[11:01.160 --> 11:07.240]  The command for this is citations and we pass to it the fetch citations flag,
[11:07.800 --> 11:15.000]  which first checks for information in our library and then heads to Crossref to retrieve
[11:15.000 --> 11:19.800]  relevant information about the references appearing in our newly added document.
[11:21.560 --> 11:25.800]  If we now open the directory where the document has been stored,
[11:26.440 --> 11:32.520]  we see that the PDF file has been correctly stored alongside the info YAML file
[11:32.520 --> 11:37.400]  and the newly generated citations.yaml file.
[11:45.240 --> 12:08.680]  If we inspect the citations file, we see that it is in the format of a list of YAML files,
[12:08.680 --> 12:15.480]  where every element separated by three dashes represents bibliographic information about the
[12:15.480 --> 12:23.240]  citations. This can be used for scripting, for browsing the citations, or for easily
[12:23.240 --> 12:32.280]  visualizing them through the web application. This demo will show how to leverage the Papis API
[12:32.280 --> 12:39.320]  in Python to write one of the simplest scripts you can write. You can find more information
[12:39.320 --> 12:43.480]  in the documentation together with other more complex example scripts.
[12:45.080 --> 12:50.520]  First of all, let us add a bigger library to our demo library. For this,
[12:50.520 --> 12:55.240]  we need to edit the configuration file and add an additional library.
[12:55.960 --> 13:03.240]  After adding the library, we can list the directories with the list command,
[13:04.600 --> 13:09.000]  which shows us the interactive interface to select documents.
[13:10.520 --> 13:16.520]  Most Papis commands accept a query argument as an input. In this case,
[13:16.520 --> 13:21.960]  we can query for documents matching the author to include the string Einstein.
[13:22.360 --> 13:30.840]  We can also use the all flag to apply a Papis action to all documents matching the query.
[13:31.640 --> 13:35.560]  In this case, listing the full paths for the folders.
[13:38.040 --> 13:43.960]  Other commands like open, edit, or update work in a similar fashion.
[13:44.280 --> 13:52.120]  Next, we will write a simple Python script to scan all the documents in the library
[13:52.760 --> 13:59.720]  and add the tag to the document whenever the substring this appears in the title of the document.
[14:01.800 --> 14:08.680]  To do this, we can use the Papis API submodule and we can obtain all documents in the current
[14:08.680 --> 14:12.600]  library with the function get_all_documents_in_lib.
[14:15.240 --> 14:22.040]  Next, we loop over all documents and we deal with the document as if it were a Python dictionary.
[14:39.240 --> 14:51.480]  The method save saves the document.
[14:53.640 --> 14:58.840]  I will comment out the save call since I don't want it to overwrite the library.
[15:02.200 --> 15:06.840]  Let's run the script and see that it works. And indeed, it works.
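Editorial sketch (not part of the talk): the script shown on screen is not captured in the transcript, so the following is a reconstruction under assumptions. It uses `papis.api.get_all_documents_in_lib` and the document's `save` method, both named in the talk, and treats each document as a dictionary. The helper names, the search string, and the tag are hypothetical. Tags are handled both as a space-separated string (as described in the talk) and as a list.

```python
def add_tag(doc, tag):
    """Add `tag` to a dict-like document; return True if the document changed."""
    tags = doc.get("tags")
    if isinstance(tags, list):
        if tag in tags:
            return False
        doc["tags"] = tags + [tag]
        return True
    words = str(tags or "").split()  # space-separated keywords
    if tag in words:
        return False
    doc["tags"] = " ".join(words + [tag])
    return True


def tag_matching(docs, needle, tag):
    """Tag every document whose title contains `needle`; return the changed ones."""
    changed = []
    for doc in docs:
        title = str(doc.get("title", ""))
        if needle.lower() in title.lower() and add_tag(doc, tag):
            changed.append(doc)
    return changed


def run_on_current_library(needle, tag, dry_run=True):
    """Apply tag_matching to the current Papis library (requires Papis installed)."""
    import papis.api  # imported lazily so the helpers above work without Papis

    changed = tag_matching(papis.api.get_all_documents_in_lib(), needle, tag)
    for doc in changed:
        print(doc.get("title"))
        if not dry_run:
            doc.save()  # writes the modified info.yaml back to disk
    return changed
```

Keeping `dry_run=True` as the default mirrors the commented-out save call in the demo: the script reports what it would change without touching the library.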
[15:09.400 --> 15:12.440]  The last demonstration will concern the web application.
[15:13.080 --> 15:17.720]  The web application is quite useful if you would like to self-host Papis
[15:17.720 --> 15:19.880]  or access it from a portable device.
[15:22.280 --> 15:29.320]  We can run the web application using the serve command, to which we can pass a port, 8888.
[15:29.480 --> 15:40.280]  Directing our browser to the URL localhost colon 8888,
[15:40.920 --> 15:47.480]  we see the starting page of the web application where we are presented with a simple query prompt.
[15:50.440 --> 15:56.840]  Other pages include listing all the documents in the library, listing all the tags,
[15:57.480 --> 16:04.440]  and browsing a different library. Let us again enter the author Einstein query into the prompt.
[16:05.560 --> 16:10.360]  The result page includes a handy timeline with the results of the query
[16:11.080 --> 16:18.600]  and a simple multi-line list of the results. In this timeline, we can see for instance directly
[16:18.600 --> 16:25.240]  the annus mirabilis of Einstein together with a couple of other publications further right.
[16:25.640 --> 16:30.520]  We could click on the title of the timeline and go to the respective document page.
[16:32.520 --> 16:39.160]  In the results for the document, we see a left block with some basic information and the PDF links.
[16:40.120 --> 16:47.240]  On the right hand side, we see the citation, references, and several external links for the document.
[16:56.200 --> 17:12.280]  Let us look for the first paper we added at the beginning of this presentation.
[17:13.480 --> 17:23.240]  It is worth noting that we can click on the tags of the documents to get the results for the given tags.
[17:25.480 --> 17:43.000]  If we click on the arrow, we will navigate into the document page. The red notifications advise us
[17:43.000 --> 17:48.280]  of small problems with the data in our document. However, I will not fix those now.
[17:49.240 --> 17:56.680]  The document page is a multi-tab page where the first tab presents most of the information
[17:56.680 --> 18:05.240]  of the document in an HTML form fashion. Additionally, we have access to the raw info file
[18:06.360 --> 18:14.520]  where we can modify and overwrite its contents. We have added a BibTeX tab for LaTeX users.
[18:14.680 --> 18:23.640]  This document has a single file attached and we can preview it in the browser thanks to the library
[18:23.640 --> 18:31.480]  PDF.js. We can also download the document or open the document in a new window.
[18:35.480 --> 18:40.600]  In the next tab, we can visualize the citations file that we generated previously.
[18:41.480 --> 18:45.240]  This tab also has a timeline like the search results
[18:45.240 --> 18:52.280]  and the documents with the green reference indicate that these documents exist in our library and we
[18:52.280 --> 19:04.760]  can open them. Let us open this article page.
[19:10.600 --> 19:34.760]  For this article, we have also generated citations, but we can also use the Harvard ADS service.
[19:41.560 --> 19:52.040]  In the case of articles citing the current article, we have not generated this file
[19:52.680 --> 19:56.920]  and therefore we get an embedded page from ADS by default.
[20:00.760 --> 20:04.760]  In the last tab, we can edit the notes from the browser.
[20:11.560 --> 20:23.560]  Furthermore, clicking on the tags and library pages, we can see what these interfaces look like.
[20:33.560 --> 20:40.040]  Thank you very much for your attention. For further information, visit the project's page
[20:40.040 --> 20:49.800]  over at GitHub. Of course, Papis is only alive because of its community. I would like to thank
[20:49.800 --> 20:56.840]  all the users and contributors over the years. I would like to specially thank the co-maintainers
[20:56.840 --> 21:04.840]  of Papis, Alex Fikl and Julian Hauser, for their hard work in the last year. I hope you
[21:04.840 --> 21:12.200]  enjoyed the presentation and I'll be answering your questions shortly.
[21:22.440 --> 21:28.200]  Fantastic. Thank you so much for that really actually quite interesting talk Alejandro.
[21:28.520 --> 21:36.120]  I really quite felt inspired thinking, wow, can I run this with my own publications as a way of
[21:36.120 --> 21:41.560]  collating stuff and also sharing it with the world? It's always nice when you watch a talk
[21:41.560 --> 21:48.520]  and you immediately think, yes, I'm going to use this as well. So I have a few questions here.
[21:49.320 --> 21:55.400]  I think my first one, perhaps, might be, I historically have used Zotero. I had to think
[21:55.400 --> 22:02.600]  very carefully not to say no to that. Is it easy for me to migrate if I was inclined to migrate
[22:02.600 --> 22:15.800]  from Zotero or other plugins? Over the years, quite a lot of people have developed some plugins for
[22:15.800 --> 22:24.360]  the interface of Zotero and Papis. You can export to simply and create Papis libraries
[22:25.240 --> 22:31.880]  but I'm also aware of some people that actually use both. So they have a workflow to
[22:32.600 --> 22:39.080]  export this dynamically the whole time. So it is in principle compatible and there are
[22:39.080 --> 22:42.840]  a couple of projects that do this. This is coming from the community.
[22:44.760 --> 22:49.160]  Thanks. That's actually really appealing to me because whilst I was watching the way you
[22:49.160 --> 22:54.360]  added with the DOI, I thought that was really cool but there's still like seven command line
[22:54.360 --> 23:01.720]  flags and I'd really like the button that says add this to Zotero. Yeah. So the thing is,
[23:02.840 --> 23:10.680]  yeah, I should have maybe given some more examples of adding some documents. So in principle, it's
[23:10.680 --> 23:23.560]  also possible to add a document just by the URL. So there is some automatic recognition in Papis.
[23:23.560 --> 23:35.720]  So most URLs are recognized and it could in this case even be an arXiv URL or
[23:37.960 --> 23:48.440]  within the HTML page. I noticed that there is a tool to use.
[23:53.640 --> 23:59.720]  Sorry. Your connection has gone just robot enough.
[24:07.640 --> 24:13.560]  Testing, testing. Do we still have you now? Yes, that is much better. Could you maybe just repeat
[24:13.560 --> 24:19.400]  your last two or three sentences because it was just a bit hard to hear. Sorry. So there are some,
[24:20.360 --> 24:34.040]  Zotero has implemented also quite a lot of metadata fetchers from many sources and there is also a
[24:34.040 --> 24:44.200]  project that tries to reuse these metadata fetchers from Zotero for their use in Papis and maybe
[24:44.200 --> 24:50.600]  also in the web application in Papis. So this might also happen in the future. But in general,
[24:50.600 --> 24:59.400]  it's much easier to add documents than what I showed in the video. Cool. Thanks. It's also
[24:59.400 --> 25:04.760]  really nice to hear how interoperable y'all are. So we have a couple more questions in the chat.
[25:06.280 --> 25:12.120]  So Paul says, does the YAML format follow bibliographic standards of any type?
[25:14.600 --> 25:28.200]  So we try to use most of the BibTeX keywords when they are applicable and in general the YAML format
[25:28.200 --> 25:36.280]  is really free to the user to use. So you might want to use a particular convention in your YAML
[25:36.280 --> 25:45.400]  files but the keywords are mostly motivated by BibTeX. That's the only one.
[25:47.000 --> 25:50.840]  It still sounds like there's some decent interoperability there which is really nice.
[25:52.680 --> 25:59.560]  Celia asks, who are the main users of Papis in terms of discipline, students, researchers, etc?
[26:00.520 --> 26:11.720]  Well, I know that's a good question. A lot of biophysics and biosciences, so bioinformatics,
[26:11.720 --> 26:17.880]  I know quite a lot of people that use it. Physics, mathematics and computer science,
[26:17.880 --> 26:25.080]  I would say. These are the ones. But for instance, Julian is one co-maintainer and he's a philosopher.
[26:25.880 --> 26:35.800]  So it really helps. Of course, you have to be a little bit acquainted with the command line.
[26:36.360 --> 26:42.760]  Maybe hopefully through the web application in the future this will change. But it's in general
[26:42.760 --> 26:48.760]  people that really care about their libraries, the metadata that they have in their libraries,
[26:48.840 --> 26:57.240]  and they really want to have a very clear, clear representation of their data. They don't want
[26:57.240 --> 27:02.760]  some upstream database somewhere stored. So they really want everything in plain text.
[27:04.760 --> 27:10.040]  Yes, I think you demonstrated so beautifully how accessible your own data is and it's
[27:10.040 --> 27:17.400]  surprisingly rare. Do you have any trainings for Papis so that people who maybe are a bit
[27:17.400 --> 27:24.120]  less confident could learn more about it? Sadly, not right now. Maybe that's something if enough
[27:24.120 --> 27:30.680]  people are interested. That's something that we could certainly look into. But we have the
[27:30.680 --> 27:39.080]  discussions in GitHub. So quite a lot of people ask questions there. So there are also frequently
[27:39.080 --> 27:47.800]  asked questions there. And we have also a Zulip chat. Also we are on Libera, but right now not
[27:47.800 --> 27:55.480]  so many people are there. And yeah, so just drop by and ask whatever you want.
[27:57.000 --> 28:05.160]  Super, thank you. So we have about 45 seconds left. That's time to squeeze in one last question.
[28:05.160 --> 28:09.880]  So we have one here from Paul. And we've got some love for the timeline, which I agree. I was like,
[28:09.880 --> 28:17.880]  oh, I want that. Do you plan any other visualizations, like maybe publication networks from citation data?
[28:19.240 --> 28:24.600]  Yes, actually, yes, because I realized I really like these visualizations.
[28:26.680 --> 28:33.000]  I plan some like with the citation, some trees and stuff like this. But I would like to have
[28:33.000 --> 28:38.120]  more feedback from users to really know what's really sensible and useful.
[28:40.360 --> 28:45.640]  Thanks. That's a great point. So we have three seconds left. Thank you so much.
[28:45.640 --> 28:50.600]  Thank you. Thank you, all of you. All right, I think we're off the live stream. That was
[28:51.400 --> 28:57.320]  I'm so going to be going back and like setting my own Papis up with a web server. Thank you so much.
[28:57.320 --> 29:02.040]  Yeah, thank you. Thank you. Okay, I'm going to hop to the next talk.