[00:00.000 --> 00:07.560]  Thank you.
[00:07.560 --> 00:12.520]  We start with a small introduction to have a bit of context about Djubaco.
[00:12.520 --> 00:19.520]  I'm Mathieu Goetje, I'm a freelance developer and I'm working, my main client is the Qwix
[00:19.520 --> 00:25.680]  project and for there I'm the lead developer of Flimzim.
[00:25.680 --> 00:26.680]  What is Qwix?
[00:26.680 --> 00:33.520]  Qwix is a project to provide content where internet is not there and the question we
[00:33.520 --> 00:38.600]  try to answer and we have answered is how to distribute static websites.
[00:38.600 --> 00:46.600]  And for example, if you don't know all Wikipedia in English, it's 95 gigabytes and it's 6.5
[00:46.600 --> 00:49.760]  billion articles and media.
[00:49.760 --> 00:57.840]  And to do that we use the Zim format. It's an archive format for web content and content
[00:57.840 --> 01:03.640]  is partially compressed so you can compress textual content or not compress images or
[01:03.640 --> 01:09.680]  videos and you can do a random access without initial decompression so you can access the
[01:09.680 --> 01:12.560]  content inside the archive directly.
[01:12.560 --> 01:18.880]  It works well and pretty efficient but there is a few flaws within the design and the archive
[01:18.880 --> 01:26.680]  is really tied to web contents and to Qwix and you cannot add another metadata but the
[01:26.680 --> 01:32.680]  question I tried to answer is could we reuse the Zim format, the good idea of the Zim format
[01:32.680 --> 01:36.400]  and do better and more generic.
[01:36.400 --> 01:37.400]  So here is Djubaco.
[01:37.400 --> 01:44.560]  Djubaco is a Japanese name for the bento boxes and it's more boxes you can compose the way
[01:44.560 --> 01:47.680]  you want depending on your needs.
[01:47.680 --> 01:53.920]  And Djubaco is a new format independent of Qwix project and this is a good idea of the
[01:53.920 --> 01:57.560]  Zim format but generic.
[01:57.560 --> 01:59.640]  And Djubaco is a meta container.
[01:59.640 --> 02:05.600]  It tells you how to store things but it's up to you to decide what do you want to store
[02:05.600 --> 02:14.120]  and how do you want to organize them and there is a reference library written in Rust.
[02:14.120 --> 02:20.040]  The feature of Djubaco, it's mainly read only, archive are mainly read only, this is selective
[02:20.040 --> 02:24.280]  compression so you can compress the content or not.
[02:24.280 --> 02:29.280]  No initial decompression needed and you can do random access on the archive.
[02:29.280 --> 02:35.280]  It's configurable so you can decide which property you want on the entries.
[02:35.280 --> 02:42.720]  There is an extension system so your user can download an archive and they can download
[02:42.720 --> 02:48.440]  extra content to add content to the archive you provide.
[02:48.440 --> 02:53.680]  It's embeddable in another file and it's composable so you can compose different kind of entry
[02:53.680 --> 02:55.360]  together in the same container.
[02:55.360 --> 03:03.720]  So it checks them and a few features to do, signature and encryption, direct access to
[03:03.720 --> 03:10.080]  uncompressed content, content deduplication, modification, different patch between archive
[03:10.080 --> 03:13.840]  and overlay.
[03:13.840 --> 03:19.240]  Let's have a quick tour on the internal structure.
[03:19.240 --> 03:25.840]  The Djubaco containers are organized around packs.
[03:25.840 --> 03:29.880]  There is three kinds of packs, manifest packs, content and the directory.
[03:29.880 --> 03:34.400]  Each pack can be stored individually in a file in the file system or they can be put
[03:34.400 --> 03:41.240]  together in one file and then you distribute this file to your user.
[03:41.240 --> 03:46.480]  The manifest pack is the main pack, this is a pack you will try to open when you want
[03:46.480 --> 03:54.040]  to open a Djubaco container and it's mainly a list of all the other packs of the container.
[03:54.040 --> 03:59.520]  The content pack is a pack which contains the raw content, compressed or not, and without
[03:59.520 --> 04:02.280]  any metadata.
[04:02.280 --> 04:08.680]  The directory pack is where you store the entries and the entries can print to contents
[04:08.680 --> 04:10.880]  in the content pack.
[04:10.880 --> 04:18.720]  This is a configurable part of Djubaco and inside the directory pack there is entries
[04:18.720 --> 04:26.080]  with a specific schema so you have to define the schema and the schema is the series of
[04:26.080 --> 04:28.600]  properties and their types.
[04:28.600 --> 04:32.840]  The content is just a property, it's a link to the content in the content pack so you
[04:32.840 --> 04:41.280]  can have entries at that point to several contents or no contents at all and each entry
[04:41.280 --> 04:51.600]  schema can contain violence, it's kind of union or enum in Proclamation EC or REST and
[04:51.600 --> 04:57.840]  you can have different kind of entries inside one directory pack.
[04:57.840 --> 05:02.720]  Each use case, why you would like to use Djubaco?
[05:02.720 --> 05:09.440]  The first use case is file archive, there is two arcs which is an equivalent of tar
[05:09.440 --> 05:18.160]  and here we have one kind of entry with three variants, file, symlink and directory, all
[05:18.160 --> 05:24.920]  three variants share two common property and for example the file variants add the pointer
[05:24.920 --> 05:34.960]  to a contents, symlink and the directory just store the first and pointer to the first entry
[05:34.960 --> 05:37.880]  and the number of entries in the directory.
[05:37.880 --> 05:46.440]  So it's kind of an organization and three structure as a file system.
[05:46.440 --> 05:52.120]  There is no index property for now but just mainly because arcs is pretty young and I
[05:52.120 --> 05:58.440]  don't want to bother with them while testing arcs and Djubaco but it's hard.
[05:58.440 --> 06:06.480]  It's a file archive so we can compare a bit arcs with tar to see how Djubaco and arcs
[06:06.480 --> 06:07.480]  perform.
[06:07.480 --> 06:16.000]  If we take the Linux source code, the full Linux source code is more than one gigabyte
[06:16.000 --> 06:26.400]  and both our tar and arcs are compressing the source code is about 130 or 14 megabytes.
[06:26.400 --> 06:34.160]  Crescent time arcs is a bit faster than tar and expression time we are pretty close arcs
[06:34.160 --> 06:40.760]  is a bit slower but we have someone pretty close, both tools are pretty close.
[06:40.760 --> 06:49.880]  What is interesting is when we try to list the contents of the archive, tar took almost
[06:49.880 --> 06:57.480]  the same time that expression because to list the contents in the tar archive you need to
[06:57.480 --> 07:04.200]  uncompress all the contents and arcs is very faster because the list of the entries are
[07:04.200 --> 07:08.520]  separated from the contents itself.
[07:08.520 --> 07:16.160]  If you want to extract only one content from the archive and we try to, what's that called
[07:16.160 --> 07:23.160]  dumping and when you try to dump a third of all the entries, you can see that arcs is
[07:23.160 --> 07:31.480]  really really really faster and the same way extracting one entry from the tar is pretty
[07:31.480 --> 07:39.320]  close from the time of listing the contents the same way as you need to uncompress all
[07:39.320 --> 07:45.880]  the contents of the tar archive and arcs you can locate the content and do a direct access
[07:45.880 --> 07:51.200]  to the content without uncompressing other contents.
[07:51.200 --> 07:59.360]  Once you think that we can do that is mount the archive, directly mount the archive on
[07:59.360 --> 08:07.120]  the file system and if you mount the archive and you do a diff of the content between the
[08:07.120 --> 08:15.400]  original source and what is mounted, if you do a diff between two plain directories it's
[08:15.400 --> 08:23.400]  a bit less than a second with arcs it's four seconds and half and tar is an estimation
[08:23.400 --> 08:31.840]  it will take something like ten hours to do the comparison.
[08:31.840 --> 08:37.280]  You can do something even more interesting with a mounted file system or with a mounting
[08:37.280 --> 08:45.400]  Linux source is compiling the kernel so if you compile the kernel on the plain file system
[08:45.400 --> 08:52.360]  it's a bit more than half an hour and if you compile the kernel using the mounted arcs
[08:52.360 --> 08:56.520]  archive it's a bit less than an hour.
[08:56.520 --> 09:03.880]  What is interesting here is that the compilation is made with G8 so there is eight processes
[09:03.880 --> 09:10.320]  and arcs a fuel file system is monostated so there is a huge bottleneck for now but
[09:10.320 --> 09:19.720]  if we move to a multi-threaded fuel file system it should be even better.
[09:19.720 --> 09:26.360]  The use case is the GIM it's an equivalent of kind of equivalent of ZIM format there
[09:26.360 --> 09:33.200]  is two variants only and here we are storing the entries as a plain list and there is no
[09:33.200 --> 09:42.920]  tree structure and the GIM binary just integrates a small HTTP server looking for the entries.
[09:42.920 --> 09:52.560]  What we can do also is we could replace for example RPM and DEB with arcs or things based
[09:52.560 --> 09:58.840]  on jubacca so you could download your package and not extract it from the file system just
[09:58.840 --> 10:05.560]  open it directly and even a GVL or debugging fault that could be put in specific content
[10:05.560 --> 10:13.520]  pack of the same archive so it could simplify the management and you will not need to have
[10:13.520 --> 10:19.640]  different package to different sub-type of contents of your packages.
[10:19.640 --> 10:27.120]  OCI containers are based on Tor you need to extract them on the file system before running
[10:27.120 --> 10:34.680]  a container so you could just use arcs among them or you can even use directly put different
[10:34.680 --> 10:45.160]  layer in different content pack and so the wall images will be one jubacca container.
[10:45.160 --> 10:56.160]  File format almost all file formats are in fact container for other content so you could
[10:56.160 --> 11:03.560]  use jubacca to just organize the content you want to store what you want for your own project
[11:03.560 --> 11:07.720]  and your own file format.
[11:07.720 --> 11:15.840]  Websites jubacca is written in rest you could run it in wasm and so jubacca could run you
[11:15.840 --> 11:24.080]  could load your jubacca archive in the browser once and just open it directly in the browser.
[11:24.080 --> 11:30.760]  Backups backup jubacca is almost incremental by design if you reuse the content pack of
[11:30.760 --> 11:36.760]  the backups previous backup this is incremental and you can decide which property you want
[11:36.760 --> 11:44.760]  to have so for example you can add a checksum on each sentries to do a comparison between
[11:44.760 --> 11:50.280]  the content store in the backup and what you have on the file system.
[11:50.280 --> 11:59.680]  Embedding resources jubacca can be embedded in executor programs or even more this presentation
[11:59.680 --> 12:05.360]  you can download this presentation at this address and you will have a file and this
[12:05.360 --> 12:11.680]  file is an arched archive so you can just use the arch tool to list the content extract
[12:11.680 --> 12:16.920]  or month archive and you will have access to all the file of this presentation it is
[12:16.920 --> 12:24.960]  revealed yes and it's HTML content but the same file is also a gym archive so you can
[12:24.960 --> 12:34.520]  just use the gym tool to just set the content and open a browser to the local host and the
[12:34.520 --> 12:41.440]  same files is also a program so if you make it executable you can run the program itself
[12:41.440 --> 12:49.120]  to month extract or set the content what is interesting is that between our the content
[12:49.120 --> 12:55.840]  is not shared it is an arch and gym archive but it's just a view to the same content there
[12:55.840 --> 13:01.520]  is no duplication it's not two archive put together it's really one archive with two
[13:01.520 --> 13:08.880]  kind of view of the same content and the last line is the exact command used to serve this
[13:08.880 --> 13:18.360]  actual presentation conclusion this is a new way of thinking
[13:18.360 --> 13:26.800]  we could extract you could use archive directly instead of extracting it so we can reinvent
[13:26.800 --> 13:38.200]  the wall without thinking about using directly the archive it's a new way of thinking it's
[13:38.200 --> 13:45.600]  generic it's a command based tool that can add that can add that to different usage
[13:45.600 --> 13:54.640]  but it's pretty new maybe some crash and you can expect maybe some change in the specification
[13:54.640 --> 14:23.880]  thank you are there any questions can you repeat the question okay I don't know
[14:23.880 --> 14:36.680]  I know about squash fs but the thing is that jubaco is not a file system arcs is an archive
[14:36.680 --> 14:42.840]  to store files but jubaco is not so jubaco is more generic than crime fs or squash fs
[14:42.840 --> 15:02.200]  probably and arcs compared to squash fs is is is arcs slower than squash fs on size arcs
[15:02.200 --> 15:18.080]  is better but on the performance is slower we could implement that in other languages
[15:18.080 --> 15:27.640]  yeah could we re-implement this in other languages you could there is the specification
[15:27.640 --> 15:34.120]  is language and mystic but just I just implement reference library in rest battle but the specification
[15:34.120 --> 15:49.600]  is is public sorry that zip is pretty small but zip is is a slower the arcs in almost
[15:49.600 --> 16:01.440]  any any kind of operation and is bigger than arcs also thank you