[00:00.000 --> 00:07.560] Thank you. [00:07.560 --> 00:12.520] We start with a small introduction to give a bit of context about Jubako. [00:12.520 --> 00:19.520] I'm Matthieu Gautier, I'm a freelance developer, and my main client is the Kiwix [00:19.520 --> 00:25.680] project, where I'm the lead developer of libzim. [00:25.680 --> 00:26.680] What is Kiwix? [00:26.680 --> 00:33.520] Kiwix is a project to provide content where there is no internet, and the question we [00:33.520 --> 00:38.600] try to answer, and have answered, is how to distribute static websites. [00:38.600 --> 00:46.600] For example, if you don't know, all of Wikipedia in English is 95 gigabytes and 6.5 [00:46.600 --> 00:49.760] million articles and media. [00:49.760 --> 00:57.840] To do that we use the ZIM format. It's an archive format for web content, and content [00:57.840 --> 01:03.640] is partially compressed, so you can compress textual content and leave images or [01:03.640 --> 01:09.680] videos uncompressed, and you can do random access without initial decompression, so you can access the [01:09.680 --> 01:12.560] content inside the archive directly. [01:12.560 --> 01:18.880] It works well and is pretty efficient, but there are a few flaws in the design: the archive [01:18.880 --> 01:26.680] is really tied to web content and to Kiwix, and you cannot add other metadata. So the [01:26.680 --> 01:32.680] question I tried to answer is: could we reuse the good ideas of the ZIM format [01:32.680 --> 01:36.400] and do something better and more generic? [01:36.400 --> 01:37.400] So here is Jubako. [01:37.400 --> 01:44.560] Jubako is the Japanese name for bento boxes: small boxes you can compose the way [01:44.560 --> 01:47.680] you want depending on your needs. [01:47.680 --> 01:53.920] Jubako is a new format, independent of the Kiwix project, that takes the good ideas of the [01:53.920 --> 01:57.560] ZIM format and makes them generic.
[01:57.560 --> 01:59.640] Jubako is a meta-container. [01:59.640 --> 02:05.600] It tells you how to store things, but it's up to you to decide what you want to store [02:05.600 --> 02:14.120] and how you want to organize it. There is a reference library written in Rust. [02:14.120 --> 02:20.040] The features of Jubako: archives are mainly read-only; there is selective [02:20.040 --> 02:24.280] compression, so you can compress the content or not. [02:24.280 --> 02:29.280] No initial decompression is needed, and you can do random access on the archive. [02:29.280 --> 02:35.280] It's configurable, so you can decide which properties you want on the entries. [02:35.280 --> 02:42.720] There is an extension system, so your users can download an archive and then download [02:42.720 --> 02:48.440] extra content to add to the archive you provide. [02:48.440 --> 02:53.680] It's embeddable in another file, and it's composable, so you can compose different kinds of entries [02:53.680 --> 02:55.360] together in the same container. [02:55.360 --> 03:03.720] There are checksums, and a few features still to do: signature and encryption, direct access to [03:03.720 --> 03:10.080] uncompressed content, content deduplication, modification, diff and patch between archives, [03:10.080 --> 03:13.840] and overlays. [03:13.840 --> 03:19.240] Let's have a quick tour of the internal structure. [03:19.240 --> 03:25.840] Jubako containers are organized around packs. [03:25.840 --> 03:29.880] There are three kinds of packs: manifest, content and directory. [03:29.880 --> 03:34.400] Each pack can be stored individually in a file in the file system, or they can be put [03:34.400 --> 03:41.240] together in one file, and then you distribute this file to your users.
[03:41.240 --> 03:46.480] The manifest pack is the main pack: this is the pack you open when you want [03:46.480 --> 03:54.040] to open a Jubako container, and it's mainly a list of all the other packs of the container. [03:54.040 --> 03:59.520] The content pack is a pack which contains the raw content, compressed or not, and without [03:59.520 --> 04:02.280] any metadata. [04:02.280 --> 04:08.680] The directory pack is where you store the entries, and the entries can point to content [04:08.680 --> 04:10.880] in the content pack. [04:10.880 --> 04:18.720] This is the configurable part of Jubako. Inside the directory pack there are entries [04:18.720 --> 04:26.080] with a specific schema, so you have to define the schema, and the schema is a series of [04:26.080 --> 04:28.600] properties and their types. [04:28.600 --> 04:32.840] The content is just a property, it's a link to the content in the content pack, so you [04:32.840 --> 04:41.280] can have entries that point to several contents or to no content at all. Each entry [04:41.280 --> 04:51.600] schema can contain variants, a kind of union or enum as in C or Rust, and [04:51.600 --> 04:57.840] you can have different kinds of entries inside one directory pack. [04:57.840 --> 05:02.720] Use cases: why would you want to use Jubako? [05:02.720 --> 05:09.440] The first use case is file archives. There is a tool, arx, which is an equivalent of tar, [05:09.440 --> 05:18.160] and here we have one kind of entry with three variants: file, symlink and directory. All [05:18.160 --> 05:24.920] three variants share two common properties, and, for example, the file variant adds a pointer [05:24.920 --> 05:34.960] to a content, the symlink adds its target, and the directory just stores a pointer to its first entry [05:34.960 --> 05:37.880] and the number of entries in the directory. [05:37.880 --> 05:46.440] So it's organized as a tree structure, like a file system.
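The entry schema described above (one kind of entry, three variants, a directory pointing at a range of entries) can be sketched in a few lines of Python. This is only an illustration and not the arx or Jubako API: the class names, the two common properties (`name` and `parent`) and the helper `list_paths` are all assumptions made for the example, since the talk does not name the common properties.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class File:
    name: str        # common property (assumed for this sketch)
    parent: int      # common property (assumed): index of the parent entry, -1 at the root
    content_id: int  # pointer to a content in the content pack


@dataclass
class Symlink:
    name: str
    parent: int
    target: str      # where the link points


@dataclass
class Directory:
    name: str
    parent: int
    first_child: int  # index of the first entry of this directory
    child_count: int  # number of entries in this directory


# One "kind" of entry with three variants, like a union or enum.
Entry = Union[File, Symlink, Directory]


def list_paths(entries: List[Entry], first: int, count: int, prefix: str = "") -> List[str]:
    """Walk the flat entry list as a tree, following directory ranges."""
    paths = []
    for entry in entries[first:first + count]:
        path = prefix + entry.name
        paths.append(path)
        if isinstance(entry, Directory):
            paths.extend(
                list_paths(entries, entry.first_child, entry.child_count, path + "/")
            )
    return paths


# Flat list: root-level entries are at indexes 0..1, the children of "src" at 2..3.
ENTRIES: List[Entry] = [
    Directory("src", -1, first_child=2, child_count=2),
    File("README", -1, content_id=0),
    File("main.rs", 0, content_id=1),
    Symlink("latest", 0, target="main.rs"),
]

print(list_paths(ENTRIES, 0, 2))
```

The point of the sketch is that the entries stay a flat list, and the tree only exists through the `first_child` and `child_count` properties of the directory variant.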
[05:46.440 --> 05:52.120] There is no index property for now, mainly because arx is pretty young and I [05:52.120 --> 05:58.440] don't want to bother with it while testing arx and Jubako, but it's hard. [05:58.440 --> 06:06.480] It's a file archive, so we can compare arx with tar a bit to see how Jubako and arx [06:06.480 --> 06:07.480] perform. [06:07.480 --> 06:16.000] If we take the Linux source code, the full Linux source code is more than one gigabyte, [06:16.000 --> 06:26.400] and both tar and arx compress the source code to about 130 or 140 megabytes. [06:26.400 --> 06:34.160] For creation time, arx is a bit faster than tar, and for extraction time we are pretty close: arx [06:34.160 --> 06:40.760] is a bit slower, but both tools are pretty close. [06:40.760 --> 06:49.880] What is interesting is when we try to list the contents of the archive: tar took almost [06:49.880 --> 06:57.480] the same time as extraction, because to list the contents of a tar archive you need to [06:57.480 --> 07:04.200] decompress all the content, and arx is much faster because the list of entries is [07:04.200 --> 07:08.520] separate from the content itself. [07:08.520 --> 07:16.160] If you want to extract only one content from the archive, what is called [07:16.160 --> 07:23.160] dumping, and you try to dump a third of all the entries, you can see that arx is [07:23.160 --> 07:31.480] really, really, really faster. In the same way, extracting one entry from the tar takes [07:31.480 --> 07:39.320] about the same time as listing the contents, as you need to decompress all [07:39.320 --> 07:45.880] the contents of the tar archive, whereas with arx you can locate the content and access [07:45.880 --> 07:51.200] it directly without decompressing other contents.
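Why listing and single-entry extraction can be fast when the entry list is kept separate from the content can be shown with a deliberately simplified Python sketch. It is not the real Jubako or arx on-disk layout: the function names and the `(name, offset, size)` directory tuples are assumptions for the example, and each content is compressed on its own with `zlib` simply to make random access visible.

```python
import zlib
from typing import Dict, List, Tuple

# A directory record: (entry name, offset in the content blob, compressed size).
DirEntry = Tuple[str, int, int]


def build_archive(files: Dict[str, bytes]) -> Tuple[List[DirEntry], bytes]:
    """Store every file as its own compressed blob and record where it lives."""
    directory: List[DirEntry] = []
    content = bytearray()
    for name, data in files.items():
        blob = zlib.compress(data)
        directory.append((name, len(content), len(blob)))
        content += blob
    return directory, bytes(content)


def list_entries(directory: List[DirEntry]) -> List[str]:
    """Listing only reads the directory: no content is decompressed."""
    return [name for name, _, _ in directory]


def read_entry(directory: List[DirEntry], content: bytes, wanted: str) -> bytes:
    """Locate one entry and decompress only its own blob."""
    for name, offset, size in directory:
        if name == wanted:
            return zlib.decompress(content[offset:offset + size])
    raise KeyError(wanted)


files = {
    "Makefile": b"all:\n\techo build\n",
    "kernel/main.c": b"int main(void) { return 0; }\n" * 100,
}
directory, content = build_archive(files)
print(list_entries(directory))
print(read_entry(directory, content, "Makefile"))
```

A compressed tar, by contrast, is one compressed stream with the metadata interleaved with the data, so both operations have to decompress the stream up to the entry they need.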
[07:51.200 --> 07:59.360] One thing that we can do is mount the archive, directly mount the archive on [07:59.360 --> 08:07.120] the file system. If you mount the archive and you do a diff of the content between the [08:07.120 --> 08:15.400] original source and what is mounted: if you do a diff between two plain directories it's [08:15.400 --> 08:23.400] a bit less than a second; with arx it's four and a half seconds; and for tar, this is an estimate, [08:23.400 --> 08:31.840] it would take something like ten hours to do the comparison. [08:31.840 --> 08:37.280] You can do something even more interesting with a mounted file system: with the mounted [08:37.280 --> 08:45.400] Linux source, you can compile the kernel. If you compile the kernel on the plain file system [08:45.400 --> 08:52.360] it's a bit more than half an hour, and if you compile the kernel using the mounted arx [08:52.360 --> 08:56.520] archive it's a bit less than an hour. [08:56.520 --> 09:03.880] What is interesting here is that the compilation is run with -j8, so there are eight processes, [09:03.880 --> 09:10.320] and the arx FUSE file system is single-threaded, so there is a huge bottleneck for now, but [09:10.320 --> 09:19.720] if we move to a multi-threaded FUSE file system it should be even better. [09:19.720 --> 09:26.360] The next use case is jim. It's a kind of equivalent of the ZIM format. There [09:26.360 --> 09:33.200] are only two variants, and here we store the entries as a plain list and there is no [09:33.200 --> 09:42.920] tree structure. The jim binary just integrates a small HTTP server that looks up the entries.
[09:42.920 --> 09:52.560] What we could also do is replace, for example, RPM and DEB with arx or something based [09:52.560 --> 09:58.840] on Jubako, so you could download your package and, instead of extracting it to the file system, just [09:58.840 --> 10:05.560] open it directly. Even devel or debug info could be put in specific content [10:05.560 --> 10:13.520] packs of the same archive, so it could simplify management, and you would not need to have [10:13.520 --> 10:19.640] different packages for the different sub-types of content of your package. [10:19.640 --> 10:27.120] OCI containers are based on tar: you need to extract them to the file system before running [10:27.120 --> 10:34.680] a container. So you could just use arx and mount them, or you could even put different [10:34.680 --> 10:45.160] layers in different content packs, so the whole image would be one Jubako container. [10:45.160 --> 10:56.160] File formats: almost all file formats are in fact containers for other content, so you could [10:56.160 --> 11:03.560] use Jubako to organize the content you want to store for your own project [11:03.560 --> 11:07.720] and your own file format. [11:07.720 --> 11:15.840] Websites: Jubako is written in Rust, so you could run it in WASM; you [11:15.840 --> 11:24.080] could load your Jubako archive in the browser once and just open it directly in the browser. [11:24.080 --> 11:30.760] Backups: Jubako is almost incremental by design. If you reuse the content packs of [11:30.760 --> 11:36.760] the previous backup, the backup is incremental, and you can decide which properties you want [11:36.760 --> 11:44.760] to have, so for example you can add a checksum on each entry to do a comparison between [11:44.760 --> 11:50.280] the content stored in the backup and what you have on the file system.
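The checksum idea for backups can be sketched as follows. This is a hypothetical illustration in Python, not something Jubako provides: the helper names are made up, SHA-256 is an arbitrary choice of checksum, and a plain dictionary of `path -> bytes` stands in for the file system so that the example stays self-contained.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Checksum stored as an extra property on each backup entry (SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()


def make_backup_index(files: Dict[str, bytes]) -> Dict[str, str]:
    """Entry list of a backup: one checksum property per path."""
    return {path: checksum(data) for path, data in files.items()}


def changed_paths(index: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Paths that were added, removed or modified since the backup was taken."""
    changed = []
    for path in sorted(set(index) | set(current)):
        data = current.get(path)
        if data is None or index.get(path) != checksum(data):
            changed.append(path)
    return changed


backup = make_backup_index({"notes.txt": b"v1", "photo.raw": b"\x00\x01"})
now = {"notes.txt": b"v2", "photo.raw": b"\x00\x01", "todo.txt": b"new"}
print(changed_paths(backup, now))
```

Only the changed paths would need new content in the next backup; unchanged entries could keep pointing at content packs reused from the previous one.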
[11:50.280 --> 11:59.680] Embedding resources: Jubako can be embedded in executable programs. Or even more: this presentation. [11:59.680 --> 12:05.360] You can download this presentation at this address and you will have a file, and this [12:05.360 --> 12:11.680] file is an arx archive, so you can just use the arx tool to list the content, extract [12:11.680 --> 12:16.920] or mount the archive, and you will have access to all the files of this presentation. It is [12:16.920 --> 12:24.960] reveal.js, so it's HTML content. But the same file is also a jim archive, so you can [12:24.960 --> 12:34.520] just use the jim tool to serve the content and open a browser on localhost. And the [12:34.520 --> 12:41.440] same file is also a program, so if you make it executable you can run the program itself [12:41.440 --> 12:49.120] to mount, extract or serve the content. What is interesting is that between arx and jim the content [12:49.120 --> 12:55.840] is not duplicated: it is an arx and a jim archive, but each is just a view of the same content. There [12:55.840 --> 13:01.520] is no duplication. It's not two archives put together, it's really one archive with two [13:01.520 --> 13:08.880] kinds of views of the same content. And the last line is the exact command used to serve this [13:08.880 --> 13:18.360] actual presentation. Conclusion: this is a new way of thinking. [13:18.360 --> 13:26.800] You could use an archive directly instead of extracting it, so we can reinvent [13:26.800 --> 13:38.200] the wheel, thinking about using the archive directly. It's a new way of thinking. It's [13:38.200 --> 13:45.600] generic: it's a common base that can adapt to different usages, [13:45.600 --> 13:54.640] but it's pretty new, so there may be some crashes, and you can expect some changes in the specification. [13:54.640 --> 14:23.880] Thank you. Are there any questions? Can you repeat the question? Okay, I don't know. [14:23.880 --> 14:36.680] I know about SquashFS, but the thing is that Jubako 
is not a file system. arx is an archive [14:36.680 --> 14:42.840] to store files, but Jubako is not, so Jubako is more generic than cramfs or SquashFS, [14:42.840 --> 15:02.200] probably. And arx compared to SquashFS: arx is slower than SquashFS. On size, arx [15:02.200 --> 15:18.080] is better, but on performance it is slower. Could we implement that in other languages? [15:18.080 --> 15:27.640] Yeah, could we re-implement this in other languages? You could: the specification [15:27.640 --> 15:34.120] is language-agnostic. I just implemented the reference library in Rust, but the specification [15:34.120 --> 15:49.600] is public. Sorry? zip is pretty small, but zip is slower than arx in almost [15:49.600 --> 16:01.440] any kind of operation, and is bigger than arx also. Thank you.