[00:00.000 --> 00:07.480] So, great to see you all. [00:07.480 --> 00:08.480] So many people here. [00:08.480 --> 00:09.480] That's awesome. [00:09.480 --> 00:10.480] Welcome to my talk. [00:10.480 --> 00:12.520] It's called Decentralized Search with IPFS. [00:12.520 --> 00:16.080] Maybe first of all, a quick poll. [00:16.080 --> 00:18.400] How many of you have used IPFS? [00:18.400 --> 00:20.080] Please raise your hand. [00:20.080 --> 00:21.080] Okay. [00:21.080 --> 00:22.080] Okay, nice. [00:22.080 --> 00:23.920] And how many of you have heard about IPFS? [00:23.920 --> 00:25.160] Okay, all of you. [00:25.160 --> 00:26.160] Okay, cool. [00:26.160 --> 00:28.640] So, you know all about it already, no? [00:28.640 --> 00:31.560] Yeah, so the talk is subtitled How Does It Work Under the Hood? [00:31.560 --> 00:35.480] So we will dive in pretty deep at some points of the talk. [00:35.480 --> 00:38.120] But first things first. [00:38.120 --> 00:39.120] My name is Dennis. [00:39.120 --> 00:41.480] I'm a research engineer at Protocol Labs. [00:41.480 --> 00:46.000] I'm working in a team called ProbeLab, and we're doing network measurements and protocol [00:46.000 --> 00:47.400] optimizations there. [00:47.400 --> 00:51.960] I'm also an industrial PhD candidate at the University of Göttingen, and you can reach [00:51.960 --> 00:53.760] me on all these handles on the internet. [00:53.760 --> 00:58.000] So if you have any questions, you can reach out and let me know, or just [00:58.000 --> 01:00.280] catch me here at the venue after the talk. [01:00.280 --> 01:01.760] So what's in it for you today? [01:01.760 --> 01:06.280] First of all, just in words and numbers, what is IPFS? [01:06.280 --> 01:08.520] Just a general overview. [01:08.520 --> 01:12.960] And after we've covered that, I'll just assume we have installed a local [01:12.960 --> 01:18.840] IPFS node on our computer, and I will walk you through the different commands: [01:18.840 --> 01:22.800] we initialize the repository, we publish content to the network, and [01:22.800 --> 01:26.440] so on, and I'll explain what happens in each of these steps so that all of you hopefully [01:26.440 --> 01:30.280] get a glimpse of what's going on under the hood. [01:30.280 --> 01:33.800] So we import content, we connect to the network, I explain content routing, which [01:33.800 --> 01:39.200] is the very technical part, and at the end some call-outs, basically. [01:39.200 --> 01:40.520] So what is IPFS? [01:40.520 --> 01:46.840] IPFS stands for the InterPlanetary File System, and generally it's a decentralized storage [01:46.840 --> 01:51.760] and delivery network which builds on peer-to-peer networking and content-based addressing. [01:51.760 --> 01:56.440] So, peer-to-peer networking: if you have followed along or have been here earlier today, [01:56.440 --> 02:03.400] Max gave a great talk about libp2p, about connectivity in general in peer-to-peer networks, and IPFS [02:03.400 --> 02:08.520] is one of the main users of the libp2p library and builds on top of that. [02:08.520 --> 02:13.120] And most importantly, in very small print at the bottom: IPFS is not a blockchain, which is [02:13.120 --> 02:17.760] a common misconception, so I'd like to emphasize that. [02:17.760 --> 02:23.480] In numbers: these numbers are from mid last year, so probably in need of an update, [02:23.480 --> 02:26.800] but it has been in operation since 2015, that hasn't changed.
[02:26.800 --> 02:32.280] The number of requests exceeds a billion in a week, and there are hundreds of terabytes of traffic that [02:32.280 --> 02:38.000] we see, and tens of millions of active users, also weekly. But as a disclaimer, this is [02:38.000 --> 02:42.000] just from our vantage point; in a decentralized network, no one has a complete view of what's [02:42.000 --> 02:50.160] going on, so these numbers could be much higher or just different in general. [02:50.160 --> 02:56.120] On ecosystem.ipfs.tech you can find some companies that build on top of this tech, and it's all [02:56.120 --> 03:04.560] in these different areas, social media and so on and so forth, so it's worth looking up. [03:04.560 --> 03:06.960] What's the value proposition of IPFS? [03:06.960 --> 03:12.280] The most important thing it does is decouple the content from its host, and it does this through [03:12.280 --> 03:18.720] a concept that's called content addressing, and content addresses are just permanent, [03:18.720 --> 03:24.880] verifiable links. This allows you to request data with such a [03:24.880 --> 03:28.680] content address, and anyone can serve you the content, and just from the address that you [03:28.680 --> 03:34.240] asked with, you can identify and verify that the content you got served is actually the [03:34.240 --> 03:40.560] one that you requested, and you are not dependent on the authenticity of the host, as is the [03:40.560 --> 03:42.760] case with HTTP. [03:42.760 --> 03:46.640] Because it's a decentralized network, it's also censorship resistant, and I like to put [03:46.640 --> 03:50.000] here that it alleviates backbone addiction. So what do I mean by that? [03:50.000 --> 03:54.800] Let's imagine all of us wanted to download a 100 megabyte YouTube video here [03:54.800 --> 03:59.080] in this room. If we were 100 people, we would put a load of [03:59.080 --> 04:04.840] about 10 gigabytes onto the backbone just to download the video into this room. Wouldn't [04:04.840 --> 04:08.560] it be better if we could just download it once and distribute it among each other, or [04:08.560 --> 04:12.840] download different parts and be a little bit more clever about that? [04:12.840 --> 04:18.520] In a similar vein, if we were working on a Google Doc here inside this room, why does [04:18.520 --> 04:22.840] it stop working if we don't have an internet connection anymore? [04:22.840 --> 04:25.960] It should actually keep working; it's kind of ridiculous. [04:25.960 --> 04:31.720] And also, somewhat in the same category, this partition tolerance could become very important for emerging networks, [04:31.720 --> 04:38.280] or if you're just on patchy coffee shop Wi-Fi. [04:38.280 --> 04:42.080] Alright, so how can you install IPFS?
[04:42.080 --> 04:46.960] So I put down three different ways here. In general, you don't [04:46.960 --> 04:53.200] install IPFS; IPFS is more of a specification, and there are different implementations of this [04:53.200 --> 04:57.680] specification. The most common one is Kubo, which was formerly known as go-ipfs, [04:57.680 --> 05:02.240] so it's a Go implementation. There's a newer one called Iroh, which is in Rust, and I think [05:02.240 --> 05:07.840] the newest one is in JavaScript, called Helia; yeah, I think that's the newest kid on [05:07.840 --> 05:14.400] the block. I will talk about Kubo here, and the easiest way to get started is to just [05:14.400 --> 05:18.800] download IPFS Desktop, which is an Electron app that bundles an IPFS node, gives you a [05:18.800 --> 05:24.600] nice UI, and you can already interact with the network, request CIDs and so on. [05:24.600 --> 05:28.320] Then there's IPFS Companion, which is a browser extension that you can install in [05:28.320 --> 05:34.240] Firefox or your browser of choice, or you directly use Brave or Opera, which come with a [05:34.240 --> 05:40.080] bundled IPFS node already, so if you enter ipfs:// and a CID, they will [05:40.080 --> 05:43.880] resolve the content through the IPFS network. [05:43.880 --> 05:46.440] But as I said in the beginning, in this talk we will focus on the command line, because [05:46.440 --> 05:51.480] we're at a developer conference, and I will also assume that we run Kubo, which is [05:51.480 --> 05:53.880] basically the reference implementation. [05:53.880 --> 06:02.360] So now we have downloaded Kubo from github.com/ipfs/kubo and we want to import [06:02.360 --> 06:04.160] some content; we just want to get started. [06:04.160 --> 06:09.080] So we downloaded it, and now we have this ipfs command on our machine, and the first thing [06:09.080 --> 06:16.600] that we do is run ipfs init, and what this does is generate a public-private key pair, [06:16.600 --> 06:23.400] by default an Ed25519 one, and it spits out this random-looking string of characters, which is basically [06:23.400 --> 06:24.560] your public key. [06:24.560 --> 06:32.120] Formerly it was just the hash of your public key, but now your public [06:32.120 --> 06:38.040] key is encoded directly in here, and this is your peer identity, which will become important later on. [06:38.040 --> 06:43.720] And it also initializes your IPFS repository, by default in your home directory under ~/.ipfs. [06:43.720 --> 06:45.880] This is the location where it stores all the files. [06:45.880 --> 06:50.720] So if you interact with the IPFS network and request files, it stores them in this directory [06:50.720 --> 06:57.880] in a specific format, similar to how Git does its object store, basically. [06:57.880 --> 07:01.240] And importantly, and I will point this out a couple of times, this is just a local operation. [07:01.240 --> 07:05.320] So we haven't interacted with the network at all yet. [07:05.320 --> 07:09.560] So now we are ready to go; I have a file I want to add. [07:09.560 --> 07:16.440] So what I do is run ipfs add and then my file name, and in this case IPFS gives you [07:16.440 --> 07:21.080] a progress bar, or rather Kubo gives you a progress bar, and spits out again a random-looking [07:21.080 --> 07:26.320] string of characters, which is the content identifier, the CID, which is the most fundamental [07:26.320 --> 07:27.320] ingredient here.
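As a rough command-line sketch of the two steps just described, assuming a Kubo installation; the peer ID and CID shown are placeholders, and the exact output differs between Kubo versions:

    # Initialize the local repository (a purely local operation, no network involved).
    # This generates the Ed25519 key pair, derives the peer identity from it,
    # and creates the repository under ~/.ipfs.
    $ ipfs init
    generating ED25519 keypair...done
    peer identity: 12D3KooW...        (placeholder peer ID)

    # Add a file: it is chunked, hashed and stored in the local repository,
    # and the root CID is printed. Still no network interaction.
    $ ipfs add my-file.mp4
    added Qm... my-file.mp4           (Qm... stands in for the real root CID)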
[07:27.320 --> 07:32.080] And this is the part where it decouples the host, sorry, the content from its host. [07:32.080 --> 07:36.840] As a mental model, you can think about the CID as a hash with some metadata. [07:36.840 --> 07:38.000] It's self-describing. [07:38.000 --> 07:41.120] So the metadata is this description part. [07:41.120 --> 07:43.360] You can see the ingredients at the bottom. [07:43.360 --> 07:47.960] So it's just an encoded version of some information like the CID version. [07:47.960 --> 07:54.440] So we have version zero and one, and some other information that I won't go into right now. [07:54.440 --> 07:55.920] Then it's self-certifying. [07:55.920 --> 08:03.600] This is the point where, if you request some data from the network, you verify the data [08:03.600 --> 08:09.280] that you got served with the CID itself and not via the host that served you the content, [08:09.280 --> 08:11.800] just reiterating this point. [08:11.800 --> 08:14.600] And it's an immutable identifier. [08:14.600 --> 08:19.000] And all these structures, like the CID structure at the bottom and so on, are governed by a project [08:19.000 --> 08:25.080] that's called Multiformats, which is also one of Protocol Labs' projects. [08:25.080 --> 08:31.080] And since the talk is about what happens under the hood: what actually happened here? [08:31.080 --> 08:37.400] IPFS saw the file, which is just this white box here, a stream of bytes, and IPFS chunked [08:37.400 --> 08:38.400] it up [08:38.400 --> 08:43.320] into different pieces, which is a common technique in networking, actually. [08:43.320 --> 08:46.880] And this gives us some nice properties. [08:46.880 --> 08:51.440] It allows us to do piecewise transfers, so we can request blocks from different hosts, [08:51.440 --> 08:52.960] actually. [08:52.960 --> 08:54.760] And it allows for deduplication. [08:54.760 --> 09:01.200] So if we have two blocks that are basically the same bytes, we can deduplicate them and [09:01.200 --> 09:04.080] save some storage space underneath. [09:04.080 --> 09:09.720] And if the file was a video file, it also allows for random access, so we could start [09:09.720 --> 09:16.720] in the middle of the video and wouldn't need to stream all the previous bytes at all. [09:16.720 --> 09:22.160] And after we have chunked it up, what IPFS does now is [09:22.160 --> 09:25.440] put it together again. [09:25.440 --> 09:29.320] And what we do here is we hash each individual chunk. [09:29.320 --> 09:34.000] Each chunk gets its own CID, its own content identifier. [09:34.000 --> 09:40.360] Then the combination of those CIDs again gets another CID, and we do this for both pairs [09:40.360 --> 09:41.840] at the bottom. [09:41.840 --> 09:48.640] And then the resulting intermediate CIDs are put together yet again to generate the [09:48.640 --> 09:50.600] root CID, as we call it. [09:50.600 --> 09:53.920] And this is actually the CID that you see in the command line up there. [09:53.920 --> 10:00.480] So we took the chunks and put the identifiers together to arrive at the final CID at the [10:00.480 --> 10:01.480] top. [10:01.480 --> 10:05.920] And this data structure is actually called a Merkle tree, but in IPFS land it's actually [10:05.920 --> 10:11.280] a Merkle DAG, because in Merkle trees your nodes are not allowed to have common parents. [10:11.280 --> 10:14.320] And DAG here means a directed acyclic graph.
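If you want to look at this Merkle DAG yourself, Kubo can show the chunks and links behind a root CID; a small sketch, with <root-cid> standing in for the CID printed by ipfs add:

    # Add a larger file with the default fixed-size chunker (256 KiB = 262144 bytes).
    $ ipfs add --chunker=size-262144 big-file.bin

    # List the direct children of the root, i.e. the CIDs of the chunks it links to.
    $ ipfs refs <root-cid>

    # Dump the root DAG node itself, including its links, as JSON.
    $ ipfs dag get <root-cid>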
[10:14.320 --> 10:18.600] Now let's imagine you didn't add a file but a directory. [10:18.600 --> 10:24.800] How do you encode the directory structure, and not only the bytes, and so on? [10:24.800 --> 10:30.280] All these formatting and serialization and deserialization concerns are governed by yet another project. [10:30.280 --> 10:34.640] It's called IPLD, which stands for InterPlanetary Linked Data. [10:34.640 --> 10:40.680] And IPLD also does a lot more things, but for now, this is specified in the scope of [10:40.680 --> 10:42.360] that project. [10:42.360 --> 10:46.840] So now we have imported the content. [10:46.840 --> 10:50.240] We have chunked it up, we've got the CID. [10:50.240 --> 10:53.640] But again, we haven't interacted with the network yet. [10:53.640 --> 10:58.040] People think if you add something to IPFS, you upload it somewhere and someone else takes [10:58.040 --> 11:01.320] care of hosting it for you, for free, which is not the case. [11:01.320 --> 11:03.600] We just added it to our local node. [11:03.600 --> 11:09.960] So now it ended up in this IPFS repository somewhere on our local machine. [11:09.960 --> 11:13.680] Only now do we connect to the network and interact with it. [11:13.680 --> 11:21.160] For that we run ipfs daemon, which is a long-running process that connects to nodes in the network. [11:21.160 --> 11:24.520] We see some version information: which Go version it was compiled with, which Kubo version [11:24.520 --> 11:26.560] we actually use. [11:26.560 --> 11:32.320] We see the addresses that the Kubo node listens on, and also which ones are announced to the [11:32.320 --> 11:37.080] network, so under which network addresses we are reachable. [11:37.080 --> 11:41.600] And then it tells us that it started an API server, a web UI and the gateway. [11:41.600 --> 11:46.960] The API server is just an RPC API that is used by the command line to control the IPFS [11:46.960 --> 11:47.960] node. [11:47.960 --> 11:52.840] The web UI is the thing that you saw previously in the screenshot of IPFS Desktop. [11:52.840 --> 11:58.240] So your local Kubo node also serves this web UI. [11:58.240 --> 11:59.240] And then the gateway. [11:59.240 --> 12:00.640] The gateway is quite interesting. [12:00.640 --> 12:04.840] It bridges the HTTP world with the IPFS world. [12:04.840 --> 12:09.280] So you can make requests to this endpoint that you can see down there. [12:09.280 --> 12:17.040] If you put /ipfs/ and your CID into the browser, into that URL, the Kubo node [12:17.040 --> 12:20.760] will go ahead and resolve the CID in the network and serve the content to you over HTTP. [12:20.760 --> 12:24.040] So this is like a bridge between both worlds. [12:24.040 --> 12:29.480] And Protocol Labs and Cloudflare and so on are actually running these gateways on the internet [12:29.480 --> 12:35.240] right now, which you can use as a low-barrier entry to the whole thing. [12:35.240 --> 12:37.200] And then the daemon is ready. [12:37.200 --> 12:41.280] In this process, it has also connected to bootstrap nodes, which are hard-coded, to [12:41.280 --> 12:43.920] actually get to know other peers in the network. [12:43.920 --> 12:48.960] But you can also override them with your own bootstrap nodes. [12:48.960 --> 12:49.960] So now we are connected to the network. [12:49.960 --> 12:53.520] We have added our file to our own machine.
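A rough sketch of this step on the command line; 5001 (RPC API) and 8080 (gateway) are Kubo's default ports, and <cid> is again a placeholder:

    # Start the long-running daemon: it connects to the bootstrap nodes and
    # brings up the RPC API, the web UI and the HTTP gateway.
    $ ipfs daemon

    # In a second terminal: list the peers we are currently connected to.
    $ ipfs swarm peers

    # Fetch content through the local gateway, the bridge between HTTP and IPFS.
    $ curl http://127.0.0.1:8080/ipfs/<cid>

    # The web UI served by the local node lives at http://127.0.0.1:5001/webui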
[12:53.520 --> 12:59.160] But now comes the interesting part, the challenge: how do we actually find [12:59.160 --> 13:02.000] content hosts for a given CID? [13:02.000 --> 13:08.640] So if I give my friend a CID, how does their node know that it needs to connect to me to request [13:08.640 --> 13:09.640] the content? [13:09.640 --> 13:11.760] And I put here: the solution is simple. [13:11.760 --> 13:12.760] We keep a mapping table. [13:12.760 --> 13:17.520] So we just have the CID mapped to the actual peer, and every node has this on their machine. [13:17.520 --> 13:21.120] So everyone knows everything, basically. [13:21.120 --> 13:27.480] But as you can imagine, this mapping table gets humongous, especially since we've split up those files into [13:27.480 --> 13:32.960] different chunks, and I think the default chunk size is 256 kilobytes. [13:32.960 --> 13:34.600] So we have just a lot of entries in this table. [13:34.600 --> 13:36.320] So this doesn't scale. [13:36.320 --> 13:40.560] The solution would be to split this table, so that each participating peer in this decentralized [13:40.560 --> 13:45.000] network holds a separate part of the table. [13:45.000 --> 13:46.960] But then we are back to square one. [13:46.960 --> 13:51.640] How do we know which peer holds which piece of this distributed hash table data? [13:51.640 --> 13:58.400] And the solution here is to use a deterministic distribution based on the Kademlia [13:58.400 --> 13:59.400] DHT. [13:59.400 --> 14:04.560] Kademlia is a specific protocol for a distributed [14:04.560 --> 14:06.320] hash table. [14:06.320 --> 14:12.240] And at this point, many talks on the internet about IPFS gloss [14:12.240 --> 14:14.320] over the DHT and how it works. [14:14.320 --> 14:19.160] And so when I got into this whole thing, I felt I was lacking something. [14:19.160 --> 14:24.880] So my experiment here is to dive a little deeper into this. [14:24.880 --> 14:30.000] I will cover a bit of Kademlia here, and it gets very technical. [14:30.000 --> 14:34.680] But at the end, I will try to summarize everything so that every one of you gets a little bit [14:34.680 --> 14:35.920] out of this. [14:35.920 --> 14:37.800] This whole process is called content routing. [14:37.800 --> 14:43.080] So this is the resolution of a CID to a content host. [14:43.080 --> 14:50.360] IPFS uses an adaptation of the Kademlia DHT with a 256-bit key space. [14:50.360 --> 14:56.840] So we hash the CID and the peer ID yet again with SHA-256 to arrive in a common [14:56.840 --> 14:58.760] key space. [14:58.760 --> 15:03.120] And the distributed hash table in IPFS is just a distributed system that maps these keys [15:03.120 --> 15:04.120] to values. [15:04.120 --> 15:10.320] The most important records here are provider records, which map a CID to a peer ID. [15:10.320 --> 15:14.960] The peer ID is what was generated when we initialized our node. [15:14.960 --> 15:21.760] And then there are peer records, which map the peer ID to actual network addresses, like IP [15:21.760 --> 15:23.160] addresses and ports. [15:23.160 --> 15:27.840] So looking up a host for a CID is actually a two-step process. [15:27.840 --> 15:32.680] First we need to resolve the CID to a peer ID, and then the peer ID to its network addresses. [15:32.680 --> 15:35.040] And then we can connect to each other.
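You can watch this two-step resolution with Kubo's DHT commands; a sketch, with <cid> and <peer-id> as placeholders (in newer Kubo versions the same lookups are also available under ipfs routing ...):

    # Step 1: resolve the CID to the peer IDs of content providers (provider records).
    $ ipfs dht findprovs <cid>

    # Step 2: resolve one of those peer IDs to its network addresses (peer record).
    $ ipfs dht findpeer <peer-id>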
[15:35.040 --> 15:41.680] The distributed hash table here has two key features: first, an XOR distance metric. [15:41.680 --> 15:45.160] That means we have some notion of closeness. [15:45.160 --> 15:49.960] So what this XOR thing does: if I XOR two numbers together, the resulting number, or rather [15:49.960 --> 15:54.880] this operation, satisfies the requirements for a metric. [15:54.880 --> 16:02.360] This means I can say a certain peer ID is closer to a CID than some other peer ID. [16:02.360 --> 16:07.680] So in this case, peer ID X could be closer to CID 1 than peer ID Y. [16:07.680 --> 16:17.040] And this allows us to basically sort CIDs and peer IDs together. [16:17.040 --> 16:18.800] And then there's this tree-based routing mechanism. [16:18.800 --> 16:23.200] So in this bottom-right diagram, which I got from the original paper, we have the black [16:23.200 --> 16:24.520] node. [16:24.520 --> 16:29.720] And this tree-based routing is super clever: all the [16:29.720 --> 16:35.040] peers in the network can be considered as leaves in a big trie, a prefix trie. [16:35.040 --> 16:41.040] And if we know only one peer in each of these bubbles, we can guarantee that we can reach [16:41.040 --> 16:47.320] any other peer in the network with O(log n) lookups, by asking for ever closer peers based [16:47.320 --> 16:52.760] on this XOR distance metric. [16:52.760 --> 16:56.160] So this was, just abstractly, what the distributed hash table in IPFS does. [16:56.160 --> 16:58.360] How does it work concretely for IPFS? [16:58.360 --> 16:59.640] So we started the daemon process. [16:59.640 --> 17:04.960] What happened under the hood was that we calculated the SHA-256 of our peer ID, which just gives [17:04.960 --> 17:09.400] us a long string of bits, basically, in our case. [17:09.400 --> 17:12.120] And we initialized a routing table, at the bottom. [17:12.120 --> 17:14.880] This routing table consists of different buckets. [17:14.880 --> 17:23.800] And each bucket is filled with peers that share a common prefix with our peer ID, or rather with the hash [17:23.800 --> 17:26.160] of our peer ID at the top. [17:26.160 --> 17:33.280] When our node started up, we asked the bootstrap peers: hey, do you know anyone whose [17:33.280 --> 17:36.520] SHA-256 of their peer ID starts with a 1? [17:36.520 --> 17:42.320] That means we have no common prefix, and we put those peers in bucket 0. [17:42.320 --> 17:47.440] Then we do the same for a prefix of 0-0, then 0-1-1, and so on, for longer and longer shared prefixes. [17:47.440 --> 17:52.560] And so we go through the whole list up to 255, and we fill up these buckets. [17:52.560 --> 17:56.240] And these buckets are basically these little blobs, these little circles, that you [17:56.240 --> 17:59.320] saw in the previous slide. [17:59.320 --> 18:00.880] And why did we do that? [18:00.880 --> 18:05.920] Because when we now want to retrieve content, so as I said, I handed the CID to my friend, [18:05.920 --> 18:12.000] and my friend enters the CID on the command line with this ipfs get command. [18:12.000 --> 18:17.120] Their node also calculates the SHA-256 of the CID, and then looks in its own routing [18:17.120 --> 18:20.680] table and sees: okay, I have a common prefix of two bits. [18:20.680 --> 18:26.040] So I locate the appropriate bucket, bucket 2, get the [18:26.040 --> 18:30.920] list of all peers in it, and then I ask all of these peers in the bucket: hey, do you know [18:30.920 --> 18:31.920] anyone?
[18:31.920 --> 18:34.240] So first of all, do you know the provider record already? [18:34.240 --> 18:37.920] Do you know the CID and the peer ID for that CID? [18:37.920 --> 18:42.640] If yes, we are done, but if not, we ask: do you know anyone closer, based on [18:42.640 --> 18:43.640] this XOR metric? [18:43.640 --> 18:47.240] And then that peer yet again looks in its own routing table, and so we get closer and [18:47.240 --> 18:54.720] closer and closer, with this log(n) property that I showed you previously. [18:54.720 --> 18:57.880] For publishing content, it's basically the same. [18:57.880 --> 19:03.360] We calculate the SHA-256 of the CID, locate the appropriate bucket, get a list of all [19:03.360 --> 19:10.600] the peers from it, and then we start parallel queries, but instead of asking for the provider [19:10.600 --> 19:13.040] record, we ask for even closer peers. [19:13.040 --> 19:21.320] And we terminate when the closest known peers in the query haven't replied with [19:21.320 --> 19:31.240] anyone closer to the CID than we already know. [19:31.240 --> 19:36.480] And then we store the provider record with the 20 closest peers to that CID, and we do [19:36.480 --> 19:41.800] it with 20 because there's peer churn: this is a permissionless network, and that [19:41.800 --> 19:47.320] means peers can come and go as they wish, and if we only stored it with one peer, we would [19:47.320 --> 19:53.600] risk that the provider record is not reachable when that node goes down, and in turn the [19:53.600 --> 19:57.520] content is not reachable. [19:57.520 --> 20:01.440] So this was the very technical part of it, but let me summarize. [20:01.440 --> 20:06.280] This is probably the easier way to understand all of this. [20:06.280 --> 20:11.080] First of all, we added the content to our node, so this is the file entering at the [20:11.080 --> 20:16.680] provider. The provider looks in its routing table, gets redirected to a peer that is closer [20:16.680 --> 20:24.720] to the CID, and gets redirected again until it finds the closest peers, in this XOR key space metric, [20:24.720 --> 20:28.120] to the CID, and then it stores the provider record with them. [20:28.120 --> 20:33.680] Then, out of band, the CID gets handed to the requester, to my friend, and what I haven't [20:33.680 --> 20:40.320] told you yet: IPFS also maintains a long list of, I don't know how many [20:40.320 --> 20:47.080] it is right now, probably a hundred or so, constant connections to other peers, and it opportunistically [20:47.080 --> 20:52.960] just asks them: hey, do you know the CID, or rather the provider record for the CID? [20:52.960 --> 20:58.320] If this resolves, all good, we are done, but it's very unlikely for a peer to actually [20:58.320 --> 21:01.000] know a random CID. [21:01.000 --> 21:02.480] So let's assume this didn't work. [21:02.480 --> 21:07.160] The requester also looks in its own routing table, gets redirected, and gets redirected ever [21:07.160 --> 21:17.880] closer to the CID in the key space, and then finds the peer that stores the [21:17.880 --> 21:24.440] provider record, fetches the provider record, then again does the same hops to find out [21:24.440 --> 21:28.560] the mapping from the peer ID to the network addresses, and then we can actually connect [21:28.560 --> 21:35.440] to each other and transfer the content, and we're done.
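If you are curious what your own node's buckets look like, a reasonably recent Kubo can print its routing table, and you can trigger the publishing side by hand; a sketch, with <cid> as a placeholder:

    # Show the DHT routing table, bucket by bucket, with the peers in each bucket.
    $ ipfs stats dht

    # Announce a CID we host, i.e. store the provider record with the closest peers.
    $ ipfs dht provide <cid>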
[21:35.440 --> 21:42.200] So this is the content lifecycle, and this is actually already it; well, [21:42.200 --> 21:49.640] it is quite a bit, quite involved actually. And with that, it's already time for [21:49.640 --> 21:57.360] some callouts: get involved, IPFS is an open source project. If you're into measurements [21:57.360 --> 22:04.600] and so on, we have some grants open at radius.space; if you want to get involved with some network [22:04.600 --> 22:10.240] measurements, get your applications in. All action is in public; you can follow along with [22:10.240 --> 22:16.800] our work, especially my work and the work of our team, at this GitHub repository. We have plenty [22:16.800 --> 22:22.480] of requests for measurements that you can dive into, and extra ideas are always welcome. [22:22.480 --> 22:31.480] In general, IPFS is, I think, a very welcoming community, at least it has been for me, and yeah, [22:31.480 --> 22:32.480] that's it. [22:32.480 --> 22:50.320] So, any questions? [22:50.320 --> 22:56.400] So is the way you described it, using the DHT, how all nodes in the network share files with [22:56.400 --> 22:57.720] each other? [22:57.720 --> 23:04.960] It's one content routing mechanism; there are multiple ones. This first thing [23:04.960 --> 23:09.000] that I mentioned here, this opportunistic request to your immediately connected nodes, is also some kind of [23:09.000 --> 23:13.920] content routing, so you're resolving the location of content. Then there are some new efforts [23:13.920 --> 23:18.840] for building network indexers, which are just huge nodes that store these mappings, centralized [23:18.840 --> 23:28.880] nodes, or rather federated centralized nodes, so not as bad, and I think [23:28.880 --> 23:34.840] these are the important ones, basically. So yeah, there are more ways to resolve content. [23:34.840 --> 23:39.920] Also, mDNS could be one part: if you're on the same network, you're broadcasting... [23:39.920 --> 23:49.200] ah no, that's just for the local network, sorry, yeah, okay, true. Luckily we have a [23:49.200 --> 23:57.960] core maintainer of IPFS here; that's actually not a joke, but yeah, sorry. [23:57.960 --> 24:02.440] So I see that the provider records get replicated, but does the content actually get replicated [24:02.440 --> 24:04.120] across the network too? [24:04.120 --> 24:11.360] Only if someone else chooses to. You're publishing the provider record, [24:11.360 --> 24:18.520] so it's public somewhere, and anyone could look that up and also store the content themselves. [24:18.520 --> 24:25.280] So the idea is: if content is popular and you care about the content staying [24:25.280 --> 24:32.400] alive in the network, you pin the CID, as it's called, and this means you're fetching the content [24:32.400 --> 24:37.520] from the other provider, storing it yourself, and becoming a provider yourself. And because [24:37.520 --> 24:43.200] of the CID mechanism, which is self-certifying and so on, other peers that request the content [24:43.200 --> 24:50.280] from you don't even need to trust you, because the CID already encodes the trust chain here. [24:50.280 --> 24:54.120] But nothing here happens automatically. [24:54.120 --> 24:56.880] But you can have multiple providers for the same content? [24:56.880 --> 24:59.400] Definitely, yeah, that's part of it.
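In command-line terms, becoming an additional provider for a CID might look roughly like this, with <cid> again as a placeholder:

    # Fetch the content and pin it locally, so our node keeps it around
    # and announces itself as a provider for it.
    $ ipfs pin add <cid>

    # List the CIDs we have pinned.
    $ ipfs pin ls --type=recursive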
[24:59.400 --> 25:06.040] Another question is how the project fits the concepts of identity and trust and [25:06.040 --> 25:11.800] personas into IPFS; I'm thinking of metadata, ramifications about the content and stuff [25:11.800 --> 25:13.800] like that. [25:13.800 --> 25:14.800] What do you mean exactly? [25:14.800 --> 25:22.800] For instance, just a history of the content, and can you trust that this content is from [25:22.800 --> 25:26.200] a certain person or from a certain source, you know. [25:26.200 --> 25:32.240] I would argue this would probably be some mechanism on top of this content identification. [25:32.240 --> 25:38.160] So this is more for IPLD then, perhaps, I would say. If you want to say some content [25:38.160 --> 25:43.720] is from some specific person, then you would work with signatures, so signing the [25:43.720 --> 25:48.920] data and so on, which is something you would bolt on top of IPFS, but nothing that I think IPLD [25:48.920 --> 25:51.720] has encoded there right now. [25:51.720 --> 26:05.160] It's partly the same question: how is it ensured that there are no collisions in [26:05.160 --> 26:06.720] the content IDs? [26:06.720 --> 26:07.720] No collisions? [26:07.720 --> 26:14.440] Yes, because you could publish some other content with the same content ID; you said it happens [26:14.440 --> 26:18.400] locally, the content ID generation. [26:18.400 --> 26:20.040] You could fake content. [26:20.040 --> 26:27.040] Yes, but then all these cryptographic hash functions would be broken, which would [26:27.040 --> 26:28.920] be very bad. [26:28.920 --> 26:32.600] And if you have a hash collision, then it actually means you have the same content. [26:32.600 --> 26:37.400] That's the assumption right now. Or maybe, yes, Joe? [26:37.400 --> 26:42.480] We just use SHA-256 by default, and you can also use other ones like BLAKE3 or BLAKE2, but [26:42.480 --> 26:46.880] if you find a collision in SHA-256, you have bigger problems than IPFS not working. [26:46.880 --> 26:52.800] Exactly this, yeah. [26:52.800 --> 26:59.800] Following up on this, how resilient is this against malicious actors that want to prevent me from [26:59.800 --> 27:02.800] reaching the content? [27:02.800 --> 27:04.880] It's a big question, but maybe just briefly. [27:04.880 --> 27:12.800] Yes, so in peer-to-peer networks, these kinds of Sybil attacks are often the attack vector [27:12.800 --> 27:18.880] that is considered, which means you generate a lot of identities to populate just some [27:18.880 --> 27:23.560] part of the key space, to block some requests from reaching their final destination and so [27:23.560 --> 27:29.560] on. [27:29.560 --> 27:35.760] From my experience, this is quite hard, and I haven't seen this happening. [27:35.760 --> 27:40.880] I cannot say that it's impossible; it's probably hard to tell. [27:40.880 --> 27:42.880] Max, do you want to add something? [27:42.880 --> 27:49.640] Also, yeah, Kademlia has this mechanism where only long-lived peers stay in the routing [27:49.640 --> 27:50.640] table. [27:50.640 --> 27:53.200] True, yeah, exactly. [27:53.200 --> 27:58.440] So this Sybil thing is just one attack vector, but it's the common one that is considered. [27:58.440 --> 28:02.720] There are many points in the code base where you need to think about what happens [28:02.720 --> 28:09.200] if a Sybil attack is going on, and one thing that Kademlia does is to prefer [28:09.200 --> 28:12.120] long-running nodes, stable nodes, in the routing table.
[28:12.120 --> 28:16.760] So if someone suddenly generates a lot of identities, they don't end up in your [28:16.760 --> 28:24.200] routing table and pollute your content routing here, or interfere with [28:24.200 --> 28:25.200] that. [28:25.200 --> 28:27.200] All right, go ahead. [28:27.200 --> 28:35.440] I'm not sure if I want to ask it, but removing content, you know, deleting; we've [28:35.440 --> 28:42.280] got the GDPR, so is there any solution for that? [28:42.280 --> 28:44.760] So, yeah, it's hard. [28:44.760 --> 28:48.960] That's part of the thing: if you could, then it wouldn't be censorship resistant anymore. [28:48.960 --> 28:57.920] So one solution, well, one mitigation, maybe, is to have a blacklist of CIDs that [28:57.920 --> 29:06.520] you may or may not publish, to say, okay, don't replicate this CID and so on. But also, [29:06.520 --> 29:12.040] if you have such a list, then it's very easy to just look it up and see what's inside. [29:12.040 --> 29:20.160] So deleting content is very tricky. However, I said these are permanent links; [29:20.160 --> 29:25.400] the links are permanent, but content actually still churns in the IPFS network, and these [29:25.400 --> 29:32.400] provider records that you publish into the network expire after 24 hours, so if no one [29:32.400 --> 29:38.480] actually re-provides the content or keeps the content around, the content is gone as well. [29:38.480 --> 29:46.000] But a delete operation doesn't exist, so we just have to hope that no one re-provides it [29:46.000 --> 29:56.440] anymore, which you could support with these denylists, for example. Yeah, Daniel, okay. [29:56.440 --> 30:00.560] Who is able to write to that blacklist, and is there any...? [30:00.560 --> 30:07.280] Yeah, I don't know, to be completely honest, but maybe [30:07.280 --> 30:08.280] Jorropo knows. [30:08.280 --> 30:15.360] There is no blacklist in the network right now; there are a few people that want that. But [30:15.360 --> 30:21.600] we have, sorry, earlier you said that we have gateways, and a gateway is just a node that [30:21.600 --> 30:27.800] is publicly reachable, and for those gateways, because many people say, okay, they find [30:27.800 --> 30:33.120] some content on IPFS illegal, and instead of reporting it to the actual node hosting the content [30:33.120 --> 30:37.320] on IPFS, they just report it to the gateway, because they know HTTP and they don't know [30:37.320 --> 30:42.440] IPFS, our gateway has some blacklist that is somewhere, but it's not shared by [30:42.440 --> 30:46.040] the complete network; it's just for our gateway, ipfs.io. [30:46.040 --> 30:52.960] So Cloudflare, for example, also operates these gateways, and more; anyone could [30:52.960 --> 30:59.920] operate a gateway, so you could file a request saying, don't serve this CID, it's a [30:59.920 --> 31:04.480] phishing website, for example, and then these CIDs are not served through the gateways, [31:04.480 --> 31:06.720] which are a common way to interact with the network right now. [31:06.720 --> 31:13.720] But it's just the gateways that follow the list; it's not the whole network. [31:13.720 --> 31:20.320] Okay, we're running out of time, unless there is one more.
[31:20.320 --> 31:26.600] I have a question regarding searching through the stored content: is there any mechanism [31:26.600 --> 31:35.200] to go through or index the files that are there, to have some sort of search [31:35.200 --> 31:37.320] engine for that? [31:37.320 --> 31:45.400] Right, so there's a project called ipfs-search, and this makes use, among other things, [31:45.400 --> 31:50.880] of these immediate requests for CIDs. So it's just sitting there listening, connected to [31:50.880 --> 31:53.920] a lot of nodes, and as I said, if someone requests content, they immediately ask their [31:53.920 --> 31:59.880] connected peers, and you're connected to a lot of peers, and these ipfs-search nodes [31:59.880 --> 32:05.240] are sitting there listening to these requests, and they see, okay, someone wants this CID, [32:05.240 --> 32:09.960] so they go ahead and request that CID as well, and then index that content themselves, and [32:09.960 --> 32:16.760] so you can then search on this ipfs-search website for something, just like with Google, and [32:16.760 --> 32:21.520] then you see CIDs popping up, and then you can request those CIDs from the IPFS network. [32:21.520 --> 32:26.560] So this is one approach to do that, to index content, yeah. [32:26.560 --> 32:28.560] Okay, thank you. [32:28.560 --> 32:30.560] Thank you.