[00:00.000 --> 00:07.480] So, great to see you all. [00:07.480 --> 00:08.480] So many people here. [00:08.480 --> 00:09.480] That's awesome. [00:09.480 --> 00:10.480] Welcome to my talk. [00:10.480 --> 00:12.520] It's called Decentralized Search with IPFS. [00:12.520 --> 00:16.080] Maybe first of all, a quick poll. [00:16.080 --> 00:18.400] How many of you have used IPFS? [00:18.400 --> 00:20.080] Please raise your hand. [00:20.080 --> 00:21.080] Okay. [00:21.080 --> 00:22.080] Okay, nice. [00:22.080 --> 00:23.920] And how many of you have heard about IPFS? [00:23.920 --> 00:25.160] Okay, all of you. [00:25.160 --> 00:26.160] Okay, cool. [00:26.160 --> 00:28.640] So, you know all about it already, no? [00:28.640 --> 00:31.560] Yeah, so the talk is subtitled How Does It Work Under the Hood? [00:31.560 --> 00:35.480] So we will dive in pretty deep at some points of the talk. [00:35.480 --> 00:38.120] But first things first. [00:38.120 --> 00:39.120] My name is Dennis. [00:39.120 --> 00:41.480] I'm a research engineer at Protocol Labs. [00:41.480 --> 00:46.000] I'm working in a team called ProbeLab, and we're doing network measurements and protocol [00:46.000 --> 00:47.400] optimizations there. [00:47.400 --> 00:51.960] I'm also an industrial PhD candidate at the University of Göttingen, and you can reach [00:51.960 --> 00:53.760] me on all these handles on the internet. [00:53.760 --> 00:58.000] So if you have any questions, you can reach out and let me know, or just [00:58.000 --> 01:00.280] catch me here at the venue after the talk. [01:00.280 --> 01:01.760] So what's in it for you today? [01:01.760 --> 01:06.280] First of all, just in words and numbers, what is IPFS? [01:06.280 --> 01:08.520] Just a general overview. [01:08.520 --> 01:12.960] And after we've covered that, I'll just assume we have installed a local [01:12.960 --> 01:18.840] IPFS node on our computer, and I will walk you through the different commands: [01:18.840 --> 01:22.800] we initialize the repository, we publish content to the network, and [01:22.800 --> 01:26.440] so on, and I'll explain what happens in each of these steps so that all of you hopefully [01:26.440 --> 01:30.280] get a glimpse of what's going on under the hood. [01:30.280 --> 01:33.800] So we import content, we connect to the network, I explain content routing, which [01:33.800 --> 01:39.200] is the very technical part, and at the end some call-outs, basically. [01:39.200 --> 01:40.520] So what is IPFS? [01:40.520 --> 01:46.840] IPFS stands for the InterPlanetary File System, and generally it's a decentralized storage [01:46.840 --> 01:51.760] and delivery network which builds on peer-to-peer networking and content-based addressing. [01:51.760 --> 01:56.440] So, peer-to-peer networking: if you have followed along or have been here earlier today, [01:56.440 --> 02:03.400] Max gave a great talk about libp2p, about connectivity in general in peer-to-peer networks, and IPFS [02:03.400 --> 02:08.520] is one of the main users of the libp2p library and builds on top of that. [02:08.520 --> 02:13.120] And most importantly, in very small print at the bottom: IPFS is not a blockchain, which is [02:13.120 --> 02:17.760] a common misconception, so I'd like to emphasize that. [02:17.760 --> 02:23.480] In numbers: these numbers are from mid last year, so probably in need of an update, [02:23.480 --> 02:26.800] but it has been in operation since 2015, that hasn't changed.
[02:26.800 --> 02:32.280] The number of requests exceeds a billion in a week, and there are hundreds of terabytes of traffic that [02:32.280 --> 02:38.000] we see, and tens of millions of active users, also weekly. But as a disclaimer, this is [02:38.000 --> 02:42.000] just from our vantage point; in a decentralized network, no one has a complete view of what's [02:42.000 --> 02:50.160] going on, so these numbers could be much higher or just different in general. [02:50.160 --> 02:56.120] On ecosystem.ipfs.tech you can find some companies that build on top of this tech, and it's all [02:56.120 --> 03:04.560] in these different areas, social media and so on and so forth, so it's worth looking up. [03:04.560 --> 03:06.960] What's the value proposition of IPFS? [03:06.960 --> 03:12.280] The most important thing it does is decouple the content from its host, and it does this through [03:12.280 --> 03:18.720] a concept that's called content addressing, and content addresses are just permanent, [03:18.720 --> 03:24.880] verifiable links. This allows you to request data with such a [03:24.880 --> 03:28.680] content address, and anyone can serve you the content, and just from the address that you [03:28.680 --> 03:34.240] asked with, you can identify and verify that the content you got served is actually the [03:34.240 --> 03:40.560] one that you requested, and you are not dependent on the authenticity of the host, as is the [03:40.560 --> 03:42.760] case with HTTP. [03:42.760 --> 03:46.640] Because it's a decentralized network, it's also censorship resistant, and I like to put [03:46.640 --> 03:50.000] here that it alleviates backbone addiction. So what do I mean by that? [03:50.000 --> 03:54.800] Let's imagine all of us wanted to download a 100 megabyte YouTube video here [03:54.800 --> 03:59.080] in this room. If we were 100 people, we would put a load of [03:59.080 --> 04:04.840] about 10 gigabytes onto the backbone just to download the video into this room. Wouldn't [04:04.840 --> 04:08.560] it be better if we could just download it once and distribute it among each other, or [04:08.560 --> 04:12.840] download different parts and be a little bit more clever about that? [04:12.840 --> 04:18.520] In a similar vein, if we were working on a Google Doc here inside this room, why does [04:18.520 --> 04:22.840] it stop working if we don't have an internet connection anymore? [04:22.840 --> 04:25.960] It should actually keep working; it's kind of ridiculous. [04:25.960 --> 04:31.720] And also, somewhat in the same category, this partition tolerance could become very important for emerging networks, [04:31.720 --> 04:38.280] or if you're just on patchy coffee shop Wi-Fi. [04:38.280 --> 04:42.080] Alright, so how can you install IPFS?
[04:42.080 --> 04:46.960] So I put down three different ways here. In general, you don't [04:46.960 --> 04:53.200] install IPFS; IPFS is more of a specification, and there are different implementations of this [04:53.200 --> 04:57.680] specification. The most common one is Kubo, which was formerly known as go-ipfs, [04:57.680 --> 05:02.240] so it's a Go implementation. There's a newer one called Iroh, which is in Rust, and I think [05:02.240 --> 05:07.840] the newest one is in JavaScript, called Helia; yeah, I think that's the newest kid on [05:07.840 --> 05:14.400] the block. I will talk about Kubo here, and the easiest way to get started is to just [05:14.400 --> 05:18.800] download IPFS Desktop, which is an Electron app that bundles an IPFS node, gives you a [05:18.800 --> 05:24.600] nice UI, and you can already interact with the network, request CIDs and so on. [05:24.600 --> 05:28.320] Then there's IPFS Companion, which is a browser extension that you can install in [05:28.320 --> 05:34.240] Firefox or your browser of choice, or you directly use Brave or Opera, which come with a [05:34.240 --> 05:40.080] bundled IPFS node already, so if you enter ipfs:// and a CID, they will [05:40.080 --> 05:43.880] resolve the content through the IPFS network. [05:43.880 --> 05:46.440] But as I said in the beginning, in this talk we will focus on the command line, because [05:46.440 --> 05:51.480] we're at a developer conference, and I will also assume that we run Kubo, which is [05:51.480 --> 05:53.880] basically the reference implementation. [05:53.880 --> 06:02.360] So now we have downloaded Kubo from github.com/ipfs/kubo and we want to import [06:02.360 --> 06:04.160] some content; we just want to get started. [06:04.160 --> 06:09.080] So we downloaded it, and now we have this ipfs command on our machine, and the first thing [06:09.080 --> 06:16.600] that we do is run ipfs init, and what this does is generate a public-private key pair, [06:16.600 --> 06:23.400] by default an Ed25519 one, and it spits out this random-looking string of characters, which is basically [06:23.400 --> 06:24.560] your public key. [06:24.560 --> 06:32.120] Formerly it was just the hash of your public key, but now your public [06:32.120 --> 06:38.040] key is encoded directly in here, and this is your peer identity, which will become important later on. [06:38.040 --> 06:43.720] And it also initializes your IPFS repository, by default in your home directory under ~/.ipfs. [06:43.720 --> 06:45.880] This is the location where it stores all the files. [06:45.880 --> 06:50.720] So if you interact with the IPFS network and request files, it stores them in this directory [06:50.720 --> 06:57.880] in a specific format, similar to how Git does its object store, basically. [06:57.880 --> 07:01.240] And importantly, and I will point this out a couple of times, this is just a local operation. [07:01.240 --> 07:05.320] So we haven't interacted with the network at all yet. [07:05.320 --> 07:09.560] So now we are ready to go; I have a file I want to add. [07:09.560 --> 07:16.440] So what I do is run ipfs add and then my file name, and in this case IPFS gives you [07:16.440 --> 07:21.080] a progress bar, or rather Kubo gives you a progress bar, and spits out again a random-looking [07:21.080 --> 07:26.320] string of characters, which is the content identifier, the CID, which is the most fundamental [07:26.320 --> 07:27.320] ingredient here.
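As a rough command-line sketch of the two steps just described, assuming a Kubo installation; the peer ID and CID shown are placeholders, and the exact output differs between Kubo versions:

    # Initialize the local repository (a purely local operation, no network involved).
    # This generates the Ed25519 key pair, derives the peer identity from it,
    # and creates the repository under ~/.ipfs.
    $ ipfs init
    generating ED25519 keypair...done
    peer identity: 12D3KooW...        (placeholder peer ID)

    # Add a file: it is chunked, hashed and stored in the local repository,
    # and the root CID is printed. Still no network interaction.
    $ ipfs add my-file.mp4
    added Qm... my-file.mp4           (Qm... stands in for the real root CID)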
[07:27.320 --> 07:32.080] And this is the part where it decouples the host, sorry, the content from its host. [07:32.080 --> 07:36.840] As a mental model, you can think about the CID as a hash with some metadata. [07:36.840 --> 07:38.000] It's self-describing. [07:38.000 --> 07:41.120] So the metadata is this description part. [07:41.120 --> 07:43.360] You can see the ingredients at the bottom. [07:43.360 --> 07:47.960] So it's just an encoded version of some information like the CID version. [07:47.960 --> 07:54.440] So we have version zero and one, and some other information that I won't go into right now. [07:54.440 --> 07:55.920] Then it's self-certifying. [07:55.920 --> 08:03.600] This is the point where, if you request some data from the network, you verify the data [08:03.600 --> 08:09.280] that you got served with the CID itself and not via the host that served you the content, [08:09.280 --> 08:11.800] just reiterating this point. [08:11.800 --> 08:14.600] And it's an immutable identifier. [08:14.600 --> 08:19.000] And all these structures, like the CID structure at the bottom and so on, are governed by a project [08:19.000 --> 08:25.080] that's called Multiformats, which is also one of Protocol Labs' projects. [08:25.080 --> 08:31.080] And since the talk is about what happens under the hood: what actually happened here? [08:31.080 --> 08:37.400] IPFS saw the file, which is just this white box here, a stream of bytes, and IPFS chunked [08:37.400 --> 08:38.400] it up [08:38.400 --> 08:43.320] into different pieces, which is a common technique in networking, actually. [08:43.320 --> 08:46.880] And this gives us some nice properties. [08:46.880 --> 08:51.440] It allows us to do piecewise transfers, so we can request blocks from different hosts, [08:51.440 --> 08:52.960] actually. [08:52.960 --> 08:54.760] And it allows for deduplication. [08:54.760 --> 09:01.200] So if we have two blocks that are basically the same bytes, we can deduplicate them and [09:01.200 --> 09:04.080] save some storage space underneath. [09:04.080 --> 09:09.720] And if the file was a video file, it also allows for random access, so we could start [09:09.720 --> 09:16.720] in the middle of the video and wouldn't need to stream all the previous bytes at all. [09:16.720 --> 09:22.160] And after we have chunked it up, what IPFS does now is [09:22.160 --> 09:25.440] put it together again. [09:25.440 --> 09:29.320] And what we do here is we hash each individual chunk. [09:29.320 --> 09:34.000] Each chunk gets its own CID, its own content identifier. [09:34.000 --> 09:40.360] Then the combination of those CIDs again gets another CID, and we do this for both pairs [09:40.360 --> 09:41.840] at the bottom. [09:41.840 --> 09:48.640] And then the resulting intermediate CIDs are put together yet again to generate the [09:48.640 --> 09:50.600] root CID, as we call it. [09:50.600 --> 09:53.920] And this is actually the CID that you see in the command line up there. [09:53.920 --> 10:00.480] So we took the chunks and put the identifiers together to arrive at the final CID at the [10:00.480 --> 10:01.480] top. [10:01.480 --> 10:05.920] And this data structure is actually called a Merkle tree, but in IPFS land it's actually [10:05.920 --> 10:11.280] a Merkle DAG, because in Merkle trees your nodes are not allowed to have common parents. [10:11.280 --> 10:14.320] And DAG here means a directed acyclic graph.
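If you want to look at this Merkle DAG yourself, Kubo can show the chunks and links behind a root CID; a small sketch, with <root-cid> standing in for the CID printed by ipfs add:

    # Add a larger file with the default fixed-size chunker (256 KiB = 262144 bytes).
    $ ipfs add --chunker=size-262144 big-file.bin

    # List the direct children of the root, i.e. the CIDs of the chunks it links to.
    $ ipfs refs <root-cid>

    # Dump the root DAG node itself, including its links, as JSON.
    $ ipfs dag get <root-cid>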
[10:14.320 --> 10:18.600] Now let's imagine you didn't add a file but a directory. [10:18.600 --> 10:24.800] How do you encode the directory structure, and not only the bytes, and so on? [10:24.800 --> 10:30.280] All these formatting and serialization and deserialization concerns are governed by yet another project. [10:30.280 --> 10:34.640] It's called IPLD, which stands for InterPlanetary Linked Data. [10:34.640 --> 10:40.680] And IPLD also does a lot more things, but for now, this is specified in the scope of [10:40.680 --> 10:42.360] that project. [10:42.360 --> 10:46.840] So now we have imported the content. [10:46.840 --> 10:50.240] We have chunked it up, we've got the CID. [10:50.240 --> 10:53.640] But again, we haven't interacted with the network yet. [10:53.640 --> 10:58.040] People think if you add something to IPFS, you upload it somewhere and someone else takes [10:58.040 --> 11:01.320] care of hosting it for you, for free, which is not the case. [11:01.320 --> 11:03.600] We just added it to our local node. [11:03.600 --> 11:09.960] So now it ended up in this IPFS repository somewhere on our local machine. [11:09.960 --> 11:13.680] Only now do we connect to the network and interact with it. [11:13.680 --> 11:21.160] For that we run ipfs daemon, which is a long-running process that connects to nodes in the network. [11:21.160 --> 11:24.520] We see some version information: which Go version it was compiled with, which Kubo version [11:24.520 --> 11:26.560] we actually use. [11:26.560 --> 11:32.320] We see the addresses that the Kubo node listens on, and also which ones are announced to the [11:32.320 --> 11:37.080] network, so under which network addresses we are reachable. [11:37.080 --> 11:41.600] And then it tells us that it started an API server, a web UI and the gateway. [11:41.600 --> 11:46.960] The API server is just an RPC API that is used by the command line to control the IPFS [11:46.960 --> 11:47.960] node. [11:47.960 --> 11:52.840] The web UI is the thing that you saw previously in the screenshot of IPFS Desktop. [11:52.840 --> 11:58.240] So your local Kubo node also serves this web UI. [11:58.240 --> 11:59.240] And then the gateway. [11:59.240 --> 12:00.640] The gateway is quite interesting. [12:00.640 --> 12:04.840] It bridges the HTTP world with the IPFS world. [12:04.840 --> 12:09.280] So you can make requests to this endpoint that you can see down there. [12:09.280 --> 12:17.040] If you put /ipfs/ and your CID into the browser, into that URL, the Kubo node [12:17.040 --> 12:20.760] will go ahead and resolve the CID in the network and serve the content to you over HTTP. [12:20.760 --> 12:24.040] So this is like a bridge between both worlds. [12:24.040 --> 12:29.480] And Protocol Labs and Cloudflare and so on are actually running these gateways on the internet [12:29.480 --> 12:35.240] right now, which you can use as a low-barrier entry to the whole thing. [12:35.240 --> 12:37.200] And then the daemon is ready. [12:37.200 --> 12:41.280] In this process, it has also connected to bootstrap nodes, which are hard-coded, to [12:41.280 --> 12:43.920] actually get to know other peers in the network. [12:43.920 --> 12:48.960] But you can also override them with your own bootstrap nodes. [12:48.960 --> 12:49.960] So now we are connected to the network. [12:49.960 --> 12:53.520] We have added our file to our own machine.
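A rough sketch of this step on the command line; 5001 (RPC API) and 8080 (gateway) are Kubo's default ports, and <cid> is again a placeholder:

    # Start the long-running daemon: it connects to the bootstrap nodes and
    # brings up the RPC API, the web UI and the HTTP gateway.
    $ ipfs daemon

    # In a second terminal: list the peers we are currently connected to.
    $ ipfs swarm peers

    # Fetch content through the local gateway, the bridge between HTTP and IPFS.
    $ curl http://127.0.0.1:8080/ipfs/<cid>

    # The web UI served by the local node lives at http://127.0.0.1:5001/webui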
[12:53.520 --> 12:59.160] But now comes the interesting part, the challenge: how do we actually find [12:59.160 --> 13:02.000] content hosts for a given CID? [13:02.000 --> 13:08.640] So if I give my friend a CID, how does their node know that it needs to connect to me to request [13:08.640 --> 13:09.640] the content? [13:09.640 --> 13:11.760] And I put here: the solution is simple. [13:11.760 --> 13:12.760] We keep a mapping table. [13:12.760 --> 13:17.520] So we just have the CID mapped to the actual peer, and every node has this on their machine. [13:17.520 --> 13:21.120] So everyone knows everything, basically. [13:21.120 --> 13:27.480] But as you can imagine, this mapping table gets humongous, especially since we've split up those files into [13:27.480 --> 13:32.960] different chunks, and I think the default chunk size is 256 kilobytes. [13:32.960 --> 13:34.600] So we have just a lot of entries in this table. [13:34.600 --> 13:36.320] So this doesn't scale. [13:36.320 --> 13:40.560] The solution would be to split this table, so that each participating peer in this decentralized [13:40.560 --> 13:45.000] network holds a separate part of the table. [13:45.000 --> 13:46.960] But then we are back to square one. [13:46.960 --> 13:51.640] How do we know which peer holds which piece of this distributed hash table data? [13:51.640 --> 13:58.400] And the solution here is to use a deterministic distribution based on the Kademlia [13:58.400 --> 13:59.400] DHT. [13:59.400 --> 14:04.560] Kademlia is a specific protocol for a distributed [14:04.560 --> 14:06.320] hash table. [14:06.320 --> 14:12.240] And at this point, many talks on the internet about IPFS gloss [14:12.240 --> 14:14.320] over the DHT and how it works. [14:14.320 --> 14:19.160] And so when I got into this whole thing, I felt I was lacking something. [14:19.160 --> 14:24.880] So my experiment here is to dive a little deeper into this. [14:24.880 --> 14:30.000] I will cover a bit of Kademlia here, and it gets very technical. [14:30.000 --> 14:34.680] But at the end, I will try to summarize everything so that every one of you gets a little bit [14:34.680 --> 14:35.920] out of this. [14:35.920 --> 14:37.800] This whole process is called content routing. [14:37.800 --> 14:43.080] So this is the resolution of a CID to a content host. [14:43.080 --> 14:50.360] IPFS uses an adaptation of the Kademlia DHT with a 256-bit key space. [14:50.360 --> 14:56.840] So we hash the CID and the peer ID yet again with SHA-256 to arrive in a common [14:56.840 --> 14:58.760] key space. [14:58.760 --> 15:03.120] And the distributed hash table in IPFS is just a distributed system that maps these keys [15:03.120 --> 15:04.120] to values. [15:04.120 --> 15:10.320] The most important records here are provider records, which map a CID to a peer ID. [15:10.320 --> 15:14.960] The peer ID is what was generated when we initialized our node. [15:14.960 --> 15:21.760] And then there are peer records, which map the peer ID to actual network addresses, like IP [15:21.760 --> 15:23.160] addresses and ports. [15:23.160 --> 15:27.840] So looking up a host for a CID is actually a two-step process. [15:27.840 --> 15:32.680] First we need to resolve the CID to a peer ID, and then the peer ID to its network addresses. [15:32.680 --> 15:35.040] And then we can connect to each other.
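You can watch this two-step resolution with Kubo's DHT commands; a sketch, with <cid> and <peer-id> as placeholders (in newer Kubo versions the same lookups are also available under ipfs routing ...):

    # Step 1: resolve the CID to the peer IDs of content providers (provider records).
    $ ipfs dht findprovs <cid>

    # Step 2: resolve one of those peer IDs to its network addresses (peer record).
    $ ipfs dht findpeer <peer-id>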
[15:35.040 --> 15:41.680] The distributed hash table here has two key features: first, an XOR distance metric. [15:41.680 --> 15:45.160] That means we have some notion of closeness. [15:45.160 --> 15:49.960] So what this XOR thing does: if I XOR two numbers together, the resulting number, or rather [15:49.960 --> 15:54.880] this operation, satisfies the requirements for a metric. [15:54.880 --> 16:02.360] This means I can say a certain peer ID is closer to a CID than some other peer ID. [16:02.360 --> 16:07.680] So in this case, peer ID X could be closer to CID 1 than peer ID Y. [16:07.680 --> 16:17.040] And this allows us to basically sort CIDs and peer IDs together. [16:17.040 --> 16:18.800] And then there's this tree-based routing mechanism. [16:18.800 --> 16:23.200] So in this bottom-right diagram, which I got from the original paper, we have the black [16:23.200 --> 16:24.520] node. [16:24.520 --> 16:29.720] And this tree-based routing is super clever: all the [16:29.720 --> 16:35.040] peers in the network can be considered as leaves in a big trie, a prefix trie. [16:35.040 --> 16:41.040] And if we know only one peer in each of these bubbles, we can guarantee that we can reach [16:41.040 --> 16:47.320] any other peer in the network with O(log n) lookups, by asking for ever closer peers based [16:47.320 --> 16:52.760] on this XOR distance metric. [16:52.760 --> 16:56.160] So this was, just abstractly, what the distributed hash table in IPFS does. [16:56.160 --> 16:58.360] How does it work concretely for IPFS? [16:58.360 --> 16:59.640] So we started the daemon process. [16:59.640 --> 17:04.960] What happened under the hood was that we calculated the SHA-256 of our peer ID, which just gives [17:04.960 --> 17:09.400] us a long string of bits, basically, in our case. [17:09.400 --> 17:12.120] And we initialized a routing table, at the bottom. [17:12.120 --> 17:14.880] This routing table consists of different buckets. [17:14.880 --> 17:23.800] And each bucket is filled with peers that share a common prefix with our peer ID, or rather with the hash [17:23.800 --> 17:26.160] of our peer ID at the top. [17:26.160 --> 17:33.280] When our node started up, we asked the bootstrap peers: hey, do you know anyone whose [17:33.280 --> 17:36.520] SHA-256 of their peer ID starts with a 1? [17:36.520 --> 17:42.320] That means we have no common prefix, and we put those peers in bucket 0. [17:42.320 --> 17:47.440] Then we do the same for a prefix of 0-0, then 0-1-1, and so on, for longer and longer shared prefixes. [17:47.440 --> 17:52.560] And so we go through the whole list up to 255, and we fill up these buckets. [17:52.560 --> 17:56.240] And these buckets are basically these little blobs, these little circles, that you [17:56.240 --> 17:59.320] saw in the previous slide. [17:59.320 --> 18:00.880] And why did we do that? [18:00.880 --> 18:05.920] Because when we now want to retrieve content, so as I said, I handed the CID to my friend, [18:05.920 --> 18:12.000] and my friend enters the CID on the command line with this ipfs get command. [18:12.000 --> 18:17.120] Their node also calculates the SHA-256 of the CID, and then looks in its own routing [18:17.120 --> 18:20.680] table and sees: okay, I have a common prefix of two bits. [18:20.680 --> 18:26.040] So I locate the appropriate bucket, bucket 2, get the [18:26.040 --> 18:30.920] list of all peers in it, and then I ask all of these peers in the bucket: hey, do you know [18:30.920 --> 18:31.920] anyone?
[18:31.920 --> 18:34.240] So first of all, do you know the provider record already? [18:34.240 --> 18:37.920] Do you know the CID and the peer ID for that CID? [18:37.920 --> 18:42.640] If yes, we are done, but if not, we ask: do you know anyone closer, based on [18:42.640 --> 18:43.640] this XOR metric? [18:43.640 --> 18:47.240] And then that peer yet again looks in its own routing table, and so we get closer and [18:47.240 --> 18:54.720] closer and closer, with this log(n) property that I showed you previously. [18:54.720 --> 18:57.880] For publishing content, it's basically the same. [18:57.880 --> 19:03.360] We calculate the SHA-256 of the CID, locate the appropriate bucket, get a list of all [19:03.360 --> 19:10.600] the peers from it, and then we start parallel queries, but instead of asking for the provider [19:10.600 --> 19:13.040] record, we ask for even closer peers. [19:13.040 --> 19:21.320] And we terminate when the closest known peers in the query haven't replied with [19:21.320 --> 19:31.240] anyone closer to the CID than we already know. [19:31.240 --> 19:36.480] And then we store the provider record with the 20 closest peers to that CID, and we do [19:36.480 --> 19:41.800] it with 20 because there's peer churn: this is a permissionless network, and that [19:41.800 --> 19:47.320] means peers can come and go as they wish, and if we only stored it with one peer, we would [19:47.320 --> 19:53.600] risk that the provider record is not reachable when that node goes down, and in turn the [19:53.600 --> 19:57.520] content is not reachable. [19:57.520 --> 20:01.440] So this was the very technical part of it, but let me summarize. [20:01.440 --> 20:06.280] This is probably the easier way to understand all of this. [20:06.280 --> 20:11.080] First of all, we added the content to our node, so this is the file entering at the [20:11.080 --> 20:16.680] provider. The provider looks in its routing table, gets redirected to a peer that is closer [20:16.680 --> 20:24.720] to the CID, and gets redirected again until it finds the closest peers, in this XOR key space metric, [20:24.720 --> 20:28.120] to the CID, and then it stores the provider record with them. [20:28.120 --> 20:33.680] Then, out of band, the CID gets handed to the requester, to my friend, and what I haven't [20:33.680 --> 20:40.320] told you yet: IPFS also maintains a long list of, I don't know how many [20:40.320 --> 20:47.080] it is right now, probably a hundred or so, constant connections to other peers, and it opportunistically [20:47.080 --> 20:52.960] just asks them: hey, do you know the CID, or rather the provider record for the CID? [20:52.960 --> 20:58.320] If this resolves, all good, we are done, but it's very unlikely for a peer to actually [20:58.320 --> 21:01.000] know a random CID. [21:01.000 --> 21:02.480] So let's assume this didn't work. [21:02.480 --> 21:07.160] The requester also looks in its own routing table, gets redirected, and gets redirected ever [21:07.160 --> 21:17.880] closer to the CID in the key space, and then finds the peer that stores the [21:17.880 --> 21:24.440] provider record, fetches the provider record, then again does the same hops to find out [21:24.440 --> 21:28.560] the mapping from the peer ID to the network addresses, and then we can actually connect [21:28.560 --> 21:35.440] to each other and transfer the content, and we're done.
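If you are curious what your own node's buckets look like, a reasonably recent Kubo can print its routing table, and you can trigger the publishing side by hand; a sketch, with <cid> as a placeholder:

    # Show the DHT routing table, bucket by bucket, with the peers in each bucket.
    $ ipfs stats dht

    # Announce a CID we host, i.e. store the provider record with the closest peers.
    $ ipfs dht provide <cid>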
[21:35.440 --> 21:42.200] So this is the content lifecycle, and this is actually already it; well, [21:42.200 --> 21:49.640] it is quite a bit, quite involved actually. And with that, it's already time for [21:49.640 --> 21:57.360] some callouts: get involved, IPFS is an open source project. If you're into measurements [21:57.360 --> 22:04.600] and so on, we have some grants open at radius.space; if you want to get involved with some network [22:04.600 --> 22:10.240] measurements, get your applications in. All action is in public; you can follow along with [22:10.240 --> 22:16.800] our work, especially my work and the work of our team, at this GitHub repository. We have plenty [22:16.800 --> 22:22.480] of requests for measurements that you can dive into, and extra ideas are always welcome. [22:22.480 --> 22:31.480] In general, IPFS is, I think, a very welcoming community, at least it has been for me, and yeah, [22:31.480 --> 22:32.480] that's it. [22:32.480 --> 22:50.320] So, any questions? [22:50.320 --> 22:56.400] So is the way you described it, using the DHT, how all nodes in the network share files with [22:56.400 --> 22:57.720] each other? [22:57.720 --> 23:04.960] It's one content routing mechanism; there are multiple ones. This first thing [23:04.960 --> 23:09.000] that I mentioned here, this opportunistic request to your immediately connected nodes, is also some kind of [23:09.000 --> 23:13.920] content routing, so you're resolving the location of content. Then there are some new efforts [23:13.920 --> 23:18.840] for building network indexers, which are just huge nodes that store these mappings, centralized [23:18.840 --> 23:28.880] nodes, or rather federated centralized nodes, so not as bad, and I think [23:28.880 --> 23:34.840] these are the important ones, basically. So yeah, there are more ways to resolve content. [23:34.840 --> 23:39.920] Also, mDNS could be one part: if you're on the same network, you're broadcasting... [23:39.920 --> 23:49.200] ah no, that's just for the local network, sorry, yeah, okay, true. Luckily we have a [23:49.200 --> 23:57.960] core maintainer of IPFS here; that's actually not a joke, but yeah, sorry. [23:57.960 --> 24:02.440] So I see that the provider records get replicated, but does the content actually get replicated [24:02.440 --> 24:04.120] across the network too? [24:04.120 --> 24:11.360] Only if someone else chooses to. You're publishing the provider record, [24:11.360 --> 24:18.520] so it's public somewhere, and anyone could look that up and also store the content themselves. [24:18.520 --> 24:25.280] So the idea is: if content is popular and you care about the content staying [24:25.280 --> 24:32.400] alive in the network, you pin the CID, as it's called, and this means you're fetching the content [24:32.400 --> 24:37.520] from the other provider, storing it yourself, and becoming a provider yourself. And because [24:37.520 --> 24:43.200] of the CID mechanism, which is self-certifying and so on, other peers that request the content [24:43.200 --> 24:50.280] from you don't even need to trust you, because the CID already encodes the trust chain here. [24:50.280 --> 24:54.120] But nothing here happens automatically. [24:54.120 --> 24:56.880] But you can have multiple providers for the same content? [24:56.880 --> 24:59.400] Definitely, yeah, that's part of it.
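In command-line terms, becoming an additional provider for a CID might look roughly like this, with <cid> again as a placeholder:

    # Fetch the content and pin it locally, so our node keeps it around
    # and announces itself as a provider for it.
    $ ipfs pin add <cid>

    # List the CIDs we have pinned.
    $ ipfs pin ls --type=recursive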
[24:59.400 --> 25:06.040] Another question is how the project fits the concepts of identity and trust and [25:06.040 --> 25:11.800] personas into IPFS; I'm thinking of metadata, ramifications about the content and stuff [25:11.800 --> 25:13.800] like that. [25:13.800 --> 25:14.800] What do you mean exactly? [25:14.800 --> 25:22.800] For instance, just a history of the content, and can you trust that this content is from [25:22.800 --> 25:26.200] a certain person or from a certain source, you know. [25:26.200 --> 25:32.240] I would argue this would probably be some mechanism on top of this content identification. [25:32.240 --> 25:38.160] So this is more for IPLD then, perhaps, I would say. If you want to say some content [25:38.160 --> 25:43.720] is from some specific person, then you would work with signatures, so signing the [25:43.720 --> 25:48.920] data and so on, which is something you would bolt on top of IPFS, but nothing that I think IPLD [25:48.920 --> 25:51.720] has encoded there right now. [25:51.720 --> 26:05.160] It's partly the same question: how is it ensured that there are no collisions in [26:05.160 --> 26:06.720] the content IDs? [26:06.720 --> 26:07.720] No collisions? [26:07.720 --> 26:14.440] Yes, because you could publish some other content with the same content ID; you said it happens [26:14.440 --> 26:18.400] locally, the content ID generation. [26:18.400 --> 26:20.040] You could fake content. [26:20.040 --> 26:27.040] Yes, but then all these cryptographic hash functions would be broken, which would [26:27.040 --> 26:28.920] be very bad. [26:28.920 --> 26:32.600] And if you have a hash collision, then it actually means you have the same content. [26:32.600 --> 26:37.400] That's the assumption right now. Or maybe, yes, Joe? [26:37.400 --> 26:42.480] We just use SHA-256 by default, and you can also use other ones like BLAKE3 or BLAKE2, but [26:42.480 --> 26:46.880] if you find a collision in SHA-256, you have bigger problems than IPFS not working. [26:46.880 --> 26:52.800] Exactly this, yeah. [26:52.800 --> 26:59.800] Following up on this, how resilient is this against malicious actors that want to prevent me from [26:59.800 --> 27:02.800] reaching the content? [27:02.800 --> 27:04.880] It's a big question, but maybe just briefly. [27:04.880 --> 27:12.800] Yes, so in peer-to-peer networks, these kinds of Sybil attacks are often the attack vector [27:12.800 --> 27:18.880] that is considered, which means you generate a lot of identities to populate just some [27:18.880 --> 27:23.560] part of the key space, to block some requests from reaching their final destination and so [27:23.560 --> 27:29.560] on. [27:29.560 --> 27:35.760] From my experience, this is quite hard, and I haven't seen this happening. [27:35.760 --> 27:40.880] I cannot say that it's impossible; it's probably hard to tell. [27:40.880 --> 27:42.880] Max, do you want to add something? [27:42.880 --> 27:49.640] Also, yeah, Kademlia has this mechanism where only long-lived peers stay in the routing [27:49.640 --> 27:50.640] table. [27:50.640 --> 27:53.200] True, yeah, exactly. [27:53.200 --> 27:58.440] So this Sybil thing is just one attack vector, but it's the common one that is considered. [27:58.440 --> 28:02.720] There are many points in the code base where you need to think about what happens [28:02.720 --> 28:09.200] if a Sybil attack is going on, and one thing that Kademlia does is to prefer [28:09.200 --> 28:12.120] long-running nodes, stable nodes, in the routing table.
[28:12.120 --> 28:16.760] So if someone suddenly generates a lot of identities, they don't end up in your [28:16.760 --> 28:24.200] routing table and pollute your content routing here, or interfere with [28:24.200 --> 28:25.200] that. [28:25.200 --> 28:27.200] All right, go ahead. [28:27.200 --> 28:35.440] I'm not sure if I want to ask it, but removing content, you know, deleting; we've [28:35.440 --> 28:42.280] got the GDPR, so is there any solution for that? [28:42.280 --> 28:44.760] So, yeah, it's hard. [28:44.760 --> 28:48.960] That's part of the thing: if you could, then it wouldn't be censorship resistant anymore. [28:48.960 --> 28:57.920] So one solution, well, one mitigation, maybe, is to have a blacklist of CIDs that [28:57.920 --> 29:06.520] you may or may not publish, to say, okay, don't replicate this CID and so on. But also, [29:06.520 --> 29:12.040] if you have such a list, then it's very easy to just look it up and see what's inside. [29:12.040 --> 29:20.160] So deleting content is very tricky. However, I said these are permanent links; [29:20.160 --> 29:25.400] the links are permanent, but content actually still churns in the IPFS network, and these [29:25.400 --> 29:32.400] provider records that you publish into the network expire after 24 hours, so if no one [29:32.400 --> 29:38.480] actually re-provides the content or keeps the content around, the content is gone as well. [29:38.480 --> 29:46.000] But a delete operation doesn't exist, so we just have to hope that no one re-provides it [29:46.000 --> 29:56.440] anymore, which you could support with these denylists, for example. Yeah, Daniel, okay. [29:56.440 --> 30:00.560] Who is able to write to that blacklist, and is there any...? [30:00.560 --> 30:07.280] Yeah, I don't know, to be completely honest, but maybe [30:07.280 --> 30:08.280] Jorropo knows. [30:08.280 --> 30:15.360] There is no blacklist in the network right now; there are a few people that want that. But [30:15.360 --> 30:21.600] we have, sorry, earlier you said that we have gateways, and a gateway is just a node that [30:21.600 --> 30:27.800] is publicly reachable, and for those gateways, because many people say, okay, they find [30:27.800 --> 30:33.120] some content on IPFS illegal, and instead of reporting it to the actual node hosting the content [30:33.120 --> 30:37.320] on IPFS, they just report it to the gateway, because they know HTTP and they don't know [30:37.320 --> 30:42.440] IPFS, our gateway has some blacklist that is somewhere, but it's not shared by [30:42.440 --> 30:46.040] the complete network; it's just for our gateway, ipfs.io. [30:46.040 --> 30:52.960] So Cloudflare, for example, also operates these gateways, and more; anyone could [30:52.960 --> 30:59.920] operate a gateway, so you could file a request saying, don't serve this CID, it's a [30:59.920 --> 31:04.480] phishing website, for example, and then these CIDs are not served through the gateways, [31:04.480 --> 31:06.720] which are a common way to interact with the network right now. [31:06.720 --> 31:13.720] But it's just the gateways that follow the list; it's not the whole network. [31:13.720 --> 31:20.320] Okay, we're running out of time, unless there is one more.
[31:20.320 --> 31:26.600] I have a question regarding searching through the stored content: is there any mechanism [31:26.600 --> 31:35.200] to go through or index the files that are there, to have some sort of search [31:35.200 --> 31:37.320] engine for that? [31:37.320 --> 31:45.400] Right, so there's a project called ipfs-search, and this makes use, among other things, [31:45.400 --> 31:50.880] of these immediate requests for CIDs. So it's just sitting there listening, connected to [31:50.880 --> 31:53.920] a lot of nodes, and as I said, if someone requests content, they immediately ask their [31:53.920 --> 31:59.880] connected peers, and you're connected to a lot of peers, and these ipfs-search nodes [31:59.880 --> 32:05.240] are sitting there listening to these requests, and they see, okay, someone wants this CID, [32:05.240 --> 32:09.960] so they go ahead and request that CID as well, and then index that content themselves, and [32:09.960 --> 32:16.760] so you can then search on this ipfs-search website for something, just like with Google, and [32:16.760 --> 32:21.520] then you see CIDs popping up, and then you can request those CIDs from the IPFS network. [32:21.520 --> 32:26.560] So this is one approach to do that, to index content, yeah. [32:26.560 --> 32:28.560] Okay, thank you. [32:28.560 --> 32:30.560] Thank you.