[00:00.000 --> 00:17.000] Next presenter. Who is Theo? Yes. So, talking about cryptpads. Yes, the floor is yours. [00:17.000 --> 00:24.000] Thank you. Yeah, whom do you trust? I think this question is really a serious question, [00:24.000 --> 00:33.000] especially for privacy and collaboration in today's Internet. Well, yeah, let's directly start [00:33.000 --> 00:42.000] with this question or with what collaboration is. Collaborative editing is that multiple people [00:42.000 --> 00:49.000] work on the same document at the same time and they want that their changes are transmitted [00:49.000 --> 00:56.000] in near real time. So here in this example you see that one person writes there, the update is [00:56.000 --> 01:03.000] propagated to the server and the server further forwards the message to all other users. Here [01:03.000 --> 01:11.000] you see that in this generic example that the server can see all messages. The server has a local [01:11.000 --> 01:20.000] copy of the document and updates as soon as it gets a message from a user. And already here we [01:20.000 --> 01:27.000] should say we're like, hmm, okay, so whom do we need to trust? We obviously need to trust the [01:27.000 --> 01:37.000] server in this example because the server can see the documents. So this leads me to the second [01:37.000 --> 01:43.000] part, to the privacy that we want. And here I give you some informal definition and we say that no [01:43.000 --> 01:52.000] untrusted entity can infer personal information, document content, or who the collaborators are. [01:52.000 --> 01:59.000] So for an untrusted party, the document should look like this, just like snippets. And this [01:59.000 --> 02:06.000] untrusted party should not infer any information. Here the key point is that it's an untrusted entity. [02:06.000 --> 02:12.000] Because this does not hold for everyone. For example, the collaborators, they should be able to read the [02:12.000 --> 02:24.000] document. So the question is whom we trust? And I'll start with the solution that's probably the most [02:24.000 --> 02:32.000] used today. And yeah, why not trust Google and Co. And there may be many reasons. I just want to give [02:32.000 --> 02:39.000] you one example. And it's the case of Desha Rawi. Here, Naomi Klein, a famous environmental [02:39.000 --> 02:47.000] activist writes that India targets climate activists with the help of Big Tech. Tech shines like [02:47.000 --> 02:53.000] Google and Facebook appear to be abating and abetting a vicious government campaign against [02:53.000 --> 03:05.000] Indian climate activists. So what happened here, there was, cannot go too far to the side, was [03:05.000 --> 03:12.000] climate activist Desha Rawi who founded the Indian, was co-founder of the Indian chapter of [03:12.000 --> 03:19.000] Fridays for Future. And they worked on my Google Docs where they discussed how to help Indian [03:19.000 --> 03:25.000] farmer protests. And there was stuff like use this tweet or you can write a letter to your [03:25.000 --> 03:33.000] government. This document was leaked publicly on Twitter, I think. And then the Indian government [03:33.000 --> 03:44.000] thought of this is a conspiracy theory and wanted to track down who actually wrote this document. [03:44.000 --> 03:52.000] So they asked Google and Google helped them and said it's this and this and this person. And then [03:52.000 --> 04:03.000] Desha Rawi was arrested for a few days. She was later on, she was against freedom and there was [04:03.000 --> 04:09.000] no sentence against her in the end. But nevertheless this shows that we cannot really trust [04:09.000 --> 04:16.000] Google to host sensitive documents. So what can we do against? Or what is an alternative [04:16.000 --> 04:24.000] solution? And I think one of the most obvious answers, especially at a conference like here, is to [04:24.000 --> 04:30.000] say that we need to control the software. We need to have the server and the client's open source. [04:30.000 --> 04:39.000] Because if this is the case, then we can host the software on our own instance, on our own server. And [04:39.000 --> 04:50.000] we can decide whom we want to give the data. And yeah. So this would be a first approach. So we [04:50.000 --> 04:58.000] could say, yeah, it's freedom of software, we are safe. And this is exactly a quotation here from [04:58.000 --> 05:05.000] Jitsi Mead. And they say that the possibility to run your own instance completely removes the need to [05:05.000 --> 05:13.000] trust a third party provider and therefore eliminates the need for end-to-end encryption. So they say [05:13.000 --> 05:20.000] exactly this, you can run it your own. You don't need any other pre-consciousness. No, this is fine [05:20.000 --> 05:28.000] because it's open source. Jitsi Mead is a video conferencing platform you may be familiar with. So this [05:28.000 --> 05:35.000] is a bit different. I will come to it later. And also interestingly, also interesting is that they [05:35.000 --> 05:42.000] remove the statement from their website only a bit after I started to prepare my talk. So this is [05:42.000 --> 05:52.000] from December 2022. But are we really safe? And to answer these questions or some more questions, can [05:52.000 --> 06:00.000] really everybody run their own instance? I mean, yes, probably most of you have the technical [06:00.000 --> 06:06.000] capabilities. But do other people have this capability? Do they have the infrastructure? Do they [06:06.000 --> 06:14.000] have the money to run this? No, probably not. And the second question is, do you really want to [06:14.000 --> 06:20.000] trust a system administrator to see all your documents? So imagine you're in a company and you [06:20.000 --> 06:27.000] are working in a collaborative system and you have the salary sheets online. Do you want the system [06:27.000 --> 06:35.000] administrator to read that? No, probably not. Even if you trust it in the first place. [06:35.000 --> 06:41.000] And then, and this is where the difference is to video conferencing, is that documents are not [06:41.000 --> 06:48.000] ephemeral. So a video stream you can safely delete after the conference has ended. But a document [06:48.000 --> 06:55.000] must be stored in the server because you want to access it later. And this means you do not only [06:55.000 --> 07:03.000] need to protect your documents currently, but also in the long term. So that if the server is [07:03.000 --> 07:11.000] under attack or an attacker gets access to it, they should not have access to the documents. [07:11.000 --> 07:19.000] Okay, so if you see this, then you probably think we need end-to-end encryption. And end-to-end [07:19.000 --> 07:26.000] encryption is in principle that you have one party, let's say Alice, and Alice encrypts a [07:26.000 --> 07:37.000] document, they send it to Bob and Bob decrypts it. And in the middle, the data is not readable. [07:37.000 --> 07:47.000] So this is the encrypted ciphertext. And you see here, it's exactly the snippets we want. So this [07:47.000 --> 07:54.000] technically looks good, and we could say, okay, we apply this, and we can say it's end-to-end encrypted, [07:54.000 --> 08:00.000] we are safe. And here's a statement of Google, and they say that with Google Workspace, client-side [08:00.000 --> 08:08.000] encryption, content encryption is handled in the client's browser before any data is transmitted. [08:08.000 --> 08:14.000] So here first note that client-side encryption is not the same as end-to-end encryption. It's [08:14.000 --> 08:22.000] different, especially in the question who holds the key. And client-side encryption, it's not you [08:22.000 --> 08:30.000] as a user who holds the key, but the keys are stored on a third-party server. So there comes [08:30.000 --> 08:38.000] again this question of trust, if you trust this third-party server. Okay, so we could say it's [08:38.000 --> 08:46.000] end-to-end encrypted, we are safe. Well, really. First, there are the metadata. And metadata is [08:46.000 --> 08:54.000] all about who connects to the server, at which time, from which IP address, who collaborates, which [08:54.000 --> 09:01.000] people are accessing the document at the same time. And all these metadata, they are still there, [09:01.000 --> 09:10.000] even if the content is encrypted. So, yeah, still a problem. And second, we have Kirchhoff's [09:10.000 --> 09:18.000] principle from cryptography, which says that a cryptosystem should be secure, even if everything [09:18.000 --> 09:25.000] about the system accepts the key is public knowledge. So you should be able to release all the [09:25.000 --> 09:32.000] code, and all information accepts the key, and it should still be secure. And for me, it's really [09:32.000 --> 09:42.000] urgent for open source. And, yeah, that's why I think it's urgent for open source. So we see that [09:42.000 --> 09:50.000] we need both of them. And here, I want to present you CripPad. CripPad is an online collaborative [09:50.000 --> 09:58.000] editing tool. There are multiple parts of it. There is a whiteboard. There is code marked [09:58.000 --> 10:08.000] on. There are slides, like these ones, and documents. It's open source software from the [10:08.000 --> 10:17.000] client code is open source, as well as the server code. So you can host your own instance. [10:17.000 --> 10:24.000] And there are about 200 maintained instances. We at the CripPad team, we host a flagship [10:24.000 --> 10:33.000] instance, which has about 200,000 registered users. And how does CripPad encrypt? So in [10:33.000 --> 10:39.000] CripPad, we have this end-to-end encryption. We have that an update is propagated in encryption [10:39.000 --> 10:47.000] form, encrypted form, and the server only has an encrypted state of the document. So the [10:47.000 --> 10:55.000] server cannot infer the actual content of the document. And how do we share the keys? In the [10:55.000 --> 11:02.000] most basic way, we share the keys over the fragment identifier of the URLs. That means we put the [11:02.000 --> 11:12.000] keys after the hashtag of the URL. Like this, you can easily share a document. What do we [11:12.000 --> 11:19.000] trust? As I saw, as I said, you still have the metadata. So you still need some trust. In [11:19.000 --> 11:27.000] CripPad, you have to trust that the server is not an active attacker. That means that you [11:27.000 --> 11:32.000] expect that you trust that the server acts according to the protocol. It runs the correct [11:32.000 --> 11:40.000] code, and it does not deliver any malicious things. Or it does not repeat stuff, and so [11:40.000 --> 11:48.000] on. It does not reorder stuff like this. And why do we have this trust requirement? There are [11:48.000 --> 11:54.000] two reasons. The first one is a practical. We have a web application where you get the [11:54.000 --> 12:01.000] client source code from the server. So if the server would deliver bogus client code, [12:01.000 --> 12:08.000] well, then every security guarantee is lost. And then there is the second one, which is [12:08.000 --> 12:14.000] more theoretical. Namely that the server can always delete files. Even if they are encrypted, [12:14.000 --> 12:22.000] the server can delete them without problem. Okay. So you see that you need some trust. But [12:22.000 --> 12:28.000] the cool thing about CripPad is that there are other stuff which you don't have to trust. [12:28.000 --> 12:35.000] Namely, the server could be an honest but curious attacker. That means that even if the [12:35.000 --> 12:42.000] server watches you, you're still safe. You don't have to trust the server that it does [12:42.000 --> 12:51.000] not watch you. No, it's explicitly allowed. And why do we have that? Well, the server [12:51.000 --> 12:59.000] may become corrupt. Even if you trust it to be not actively malicious, it's still [12:59.000 --> 13:07.000] maybe at some point in time, it may be corrupt. And this was especially the case here in [13:07.000 --> 13:14.000] last summer where there was the CripPad instance hosted by Germany's private party. And on [13:14.000 --> 13:21.000] this instance, some sensitive documents about the G7 summit were leaked. And then the police [13:21.000 --> 13:27.000] asked the pirate party to hand out their data. Otherwise, they would seize the server. So [13:27.000 --> 13:33.000] the police got access to this data. And now the police could not read anything. They [13:33.000 --> 13:39.000] could not read the documents. Namely, because everything was entered into encryption. So [13:39.000 --> 13:45.000] this shows that even if you trust it in the first place, we still cannot be sure that [13:45.000 --> 13:52.000] it's trustworthy forever. And this setting, yeah, and as I said, this setting is exactly [13:52.000 --> 14:00.000] covered in this honest but curious attacker, which we allow. There's also another point [14:00.000 --> 14:07.000] of view to look at this. We could all say that we protect the server from its users. So [14:07.000 --> 14:16.000] for example, the server administrator of the CripPad of Germany's pirate party was not [14:16.000 --> 14:22.000] consulting. How could they know what documents were published on their server because it's [14:22.000 --> 14:30.000] encrypted? So this shows that such encryption is also nice for you in terms of system [14:30.000 --> 14:36.000] administrator because it allows you to offer the service without taking too much risk to [14:36.000 --> 14:47.000] you. So, yeah, as a take home message, I really want to say that we need both. We need [14:47.000 --> 14:54.000] open source and end-to-end encryption for good trust assumption. And with this, I'm at [14:54.000 --> 15:01.000] the end of my presentation. Shout out to my team. It's David is there somewhere. Wolfgang [15:01.000 --> 15:09.000] is here. And Ludo is also there. I am Theo. And CripPad is developed at Xwiki in France [15:09.000 --> 15:16.000] by this small team. We have a stand here in the K building. Yeah, come by. Drop by. We [15:16.000 --> 15:18.000] have stickers. Thank you. [15:18.000 --> 15:45.000] So are there any questions? [15:45.000 --> 15:50.000] Would a peer-to-peer version be possible to reduce risks with the server and would it [15:50.000 --> 15:57.000] help? Yeah, it would be possible, theoretically. The main point why we have a server is that [15:57.000 --> 16:04.000] we want that the documents are accessible all the time. So in a peer-to-peer setting, [16:04.000 --> 16:11.000] you will firstly have the requirement that always one party must be online. And we don't [16:11.000 --> 16:15.000] want that. Yeah. [16:15.000 --> 16:18.000] So another question. [16:18.000 --> 16:24.000] Thank you very much. You say we need open source and E2E for good trust assumptions. I would [16:24.000 --> 16:32.000] suggest you might need a slightly stronger statement. The code on the server has to be [16:32.000 --> 16:40.000] open source as well. So potentially you need something like the Afaro GPL. And in terms [16:40.000 --> 16:46.000] of the E2E, you need post-quantum resistance. Yeah. [16:46.000 --> 16:54.000] If you have those two things, then maybe you have good trust assumptions. Yeah, good point. [16:54.000 --> 17:00.000] TripPad is licensed under the ATPL. So this point is easily answerable. And on the second [17:00.000 --> 17:05.000] point we are working on, we are looking on how to make TripPad secure in a post-quantum [17:05.000 --> 17:15.000] resistance. Thank you very much. You mentioned that it's problematic to have the metadata [17:15.000 --> 17:21.000] still. So what is TripPad doing against that or how can we make sure that the server is [17:21.000 --> 17:29.000] not collecting metadata? Yeah, two answers to this. One is that there's always some [17:29.000 --> 17:34.000] metadata which will be there. For example, the IP address or the browser agent. This one [17:34.000 --> 17:39.000] we have to live with. And this is also the case why it's important that you can host it [17:39.000 --> 17:47.000] on your own instance. And then the second part is that TripPad collects as few information [17:47.000 --> 17:54.000] about you as possible. So for example, we don't have a list of users, of user names. [17:54.000 --> 18:01.000] There's even no list of hashed user names. So we just hash the user. The user name and [18:01.000 --> 18:08.000] the password locally on the client side and generate from this all the keys. So this [18:08.000 --> 18:18.000] is just as an illustration how we try to ensure to have as few metadata as possible. [18:18.000 --> 18:22.000] Good afternoon. And that's the first question you answered that you don't use peer-to-peer [18:22.000 --> 18:26.000] because you want the server to be online all the time. Does the server have to be unique [18:26.000 --> 18:31.000] or can you have multiple servers just in case one gets in the hands of the police or gets [18:31.000 --> 18:38.000] out for some reason? Are you speaking of federation? Yes. Okay. Possibly there, currently there [18:38.000 --> 18:51.000] is no possibility for federation. No, sadly not. Yeah, you mentioned the case where a [18:51.000 --> 18:57.000] server was raided by police. So they have the server. Is it not enough then to have that [18:57.000 --> 19:02.000] server and also have somebody's browser history with the key in the URL and then that conversation [19:02.000 --> 19:13.000] is open? So if I got your answer, if you got your question correctly, it was if an attacker [19:13.000 --> 19:21.000] has access to the server and to the URL, then they have full knowledge. Yeah, that's the [19:21.000 --> 19:28.000] case because the URL leaks the full URL including the part after the hash, which is not sent [19:28.000 --> 19:34.000] to the server. If the attacker has this, yeah, then they have the key to the server and add [19:34.000 --> 19:44.000] the key to the document and can decrypt it. Yeah, connected it. So yes. How does editing [19:44.000 --> 19:51.000] collaborators adding removing work do like re-encrypt the file of different keys every [19:51.000 --> 19:58.000] time or how do you handle that? So we have, we only send updates. So it's not the entire [19:58.000 --> 20:06.000] file every time and it's symmetrically encrypted. And in order, there are two ways you can [20:06.000 --> 20:12.000] access a document in a read-only mode. There you have the keys for decryption and to prove [20:12.000 --> 20:20.000] that you're able to update the document, you need to sign it with a sign-in key. But the [20:20.000 --> 20:27.000] keys are static for a document. But if a user gets removed from read access, they would [20:27.000 --> 20:32.000] still be able to read the file after it's being modified, wouldn't they? [20:32.000 --> 20:48.000] Yes, exactly. Yeah. Yeah, they'll still be able to read it. There is, there are access lists [20:48.000 --> 20:53.000] which we have which can defend against this scenario. But yeah, there's also something [20:53.000 --> 21:01.000] we're working on. And maybe if I can just mention something which with more goes into [21:01.000 --> 21:07.000] these detailed questions. We just published a white paper. You can go to our website on [21:07.000 --> 21:24.000] kruppad.org and check it out. So if there are no other questions.