So there may or may not be time for questions. There's a lot of detail. This is a 60 minute
talk compressed down to hopefully 22ish minutes. So we will see how we go. But yeah, I'm here
to talk about the technical details of interoperability. I'm also Travis. If you don't know me, I'm
the director for standards development at matrix.org. I'm also on the spec core team.
I run T2Bot and I work at Element for trust and safety. I have a few jobs. But good news,
there's already more that we can talk about. So Matthew had the talk this morning. If you
haven't seen that, or seen the recap of it about 10 minutes ago, it covers the DMA and the timelines
in a lot more detail. To recap though, the DMA requires gatekeepers or large messaging
providers to open up their APIs and their systems for interoperability. Encryption must be
maintained between those providers. So you cannot break encryption for the sake of interoperating.
You have to maintain it. These messengers have three options. You can become multi-headed
similar to Beeper Mini where you have all the networks available in your one client.
And you just kind of switch between them. You can create a bridge app where the user
downloads a third thing and then you bridge locally on the device. That works. It's not
great. Or you can speak a common protocol. We've been doing that for the last year.
Probably longer. And oh yeah, they have to do this all by March 7th this year. So with
that in mind, there are many projects involved as well. There is the More Instant Messaging
Interoperability working group, or MIMI, at the IETF. They are trying to specify a standard
that does this stuff. We are very frequent there. We are a direct contributor to this.
I have written a MIMI protocol document in association with a few other people on the
design team to try and simplify a lot of the components, particularly what Linearized Matrix
is. Also, Linearized Matrix was originally created as this simplified version of Matrix
because it turns out that you don't necessarily need a ton of the fully compatible DAG stuff
or even messaging history for interoperability. A lot of the existing providers just kind of
want to throw messages around the place. They don't necessarily want to keep
these things around. Obviously, of course, we have Matrix. Hopefully, everybody here is familiar
with that. But it is the decentralized and fully featured version of an interoperable protocol.
What parts of interoperability do we have to worry about? A few. There is encryption. This
kind of fits into a weird L shape. You have content format within that. But the encryption,
we have to make sure that all the messages are secure. We have to make sure that everything is
the same. Of course, we have to make sure that it is consistent across the providers. The content
format, what do messages actually look like? We have to make sure that that is the same because
the servers can't help us here. The clients have to agree on this. That is more of a challenge.
We also need an authorization policy so people can be banned because they need to. Then we also
have messages that people might not be allowed to send in certain rooms. Of course, we also have
transport. The transport is just how the servers communicate because we have a room model that
looks something like this. The room model is a combination of the encryption, authorization
policy, and transport. We also have a definition of membership or participation, a little bit more
on that in a minute. And also how the messages are fanned out themselves. In the very simplest
scenario, we have clients talking to servers, servers talking to each other, and encrypted
messages flowing between clients effectively. It gets more complicated when you add a third server,
so we will do that later. Some of these problems are easier than others. Namely, transport. Super
easy to solve. Pretty much everybody uses some form of HTTPS. MIMI wants to use mTLS. Linearized
Matrix uses the same system that Matrix already does, where you have a signing key that
kind of gets thrown around a bit. It is unclear what the actual format over HTTP would be. Matrix
uses JSON. MIMI wants to use some form of binary. Unclear what that actually is. We are also
considering a binary event format specifically for this kind of thing. Protobuf and CBOR are
kind of on the top. But to be determined. Clients would not be expected to consume that binary
format yet. I should probably just add that in. But yeah, we will end up using some sort of binary
over HTTPS mechanism, with authorization exactly to be determined later. The other easy thing is
authorization policy. MIMI does not define one. We have been working without one. We have just
been assuming that people are able to send messages. Matrix obviously has one. Role-based access
control is super popular amongst a lot of these discussions. There are those two MSCs there.
MSC4056 covers the decentralization part of RBAC. Then you also have MSC2812, where it is basically
roles as state events. It is an early form of RBAC. Linearized Matrix uses the existing
authorization rules. Matrix authorization rules clearly already work. People have been using them
for almost a decade now. They should be fine. We will figure out what MIMI ends up with
eventually, hopefully. The harder parts are encryption. Most messaging providers use
libsignal or something that is a double ratchet. We also have a double ratchet-like implementation
called Olm. It was not previously interoperable with libsignal up until about 2 a.m. tonight. We
now have Interolm, which has that X3DH support as well as some of the other deltas you need to be
able to support that sort of interoperability. Megolm is what we use in group chats to try and
alleviate the load. Otherwise, with Olm, you have to send a number of events for the number of devices
in the room, which obviously causes problems when you have multiple devices per user or multiple
users in a room. Matrix HQ would be a nightmare. The double ratchet does rely on existing
infrastructure in order to send keys. It has no concept of membership. It does not know who to
send the keys to on its own. You have to tell it who to encrypt to and then also send those keys
yourself. Some messaging providers, namely Google, have announced that they will be using MLS. We
also obviously want to use MLS. arewemlsyet.com is where we're tracking that progress. MLS does have
a concept of built-in membership, so it does know who it needs to send messages to. It obviously
doesn't send the messages itself, but more on that in a second, namely this slide. RFC 9420,
that is where the IETF has specified this. I have a really awful crash course guide because I am not
a cryptographer, but there it is. But yeah, there is a binary tree, so you have a root key and you
have multiple nodes underneath that. With that concept, you end up with a concept of membership
where only users or members that have certain keys can see other keys. That is how you get to
know who to send the keys to, particularly the decryption keys. MIMI has refused to implement
any encryption other than MLS. They are obviously considering it as part of double
ratchet because we do need an onramp. But with the IETF, they tend to get a little bit stuck in
the RFCs. We are also considering MLS, obviously, and so we want to extend it. Decentralized
environments, namely Matrix, will have to use DMLS or similar. Membership. As part of the
discussions with MIMI, we have been having some arguments, we will say, about what it actually
means to define membership. We have decided that users join rooms and clients encrypt messages.
Both MLS and double ratchet deal with clients. When a user joins the room, all of their clients
join as well. This is hopefully not a novel thing that is here, but it is written in stone now.
So we need to synchronize these two concepts. We say that users have a participation state or
exist on a participation list. And then clients have membership. So users, participation,
clients, membership. We also have to make sure that these are atomic operations because otherwise
somebody joins the crypto state, but they are not part of the actual user state. That causes issues.
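The atomicity requirement can be illustrated with a small sketch. This is not MIMI's or Matrix's actual data model: the `Room` class, its field names, and the "join"/"leave" values are all assumptions made up for the example. The only point it shows is that the user-level participation list and the client-level membership are changed together or not at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Room:
    """Toy room state. All names here are illustrative, not from MIMI or Matrix."""

    # user ID -> participation state ("join" or "leave" in this sketch)
    participation: dict[str, str] = field(default_factory=dict)
    # user ID -> client (device) IDs that are members of the encryption group
    membership: dict[str, set[str]] = field(default_factory=dict)

    def join(self, user_id: str, client_ids: set[str]) -> None:
        """Add the user and all of their clients together, or change nothing."""
        if not client_ids:
            raise ValueError("a joining user needs at least one client")
        if self.participation.get(user_id) == "join":
            raise ValueError(f"{user_id} is already participating")
        # Build both new maps on copies, then swap them in as one step.
        new_participation = {**self.participation, user_id: "join"}
        new_membership = {u: set(c) for u, c in self.membership.items()}
        new_membership[user_id] = set(client_ids)
        self.participation, self.membership = new_participation, new_membership

    def leave(self, user_id: str) -> None:
        """Remove the user and every one of their clients together."""
        if self.participation.get(user_id) != "join":
            raise ValueError(f"{user_id} is not participating")
        new_participation = {**self.participation, user_id: "leave"}
        new_membership = {
            u: set(c) for u, c in self.membership.items() if u != user_id
        }
        self.participation, self.membership = new_participation, new_membership

    def is_consistent(self) -> bool:
        """True when exactly the joined users have clients in the crypto group."""
        joined = {u for u, s in self.participation.items() if s == "join"}
        return joined == set(self.membership)
```

Calling `Room().join("@alice:example.org", {"PHONE", "LAPTOP"})` adds the user and both clients in one step, while a join with no clients raises before either map is touched, so `is_consistent()` keeps holding.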
So MIMI has started proposing a bunch of MLS extensions to persist application
state within an MLS group. Because MLS has those extensions where you can just store arbitrary
things, making the blob even larger so you must store it in the media repo. These are new as of
like a week and a half ago, but it is called AppSync. It is a generic mechanism. Conveniently,
it would basically be mapped to state events in Matrix. So you can just add arbitrary information
to the group, namely with a key and some sort of content. And then there are some operations
that apply where you can add, remove, update, that sort of stuff. But yeah, it is visible to servers,
but servers can't see the actual encrypted messages part of MLS. They can just see that
state changes are happening and potentially what's inside those state changes, which is why they
would map to state events in Matrix. Double ratchet and participation is a bit harder.
Because double ratchet, again, doesn't have a concept of membership. It's not terribly difficult
to map these. It's a little complicated sometimes. So there's a couple of MSCs there that list this
sort of information, namely the crypto IDs Matthew was just talking about. And then yeah, we translate
these concepts to m.room.member state events as well as device lists on Matrix. But regardless of
the protocol, we want to make sure that people currently on double ratchet have a way up to MLS.
So it's a natural evolution of the application rather than forcing somebody to effectively fork
their own client, which brings us a little bit into content format. So clients need to end up
encrypting and decrypting the same thing. Otherwise, there are going to be issues. Because
if you send a text message to somebody and they just don't know what to expect,
then they're not going to see anything. So we need some form of extensibility because
messaging also has a ton of features. And it's constantly evolving. Servers can't help with
this because it's already encrypted. And of course, it should be as small as possible. It
should require minimal processing power because not every client is a laptop. Or sometimes the
laptop is a bit slow. So MIMI has worked on their own TLS-encoded multipart MIME format. It looks
a lot like multipart email. It's not the greatest, but it is a notional format while we try and work
out the exact things. But Matrix already has events and you can already define your own custom event
types. And you can already add arbitrary content. But what if we made that way more extensible?
So we introduce extensible events, or MSC1767. We use content blocks to persist information inside
of an event. We specify the core blocks there. And then we also try to make sure that the
client can render arbitrary event types that they don't know about. So we lose a little bit of
richness in the sense that if a client does encounter an unknown event, they have to
figure out how to render that. And it might not render in the same way for everybody, but it will at
least render the same information for everybody. And that's the critical part. So an extensible
event looks a little bit like this. This is just a basic text message saying hello world. So if
your client supports HTML, it picks the HTML format. If it doesn't support HTML, it picks the
basic format. But critically, you have a type of m.message and you have a content block of
m.text. So if we add a little bit more richness to that and create a fake schema for polls that
definitely doesn't exist (please see the MSC for a real schema), you have an unknown event type for
some clients, namely org.matrix.poll.start. So you still have that text content block. And then
you also have this poll content block, which gives you a little bit more information about how to
render these events. So if your client knows what that event type is, it can go into the content,
pull out the org.matrix.poll content block, render that in its UI, and then the client can
interact with it normally. Otherwise, you end up with just the text and it is suitably okay. It's
not great. But you still have the same information from the poll. And so yeah, currently extensible
events are JSON. But again, you could make this a binary format in the future. More events get
rendered by more clients, which is great. You can create more custom event types. You can do all
sorts of fun stuff to be determined exactly what all of this looks like. We're still in the process
of specifying all of the pieces, particularly the core content blocks, and also a registry so you
can actually implement a client that understands all of these things. So a little bit on room models.
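Stepping back to extensible events for a moment, the fallback behaviour described for the text message and the poll can be sketched as follows. The `m.message` type and the `m.text` list of representations follow my understanding of MSC1767. The poll schema, like the one on the slide, is made up, and `pick_text` and `render` are hypothetical helpers rather than any real client's API.

```python
# A basic text message with an HTML and a plain-text representation.
text_event = {
    "type": "m.message",
    "content": {
        "m.text": [
            {"mimetype": "text/html", "body": "<b>Hello</b> world"},
            {"body": "Hello world"},  # no mimetype: treated as text/plain here
        ]
    },
}

# A poll using an invented schema, carrying a text fallback alongside it.
poll_event = {
    "type": "org.matrix.poll.start",
    "content": {
        "m.text": [{"body": "Lunch? 1. Pizza 2. Sushi"}],
        "org.matrix.poll": {"question": "Lunch?", "answers": ["Pizza", "Sushi"]},
    },
}


def pick_text(event, supports_html):
    """Return the best m.text body for this client, or None if there is none."""
    wanted = "text/html" if supports_html else "text/plain"
    representations = event["content"].get("m.text", [])
    for rep in representations:
        if rep.get("mimetype", "text/plain") == wanted:
            return rep["body"]
    # The preferred format is missing, so fall back to plain text if present.
    for rep in representations:
        if rep.get("mimetype", "text/plain") == "text/plain":
            return rep["body"]
    return None


def render(event, known_types, supports_html=False):
    """Render richly when the type is understood, otherwise use the text block."""
    if event["type"] == "org.matrix.poll.start" and event["type"] in known_types:
        poll = event["content"]["org.matrix.poll"]
        return f"[poll] {poll['question']} ({', '.join(poll['answers'])})"
    text = pick_text(event, supports_html)
    return text if text is not None else "[unsupported event]"
```

A client that only knows `m.message` still shows the poll's text block, so everybody sees the same information even if it is not rendered the same way.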
The MIMI room model looks like this. So when you add the third server, there's obviously a little
bit more complexity. MIMI primarily uses a hub and spoke fan out. So you have one central server
per conversation, not for the entire global network, that is responsible for distributing
messages. So server B and C try to avoid talking to each other if they absolutely can. And they
talk through server A instead. So server A is responsible for sequencing, which is important
for MLS. It has those characteristics in play. And then yeah, the follower servers, as they're
called, go through that. And encrypted messages still flow between the clients as normal. The servers
can't see those messages. So then we have the question of what does Linearized Matrix look like?
It's exactly the same thing, just different objects, which is particularly interesting when it comes
to the fact that it was rejected. Because it uses just regular Matrix events. It's the same room
state. It's the same Matrix event stuff. It's a stripped down version of the server to server
API because you don't need all the DAG resolution stuff if you don't have a DAG. Also, your DAG
is now a linked list. So you don't have any state resolution to do. You have the same authorization
rules. You can use the same extensible algorithms for encryption. You can use MLS, double ratchet,
your own thing if you're insane enough to do that. And then you have all of the same capabilities of
Matrix. And you have the history and all of that. But critically, you can support having a DAG capable
server in the room. You don't need to give up your decentralization. You can end up with a hub
server that basically runs that linearization algorithm. And
it also still persists the events, still distributes them. So when you get into decentralization,
namely how Matrix works, you use a DAG. You have full mesh fan out where each server contacts
every other server instead of going through a central hub. Conflicts in the DAG are resolved
through state resolution. So if two people try to do the same thing, somebody has to win.
And the good news is state resolution can also be used to linearize the DAG. So through use of a
protocol converter, which may or may not be a dual stack server, you can then bring these centralized
systems, even Linearized Matrix, into Matrix to just further route them. So protocol converters:
they aren't bridges. Bridges necessarily break the encryption because
when you're converting Signal to Matrix, prior to our new interoperability
capabilities, you end up decrypting the network on both sides of the bridge and re-encrypting. So
you're only really encrypting to the bridge and not beyond it. So a protocol converter doesn't
decrypt messages. It just converts the envelope format to another format. So that way you can
just keep sending your messages. This may also include translating some of the concepts. For
Matrix, we have to-device events. Some other protocols, namely MIMI, just send everything
over what they call events. So we would have to translate those concepts into the appropriate
Matrix APIs. Again, you can make this either with an appservice or as a dual stack homeserver.
So instead of having a multi-head messenger, you have a multi-head server. And then, yeah,
use MSC3983 or MSC3984 to bridge the particular crypto concepts if your server doesn't necessarily
support those key formats. So this is what it looks like. You may have recognized it. I stole
it from Matthew's slides. So if you have a gatekeeper on the left there, you can do a protocol
conversion. And that might be attached to a single server. It runs through Matrix. And then you run
another protocol conversion to bring it into Linearized Matrix or MIMI, where you have that
hub and spoke, namely that the bottom two servers there aren't talking to each other directly.
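The difference between a protocol converter and a bridge can be sketched in a few lines. Both envelope formats below are invented for the example (a length-prefixed binary frame and a JSON-style dictionary with an `example.encrypted` type), and none of the function names come from MIMI or Matrix. What the sketch shows is that only the envelope is rewritten, while the ciphertext is carried across byte for byte and never decrypted.

```python
import base64
import struct


def pack_binary_envelope(room_id: str, sender: str, ciphertext: bytes) -> bytes:
    """Hypothetical binary frame: two length-prefixed strings, then ciphertext."""
    room, user = room_id.encode("utf-8"), sender.encode("utf-8")
    return (
        struct.pack(">H", len(room)) + room
        + struct.pack(">H", len(user)) + user
        + ciphertext
    )


def unpack_binary_envelope(frame: bytes):
    """Split a frame back into (room_id, sender, ciphertext)."""
    (room_len,) = struct.unpack_from(">H", frame, 0)
    offset = 2
    room_id = frame[offset:offset + room_len].decode("utf-8")
    offset += room_len
    (sender_len,) = struct.unpack_from(">H", frame, offset)
    offset += 2
    sender = frame[offset:offset + sender_len].decode("utf-8")
    offset += sender_len
    return room_id, sender, frame[offset:]


def binary_to_json_envelope(frame: bytes) -> dict:
    """Convert the envelope only; the ciphertext is re-encoded, not decrypted."""
    room_id, sender, ciphertext = unpack_binary_envelope(frame)
    return {
        "type": "example.encrypted",  # made-up event type for this sketch
        "room_id": room_id,
        "sender": sender,
        "content": {"ciphertext": base64.b64encode(ciphertext).decode("ascii")},
    }


def json_to_binary_envelope(event: dict) -> bytes:
    """The reverse conversion, again without touching the encrypted payload."""
    ciphertext = base64.b64decode(event["content"]["ciphertext"])
    return pack_binary_envelope(event["room_id"], event["sender"], ciphertext)
```

The converter holds no keys at all, which is the property a bridge cannot offer: converting a frame to the JSON-style envelope and back yields the original bytes.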
So those two nodes might be the same physical server, just running dual stack and not doing
protocol conversion. But that's all right. So there are a few missing pieces. We haven't talked
about anything to do with identity. How do you convert a phone number or a name or an email
address into something routable? Who knows? That needs to be defined. We currently have identity
servers in Matrix. They're a bit centralized. We're hoping that somebody in MIMI can actually
solve this problem for us. We also have an interesting idea around consent. Presumably,
you don't want to receive spam. So how do you make sure that the person that is messaging you
is allowed to message you? We also have anti-abuse. How do you report these messages over federations
or over servers? How do you make sure that the servers can implement their own anti-abuse measures
using whatever identifiers they can? MIMI also has not necessarily defined the exact identifiers
that they want to use. Matrix already has user IDs, room IDs, aliases, that sort of stuff. But
who knows? Maybe something different would work. So room metadata. Again, where does the
room name go? Who knows? We'll have to figure that out. Matrix state events would probably be fine.
Same thing with ordering. MLS requires ordering. There's a discussion around whether or not the
clients also need that ordering. So what's next? We have no idea. As Matthew has mentioned, again,
I'm just stealing from his slides. So Linearized Matrix will probably get updated as an MSC
because currently the MSC is one version behind the IETF draft. And the gatekeepers will
have to publish their plans by March 7th. We'll see what happens there. The protocol converter
concept will continue to be refined, of course. MIMI will also make some form of progress,
hopefully get refined as well. And yeah, funding the foundation is the best way to make this work.
So, questions. Yes.
Who are the stakeholders in MIMI, and why are the different stakeholders, like,
not using the Matrix approach? And what are the different interests here?
Yeah, so the question is what are the different stakeholders and why are we going after
certain approaches, I believe. So there are several players in the MIMI space. So we have
obviously ourselves. We also have Wire. There's Google and I'm forgetting all of the other ones,
but there's... Yeah, Cisco, Wickr, Phoenix, and a few others. There's a few hundred people in
the MIMI working group. You can see their company association as part of the membership list.
I would suggest going there. As for the different approaches, everybody wants
everybody to use their thing. We're no exception. We just think that ours is better.
But yeah, we've been doing this for a while. Matrix was originally built as an interoperable
protocol. And here we are with a legal requirement to have interoperability. So, surely Matrix is
designed for that, is kind of our thought.
We used to rely heavily on canonical JSON. How does that translate to MIMI in particular,
and to interoperability in general?
Yeah, so the question is how... Like we've previously relied on canonical JSON.
How does that translate to MIMI and just general approaches with interoperability? So,
canonical JSON has all sorts of interesting issues with it. What happens if you have multiple
keys? What happens if the keys use a weird form of UTF-8? That sort of stuff. It's a very complicated
set of rules that can realistically never be fully defined. So with a binary format, which is
what MIMI's interested in, you don't necessarily need a canonicalization,
because if you keep the signature for the event next to the event, rather than in the event,
like we currently have in Matrix, you are able to just sign the series of bytes. And the bytes
can be in whatever order. You can deserialize them, see them more easily, and then check the signature
much faster. So that's kind of where the MIMI direction is going: we want to avoid a canonicalization
algorithm, but we do need the more specific standard for what's contained in those bytes.
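The contrast between the two signing styles can be sketched like this. The sketch uses an HMAC from the Python standard library purely as a stand-in for a real signature scheme such as Ed25519, and `canonical` is a deliberately simplified canonicalization, not Matrix's actual canonical JSON rules. The key, the helper names, and the event contents are all assumptions for illustration.

```python
import hashlib
import hmac
import json

DEMO_KEY = b"demo-key"  # stand-in secret; a real system would use a signing key


def sign(data: bytes) -> bytes:
    """Stand-in signature: an HMAC over the exact bytes given."""
    return hmac.new(DEMO_KEY, data, hashlib.sha256).digest()


def verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(data), signature)


def canonical(obj) -> bytes:
    """Simplified canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


event = {"type": "m.message", "sender": "@alice:example.org"}

# Two valid serializations of the same event, differing only in key order.
bytes_a = json.dumps(event).encode("utf-8")
bytes_b = json.dumps(dict(reversed(list(event.items())))).encode("utf-8")

# Signature-in-the-event style: the verifier has to re-serialize the object,
# so both sides must agree on one canonical byte form.
signature_over_canonical = sign(canonical(event))


def wrap(payload: bytes):
    """Detached style: keep the signature next to the bytes as they were sent."""
    return payload, sign(payload)


def unwrap(payload: bytes, signature: bytes) -> dict:
    """Check the signature over the received bytes first, then deserialize."""
    if not verify(payload, signature):
        raise ValueError("signature does not match payload bytes")
    return json.loads(payload)
```

Here `bytes_a` and `bytes_b` are different byte strings, so a signature over one does not verify the other unless both are canonicalized first. In the detached style, `unwrap(*wrap(bytes_b))` succeeds regardless of key order, because the check happens on the bytes as received.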
Is this something to be supported throughout the chain?
Yes, we are going to be pushing more towards that. Instead of trying to make everybody use
the existing Matrix thing, I would suggest that Matrix kind of adopt more of that binary event
signing instead. Yes.
You had a slide with things you didn't talk about?
Yes.
In many places, primarily in the MIMI working group, that's where a lot of these conversations
are happening, as well as on the design team for MIMI. But if you are interested in them,
or you have ideas, feel free to pop by the Matrix spec room on Matrix, and we'll be happy to engage.
Do I have time for one more question? All right.
Yes. All right. So how do we avoid, basically if you have two protocol converters,
say they're both talking to the same network, how do you avoid message duplication?
Good question. We'll have to experiment with it. We will be trying to
figure out exactly what that looks like. We kind of have to wait until March 7th to see
what the actual gatekeepers, namely WhatsApp and Facebook Messenger, have to offer for that
certain capability. Thank you, Travis. Thank you.