Thanks for the amazing introduction. As Jan said, we're here to talk about the future of calling in Matrix. We're bringing some pretty cool new things, and there's quite a lot going on at the moment. So yeah, we really hope that from now on there's finally good calling in Matrix, or at least that we're taking the first steps. All of this will be built on MatrixRTC, which is basically the underlying protocol that will power all calling in the future. And that's what we're going to talk about: how this works, how it's structured, and how calls are built on top of it.

MatrixRTC is actually something a couple of you have probably already encountered in the form of Element Call. This is a standalone web app where you can just have calls. It's very similar to Jitsi, but in the background it's actually running MatrixRTC. What it hasn't reached yet, though, is really being part of the federated system. This single-page application is very, very enclosed: it runs its own homeserver and it doesn't federate. You can't log in with your actual Matrix account; you need a custom account for this specific application. And the change we're going to present now is that we have the same technology, but in our federated Matrix system.

Before we start into the interesting new things, let's talk about why we even considered redesigning all of this. Because, as probably all of you know, there has been calling in Matrix for quite some time already. It's in Nheko, it's in the legacy Element apps, and it's in Element Web. So why not just work on those? There are issues. For example, if you call each other at the same time, the calls sometimes don't figure out that two people want to talk to each other. Sometimes one of your devices never stops ringing. But why not just fix those? Oh, I see a lot of nods. That's actually super satisfying; it's really good to see that people know what I'm talking about. So why not just focus on fixing those? Why build something entirely new?

The thing is, there are some pretty fundamental limitations. It's by design just one-to-one calls; that's just how it's designed, and the specification was never really designed for anything bigger. It's very call-specific, so you can't build an arbitrary real-time application on top of it; it's just for calls, and that's something we think would be cool to change. And the signaling is done over room events. That's not necessarily a mistake, but it makes things a little slower than necessary, and it's really hard to get right, as we can see with ringing that never stops, or two people calling each other at the same time and never converging on an actual call.

So this is basically our vision, what we want to achieve: we want calls to be a great and central part of Matrix via MatrixRTC. These four pillars are the core things we really want to get right. First, we don't just want to have calls; we want to think beyond calls and build an expandable system that also motivates other projects. We already had this, not with the exact stack we have right now, but something very similar: Element built Third Room, and Nordeck built things like NeoBoard, which are also built on something similar to MatrixRTC. And we want to make MatrixRTC a thing where it's super easy to build those kinds of applications.
The second pillar, which is super important, is that it uses a pluggable RTC backend. Currently that's LiveKit, and LiveKit is an amazing open project, so it really fits into Matrix from a culture point of view. It's an open system, and it solves all the very complicated issues you run into when you use WebRTC for calling. It even ships an SFU, and it's just a very, very decent combination: Matrix for the high-level signaling, and LiveKit for actually doing the WebRTC shenanigans you need to go through. That gets quite annoying if you look into the details, and they do an amazing job of really getting all of this nailed down.

Then it has to support large group calls. Everything we want to have in the future shouldn't be just for one-on-one calls; I guess that's pretty obvious. And last, we want to make it as simple as possible for other clients to support the whole infrastructure. We already have two apps from Famedly, the Famedly app itself and FluffyChat, which support it, and we have the Element apps which support it. And, we'll talk about this in more detail later, we want to make it as easy as possible for others to also add calling. There's a widget path you can take, and LiveKit also helps us here, because they provide pretty decent SDKs.

If we want to build calling on Matrix, we really want to leverage all the good things about Matrix. So here's a very short overview, and I can really do this quickly because everybody probably knows it, of what Matrix is really good at, and which properties we have to carry over into this real-time infrastructure. One of them is that it's an open standard. That's one of the things I really see as the core of Matrix; it's super cool, and I can rush through this because it's not really that surprising. Then we have Matrix encryption, which is really powerful, and it goes further than just encrypting for large rooms: it also has a very good authentication and verification system. And that's a thing I think is super essential: you not only connect to other people encrypted, it's also verified. If everybody does their device verification correctly, you have the guarantee that all the participants are actual participants you trust, and that their devices are not malicious. That's what actually makes it secure in the end: there's no weird third party in there that shouldn't get the streams you're streaming. Then it's a federated system, so calling definitely has to go that path as well. And what Matrix is also really good at is persistent storage. It's not just exchanging data, it's also storing data and replicating the stored data over multiple homeservers. But that comes at the cost that it's not real-time real-time, which is what we need for calling; it's more in the sub-second range, not the millisecond range.

So with those four strengths in mind, how can we now use Matrix to build a system that uses the best parts of Matrix while still achieving actual real time? This is done with three core parts: we have the Matrix part, then we have the client apps, which use the LiveKit SDK, and we have the RTC infrastructure, which is LiveKit in this case. Starting from the top, we have just a Matrix room, which can live on a federated system.
The core problem Matrix solves here is that it stores which user is currently in which session. So if I'm joining a room and I read the room state, I can immediately tell who is in which session and how to connect to those people: I know if there's a running call and I know how to connect to it. And of course Matrix also does a lot more, sharing keys and providing the accounts and the verification.

Next, in the center part, we have the clients themselves. Here we have a couple of clients which have only the green box in them, and then clients with the green and the blue box, and each of those boxes is basically one RTC application. To make this example more concrete, think of the green box as Element Call, or just calling in general, and the blue box as some shared-document real-time system, or Third Room, or what have you. Some of those members are in just one RTC session and some are in two, and that's also something that should be possible. And last, at the bottom, we have the RTC infrastructure, where we primarily want to use LiveKit, but full mesh would also be possible, and we have this empty box at the end: it should also be possible to use whatever new technology emerges. So if WebTransport at some point replaces WebRTC, you could implement a new infrastructure which does the same high-level signaling over Matrix but uses the new technology underneath, for even better data transmission or whatever the advantage is.

Now let's look at those room events in a little more detail. Before, we had them at the top; now they're on the right. We have a room, multiple member events, and each member event has an array of memberships. We need this array because, as seen before, we could have a call and a real-time document at the same time. The top part of the membership JSON object here is the actual core MatrixRTC part. This data is just there to know how to connect to this specific peer in the RTC world. It has this very central field, foci_active, which says the type of focus, or the type of connection you want to use, in this case LiveKit, plus all the necessary information to connect to it. And this is the part which can be replaced with WebTransport or full mesh or whatever you like. Then there's another pretty important field, and that's the application. Each membership has a specific application associated with it. In this case it's m.call, and that also defines the typing for all the other fields. For m.call we have a call_id as well, and a scope: whether it's a call for the whole room, or a breakout call, or whatever you want to add to the calling specification. But you can also imagine all kinds of other things. If we think about Third Room, one possible field could be, for example, a country or continent, so when I look at the room state I can immediately tell who is in which country, and based on that I know whom to connect to. So we can do very high-level optimizations in this MatrixRTC world before we even connect to an SFU. What time do we have? Oh, it doesn't say here. Okay, this is fine, we can actually talk about this as well.
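To make this concrete, here is a rough sketch of what such a member state event could look like, pieced together from the description above (the field names follow the in-flight MSCs at the time and may since have changed, so treat this as illustrative, not normative):

```ts
// Hypothetical m.call.member room state event (one per user).
// "memberships" is an array because one user can be in several RTC
// sessions at once, e.g. a call and a shared whiteboard.
const memberEvent = {
  type: "m.call.member",
  state_key: "@alice:example.org",
  content: {
    memberships: [
      {
        // Core MatrixRTC part: how to connect to this peer. This is
        // the piece that could be swapped for full mesh, WebTransport, ...
        foci_active: [
          { type: "livekit", livekit_service_url: "https://livekit.example.org" },
        ],
        // The application gives the typing for the remaining fields.
        application: "m.call",
        call_id: "",     // "" = the call that belongs to the whole room
        scope: "m.room", // room-scoped, as opposed to e.g. a breakout call
        device_id: "DEVICEID",
      },
    ],
  },
};
```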
So this is kind of an interesting thing, and it's one of twenty problems I could have chosen that we encountered. I find it really useful for getting into the mindset of what those call member events are and what kinds of problems come up in such a federated world. It's about call history. Whenever we have a call, it's of course super valuable to see afterwards in the room history that there was a call, how long it was, and how many participants were in it. One very trivial approach would be that at the end of a call we just send a summary into the room, and the summary contains all the data: how many people there were, the duration, and everything. But then we run into issues which are very, very common in a federated world. Who creates this event? There has to be some kind of glare handling, and maybe nobody feels responsible for it. Maybe the one responsible has a client which crashed at the moment it needed to send the summary. Maybe two people think they're responsible because there was some state that hadn't resolved yet. And it would also be redundant data, because every state event is part of the DAG anyway, so it's already in the history of the room. By adding a separate summary event we introduce a possible conflict, where looking through the state history tells you the call was ten minutes long, but the summary says twelve minutes, because a client-side bug miscalculated the call duration.

This slide actually got broken, but either way, it's still visible enough, so it works. The cool thing is, if we look at the call member events we showed before, it's very easy to parse those events as join or leave events. On the left-hand side, with the green border, we can see that in the unsigned field we always have the previous state of that event. So if the previous state was an empty array and the current state is an array with a membership, this can easily be parsed as a join event. On the right-hand side, with the black border, we have a previous content with a membership, so somebody was in some kind of RTC session, and now the current content is an empty array, which implies a leave event. So it's really easy to tag those events.

Looking at the next slide, we have a visualization of a timeline: the left-hand side has to be interpreted as the past and the right-hand side as the present. The red boxes are state event changes which we tagged as leave events with the system just described, and the green boxes are state event changes which we tagged as join events. To go through a very simple example: member three just had no changes at all, so during the whole period shown on screen they were not in a session. Member two had no membership in the past, then a join event, so from that point on they were in a membership, and then a leave event. So if we now run an algorithm locally where we start from the present and go back collecting all the leave and join events, we can basically recreate the call state.
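In code, the tagging described above is a one-liner per case. A minimal sketch (the types are assumptions for illustration, not the actual Element Call implementation):

```ts
// Tag an m.call.member state event as a join or a leave by comparing
// its current content with the previous content in "unsigned".
type Membership = { application: string; call_id: string };
type MemberStateEvent = {
  content: { memberships: Membership[] };
  unsigned?: { prev_content?: { memberships: Membership[] } };
};

function tagEvent(ev: MemberStateEvent): "join" | "leave" | "other" {
  const prev = ev.unsigned?.prev_content?.memberships ?? [];
  const curr = ev.content.memberships ?? [];
  if (prev.length === 0 && curr.length > 0) return "join";  // [] -> [ ... ]
  if (prev.length > 0 && curr.length === 0) return "leave"; // [ ... ] -> []
  return "other"; // e.g. a membership was updated in place
}
```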
At each point we know who was joined and who wasn't, and we just loop through this algorithm until we find a point where nobody was joined; that point is of course the start, indicated on this slide with a green border. Then we have all the information we need: we have the start, we have the end, we even have the number of participants who joined. We basically even have a heat map of how many participants there were at which time. There's lots of data in there, and each client can decide on its own what exactly to do with it and how to render it in the timeline. So yeah, this is your part now. Are we on time? Thank you, Timo.

So now we're going to look at implementing this, because, well, client implementers also need help. And if you're one of those people whose client already has the WebRTC parts implemented, you might be thinking: ah shit, I need to throw away all of the stuff I've already done. Not really. Timo showed this already, but there's this small RTC infrastructure bit which we're going to look into. This is MSC3401, well, kind of MSC3401: the m.call event has already been removed because it caused way too much glare. If you want to know more about that, you should watch Timo's Matrix Community Summit talk about why m.call had no ownership and caused way too many glitches. The first half is just the MatrixRTC stuff Timo already mentioned: the participants send the member events, so the room has a history of who joined when. Now, if you don't have an SFU, you can just declare the infrastructure, the focus, the backend of MatrixRTC, as mesh, and then you can potentially just use the P2P MSCs which you already implemented, or hopefully will implement, for a mesh call. A mesh call is basically a P2P call between multiple participants. It's just not as scalable as you might think, but you can reuse your existing MSCs, your existing implementation, for mesh calls, and you don't even need an SFU.

But if you are rich, and you do want to set up an SFU, then it gets much simpler. The SFU in our case will be LiveKit, and all of the signaling bits are now handled by LiveKit itself over WebSockets. The previous thing was done over to-device Matrix events; the first half is the same, but all of the signaling part is now handled by LiveKit over WebSockets.

More about LiveKit. I keep saying SFUs are cool, but SFUs are also very expensive, and if you don't want anyone else to use your SFU, you probably want some authentication in front of it. So if you're a homeserver owner or admin and you also host an SFU, then you'll probably also host a JWT service. It works with an OpenID token from your Synapse server: you send the token to your service, the service validates that you are actually the one who generated that token, and then it generates a JWT for you, which you can use to authenticate with the LiveKit SFU. Right now, I believe the service only checks whether you are actually the one who generated the OpenID token, but I think there's already work going on to also check that you are actually in the room, so that only people who are in the room, and who actually want to join that call, can get access to the SFU.

Some fancy stats: the LiveKit docs say that with around a 16-core Google virtual machine you can have calls with around 150 members. This is, I believe, 720p, no simulcast, just raw 720p feeds for 150 members.
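To make the token dance concrete, here is a minimal sketch, assuming a hypothetical JWT service endpoint and response shape (the Matrix OpenID endpoint and LiveKit's Room.connect are real APIs; everything about the JWT service here, including the /sfu/get path, is an assumption):

```ts
import { Room } from "livekit-client";

// Sketch: exchange a Matrix OpenID token for a LiveKit JWT, then join.
async function connectToSfu(hs: string, userId: string, accessToken: string) {
  // 1. Ask the homeserver for an OpenID token that proves our identity.
  const openidToken = await fetch(
    `${hs}/_matrix/client/v3/user/${encodeURIComponent(userId)}/openid/request_token`,
    { method: "POST", headers: { Authorization: `Bearer ${accessToken}` }, body: "{}" },
  ).then((r) => r.json());

  // 2. Hand it to the JWT service, which validates it against the
  //    homeserver and mints a LiveKit JWT (endpoint name is made up).
  const { url, jwt } = await fetch("https://livekit-jwt.example.org/sfu/get", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ room: "!room:example.org", openid_token: openidToken }),
  }).then((r) => r.json());

  // 3. Authenticate against the LiveKit SFU with the JWT.
  const room = new Room();
  await room.connect(url, jwt);
  return room;
}
```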
From my personal testing, I used a Hetzner CAX21, well, not personal, Famedly gave that to me, but it's a four shared ARM vCPU thingy, and I could get around 70 participants with simulcast and 720p, everything optimized, I think.

Ringing. You might not think ringing is important, but ringing is actually very difficult to get right, mainly because native operating systems are not really friendly to you and will try to kill your app every possible second. This started as a GSoC project, by the way, a GSoC 2022 project at Matrix. That's basically a three-month window in which you have to do one particular task. Well, my task was actually implementing the whole WebRTC thing, but I implemented the whole WebRTC thing in two weeks, and for the next two and a half months I had to fight ringing. You need to focus on three cases: your app could be in the foreground, in the background, or terminated. By ringing I basically mean: if your app is in one of these three states, you need to be able to somehow make the application ring when you get a call. I pivoted three times, and we'll look at the three approaches. This is a story, yes, but hopefully client implementers can learn from it.

Try one. This is the coolest part, which I wanted to show at FOSDEM; I did not know you could do this. This is Android-specific. iOS, as far as I know, only has one way you can do this: using CallKit, which is the phone dialer thingy on iOS. I think WhatsApp also uses that. But it turns out Android also has a way to do this: it's called the TelecomManager, or the ConnectionService API. What you're seeing on your screen right now is the Samsung OEM dialer application, and what the TelecomManager allows you to do is hand any VoIP call from your application over to the dialer, so you don't really have to handle the OS killing your app and all that, because the dialer already handles it. And you get this fancy UI: you see all of these buttons, hold call, Bluetooth, even the merge button works, and I didn't have to implement that. You also don't have to implement a new UI for holding calls, or for getting another call while you're already in one. This was very cool, but why could it not ship? For this, you need to add your app as a calling account in your dialer app, and that is a very hidden setting. I could not find a way to do it programmatically, and in some regions it's just blocked; it's apparently a regional thing. So this could not go in.

Frustrated by that, I went on to try two: we just hack it. Apparently Android has two very nice thingies: "show on lock screen" and the "appear on top" thingy. What we basically do, and apparently we're running out of time, so this is going to be super fast, is call the "appear on top" thingy, which brings your app to the top, and then use "show on lock screen". So whether your app is in the foreground or background, even with the screen locked, you can potentially hack the app awake to ring and so on. It does not work on terminated apps, though, and there's no way my coworkers would have let me merge this thingy.

Try three. Fine, we'll do it the right way. By the way, if you're thinking this is an obvious solution, it was not obvious to me, because the Famedly app and FluffyChat are written in Flutter, and when I get a notification I would have to start the right Android bits, then start the right Flutter bits, then decrypt the event and then show the ringing. Too much work. But it turns out, after two tries, that push notifications already do all of that for you. So we just abuse that now.
You use the Firebase push thingy, or the UnifiedPush thingy. They start a worker for you, and they bring up the Flutter engine; a Flutter engine is basically the thing that gets attached to your Android activity. Once the Flutter engine has started, you can just hook onto it: you hook a VoIP listener onto it, and then kind of abuse it to see if there's an invite event coming in, and then you show your own UI. That works. I hope that's the right way to do it; please tell me if it's not.

By the way, like I said, I use the m.call.invite thingies for this right now. But that's not a thing with LiveKit, because all of the LiveKit stuff happens over WebSockets. So there's a new MSC for that. It uses intentional mentions, so you don't spam your whole room with your notifications: you can specify which user IDs you want to ring, and what your notification type is. It can either be a ring or a notification.

SFrame key sharing: no time for this, but SFUs need another layer of encryption, because the WebRTC encryption terminates at the SFU, so the media is additionally secured with SFrame. It's secure, trust me, bro.

Cascading. Right now the calls are technically federated: you could have a call inside one room using SFU one and a call inside another room using SFU two. The main limitation right now is that all participants who want to be in the same call need to connect to the same SFU. With this you can also have secure deployments, where you basically just have the left half of the diagram and all of your communication stays within your organization, just on the local network, etc. But in the ideal future, what we want is cross-SFU communication, where every homeserver can have its own SFU and its own JWT service, all the users of that homeserver connect to their own SFU, and then the SFUs cross-federate. Everything is federated, yay. This is already a thing, by the way, but it's a proprietary thing in LiveKit. So if someone from LiveKit is watching: please open source it so Matrix can use it. Probably not going to happen.

And how do you implement all this? There are two ways. The easy way: you can embed Element Call in widget mode. I believe two SDKs support widgets right now, the Rust SDK and the React SDK, so you can just use the iframe in your app. Looking at you, Fractal people, do it already. And if you unfortunately don't support the widget API, then you have to go the hard way: you need to implement it using the native LiveKit SDKs, and, well, LiveKit has a lot of SDKs: Flutter, Android, Swift, Rust, obviously Rust is there. Yeah, that's it. Thank you.

Demos. By the way, you can join this demo. Timo, I think they can use develop.element.io. Yes. You can either use, I should have written this down, develop.element.io or td-famedly.github.io/fluffychat. I promise you this is not a phishing attempt; I can show you the CI run I deployed it from. Once you go there, just type in this alias and you should end up in a room where you can join a call with us. Could you repeat the URL? This is the URL. Yes. Timo, do you want to start it now? Yeah. Can people hear me if I talk without the microphone, or... Okay, then you just have to talk with them. I'm talking. Okay, perfect. Yeah, then I can also talk.
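As a quick aside before the demo: a rough sketch of what such a ring event could look like, based on the MSC described a moment ago (the MSC was still in draft at the time, so the event type and field names here are assumptions, not normative):

```ts
// Hypothetical ring event. Intentional mentions mean only the listed
// users get woken up, instead of the whole room being notified.
const notifyEvent = {
  type: "m.call.notify",
  content: {
    application: "m.call",
    call_id: "",
    "m.mentions": { user_ids: ["@bob:example.org"], room: false },
    notify_type: "ring", // or "notification" for a silent hint
  },
};
```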
So what I just did is start a call. The cool thing now is that in Element Web, in Element X, and in the Famedly app, or FluffyChat, we really have the full new MatrixRTC stack implemented, so all of them can talk over this new stack. Can you hear me? Yeah. I can hear some weird sounds. So, you have to go to develop.element.io, and there's a feature flag there. Oh, to be in the camera, makes sense. But in general, this is the big new thing now: everybody can, without doing anything highly crazy, just go to develop, activate the new group call experience, and then be able to use the new calls. So what I just did is start a call, but I think I made it a private call; that's why it did the ringing as well. So I'm joining here. And TD now... Someone's already in the call. Yeah, that was me just joining there. And I think maybe Kim is in there already. Oh, there are multiple people. Interesting. Well, that's Element for you. You've been seeing this for months now, but now we go to the fancy thing, FluffyChat. This started a month ago, so it's probably riddled with bugs, but, well, if it works... Kaboom. Nice. So this is really super, super cool, that TD managed, in record time, to get FluffyChat into a state where we again have a federated multi-client system with group calls. This is one of the first few multi-client... I think it's the third time we've done it now: a multi-client, federated MatrixRTC call. With screen sharing, apparently. Questions? Do you guys want to break it? How many people can still join? Oh, we're doing a test. Might as well. Yes. Are there any questions?

Does LiveKit send any events back to say who's talking? Oh yeah, there's actually lots going on in LiveKit. So the question was whether LiveKit sends any signaling back to let us know who's talking, and probably also who's showing video. There are lots of things LiveKit does; it's actually pretty sophisticated in that regard. There are even things like: if I upstream video but nobody's consuming it, say we have a conference of 100 people and everybody has my tile scrolled off to the bottom, then LiveKit tells my client that I don't even have to upload video anymore. And that doesn't only work for uploading or not uploading video; it even works with the resolution. So if lots of people show me in just a tiny thumbnail, my client automatically notices that I only have to stream at thumbnail resolution. There's lots of optimization happening, so that in the end, from a receiver point of view, you basically only download what you actually see, and from a streamer point of view, you also only upload what people actually need to see.

Yes? Who hosts this LiveKit? You said this is fully federated, but maybe I somehow missed the point where you talked about whose LiveKit service is used. Because in the previous iteration with full mesh, I thought the cool thing was that multiple Matrix servers are involved, and also multiple SFUs or whatever. Now it seems like it's maybe the LiveKit server of the first one who initiated, or something. Yeah, so basically, this is kind of two questions. The first part was who's hosting the LiveKit server, where do they come from? If it's federated, there should be, similar to Matrix servers, multiple servers, and that's exactly what's happening.
So the idea is that in the future it becomes very, very common that next to your Matrix homeserver you also host a LiveKit SFU, similar to how lots of people host a TURN server right next to their Matrix server. And the second part of the question was: how do we decide which SFU to use? Of course, with what TD presented at the end, where the SFUs talk to each other, you would just always connect to the SFU of your homeserver, and if there are federated participants, the SFUs would figure it out among themselves. For now, there's a system where, exactly as you guessed, the first one to join defines in their member event which LiveKit SFU to use, and then everybody jumps on that SFU. And since the first one might leave while others join, and someone might have put a wrong or different LiveKit SFU into their member event, we even have real-time switching between SFUs. I think it's a one-second interruption you get, but it works really well: if the first person joins with SFU A and the second person has SFU B in their member event, then when the first one leaves the call, everybody immediately switches to the SFU of the oldest remaining participant. But I guess it's quite obvious this is mostly a workaround until we get to the point where the SFUs can exchange the streams directly between themselves; that would of course be much more elegant, and then we wouldn't need this anymore. But for now this is exactly how it works, and because it's a very simple glare-free algorithm, just take the oldest call member state event, we can always guarantee everyone is on the same SFU, which is quite important for a call, of course. Does that answer the question? Yes, always.

Do you see any technical difficulties with having recording or transcripts? So the question is about recording and transcripts, and whether there are technical difficulties around them. Since this is Matrix, the ideal and easiest approach, or UX, however we want to call it, would be that those kinds of things just happen as bots. Recording would happen as a bot: you can easily have a recording bot which is just another participant. It's part of the room, it gets into the key sharing, so it's very transparent for everybody that it's not just the participants but also a bot receiving the streams, and this bot would take care of recording. And since it's all based on LiveKit, and LiveKit is a very, very good infrastructure already, there are amazing tools for this, so recording should be fairly straightforward. The transcript question is basically an implementation discussion. You could also have a bot, and the bot could stream the data into a data channel, or stream it directly into the room, because it's part of the room. Or you could say you don't want any bot to get the data, and instead run local systems which do the transcription locally. There are multiple solutions for this; I guess we'll see what the future brings.

This is amazing, I think somebody just joined the room with a... oh, but they're just unmuted. I thought somebody had already implemented recording and was now playing it back live. That would have been so cool; I just got super excited, but I guess this is just my echo. Any other questions?
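To pin down the "oldest member wins" rule from the answer above, a minimal sketch (the types are assumptions, not the actual client code):

```ts
// Every client independently picks the focus from the oldest call
// member state event, so all participants converge on the same SFU
// without extra negotiation. When the oldest member leaves, the next
// oldest member's focus wins and everybody switches over.
type Focus = { type: "livekit"; livekit_service_url: string };
type RtcMember = { originServerTs: number; fociActive: Focus[] };

function selectFocus(members: RtcMember[]): Focus | undefined {
  const oldest = [...members].sort(
    (a, b) => a.originServerTs - b.originServerTs,
  )[0];
  return oldest?.fociActive[0];
}
```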
So the current state is that it's just on develop, but it is ready to try out. Actually, can you show the path to activate the new group call feature in Element? So if you go to develop, there's one feature flag. For now you'll only get the option to do Jitsi calls and legacy calls, but if you want the new MatrixRTC calls, or Element Call, you need to go into the settings, then the feature flags, and there's a flag called "New group call experience". If you turn this on, on both the sending and the receiving client, it should all work. And on Element X, the mobile client, Android and iOS, it should also just work; there you don't even have to activate a feature flag. You just go into the room, press join, and you end up in the same call there as well.

Actually, that's a part of the demo we could just do, right? Do you want to join with that user? Yeah, this is actually also a thing we can show: it's easily possible to have multiple devices per user, which basically implies we have call continuity. So I was connected with this computer, and now I've also connected here. Oh, I need to read this. It's dangerous. So I'm connected here as well, and now you can't see any streams, right? It does show streams on my computer. Maybe they'll recover. Hmm, seems not to work, but it works on this computer; I can turn it around, so at least the first row can be convinced that it's actually showing the stream right here. So if I hang up here, I've basically done a continuity handover, moving the call from this device to this one.

Oh, and this is also pretty interesting. I'm not sure anyone can see it, because it's just on this screen, but Paul has joined with an older version of Element X. Currently, if you're in an unencrypted room, you stream unencrypted media, and if you're in an encrypted room, you have per-sender encryption; that's the part TD kind of rushed over. With an older version of Element X this isn't handled yet, so since this is an unencrypted room, if you join with an older Element X you still stream encrypted data, but my client doesn't expect encrypted data, and that's why it's giving me all kinds of noise. So basically, this is proof that it's actually encrypted. It's what TD said: trust me, bro. It's always super hard to demo an encrypted call, but here we are: we managed to break it, and there you can actually see that it's encrypted. And the only reason this isn't an encrypted demo today is that we have different implementations of the key sharing: I believe Element uses room events, and I decided to use to-device events, because why not? But this will get figured out once we start drafting MSCs and stuff. Exactly. Last question? All questions answered. Cool. Thank you so much.