[00:00.000 --> 00:14.640] Well, all right. I'll get going since we're here. My name is Saul and today I'd like to [00:14.640 --> 00:19.120] talk to you about our little project P10K or how to get 10,000 participants into a [00:19.120 --> 00:21.120] GC meeting. [00:21.120 --> 00:28.400] No, it doesn't go on the loudspeakers, it's just for the recording. It is what it is. [00:28.400 --> 00:35.320] Sorry, I lost my voice. I'll try. I suppose most of you know what it is, but for those [00:35.320 --> 00:41.520] who don't, it's a way about to see compatible video conferencing application. I like to [00:41.520 --> 00:46.520] say that I can think of it in three ways. A set of open source projects that allow you [00:46.520 --> 00:51.640] to either deploy it or, you know, piecemeal it and build something with it. It's also [00:51.640 --> 00:57.900] a set of APIs and mobile SDK so you can embed it into your existing application and fully [00:57.900 --> 01:03.080] open source Apache to license and we have a pretty vibrant community that helps us build [01:03.080 --> 01:04.600] some stuff. [01:04.600 --> 01:11.840] So I've talked about scaling GC meets a couple of years ago here at FOSDOM with what we did [01:11.840 --> 01:18.960] during the pandemic. Also at Comcom about how we reached 500 participants. Then of course [01:18.960 --> 01:26.360] somebody will ask, yeah, how do you do more, right? So that's what I'm about to go on today. [01:26.360 --> 01:33.120] A quick TLDR on what the trick is to scale up is mostly to cheat because it turns out [01:33.120 --> 01:39.720] that you never see 10,000 participants at the same time. So you need to paginate and [01:39.720 --> 01:44.160] not show all of them at the same time, not load them at the same time. Also on the back [01:44.160 --> 01:50.280] end, you don't want to be, you know, taking care of 10,000 things at once. You want to [01:50.280 --> 01:55.000] be really careful avoiding re-renders on the react side of things. So on your front end, [01:55.000 --> 02:01.880] you definitely don't want to have 10,000 things. And very importantly, reducing signaling. [02:01.880 --> 02:08.160] And this is kind of the crux of the thing. So with all of those things, we ended up getting [02:08.160 --> 02:14.000] 500 participants in a single meeting. All of them are fully functional, bidirectional audio [02:14.000 --> 02:19.680] video participants. They will never all have video on. So that's sort of fine. I'm going [02:19.680 --> 02:25.200] to go a quick run through our architecture. So when we dive into XMPP, we know what we're [02:25.200 --> 02:31.240] talking about. XMPP is our course signal protocol. You heard it from Matt for chat. So all the [02:31.240 --> 02:36.720] participants join an XMPP mock, so a group chat. And then our focus, you call for negotiates [02:36.720 --> 02:42.840] a session with each participant. And then they all end up mixed in the JVB, which is [02:42.840 --> 02:49.440] where we allocate the media. So this is like a back of an app design level, but it's pretty [02:49.440 --> 02:57.600] accurate. Prosody is our XMPP server of choice. And you call for is the one that will allocate [02:57.600 --> 03:03.120] sessions here and then establish sessions with the users. So they all end up, you know, [03:03.120 --> 03:09.720] having this connection. Now, how do you go about solving 10,000 participants? Well, first [03:09.720 --> 03:16.280] of all, we do some research. And what we knew is that presence is stanza. So XMPP presence [03:16.280 --> 03:23.880] was our Achilles heel. So we needed to sort that out. And intuitively, when you need to [03:23.880 --> 03:28.480] support many of something, you think of, well, I'll partition it in smaller chunks. And maybe [03:28.480 --> 03:32.960] that's how I do it. So there is federated mark for that. So we thought maybe that's where [03:32.960 --> 03:38.040] it goes. And turns out the military had sort of researched this problem as well. And there [03:38.040 --> 03:44.200] is this cool white paper called federated multi-user chat for military deployments. [03:44.200 --> 03:50.640] And one of the things they got there is how to avoid these presence flooding. And they [03:50.640 --> 03:55.720] do that with the visitor role. And that's where we got the idea from. So the idea is [03:55.720 --> 04:01.960] that we're going to have two types of users, the active users and like passive users. So [04:01.960 --> 04:06.760] we don't need to know about all these passive users, like all these audience, we just need [04:06.760 --> 04:12.040] to know the number. We don't need to draw a tile for them. They don't need to be as [04:12.040 --> 04:15.880] apparently they're participating in the meeting. They're just viewers, right? And this is what [04:15.880 --> 04:21.960] the visitor role in XMPP Mach-Lingo means. So a passive participant can then become an [04:21.960 --> 04:26.880] active participant by switching the role. Because we're not building live streaming. [04:26.880 --> 04:31.200] So what we want to build is a way to actually actively be able to participate. Anybody of [04:31.200 --> 04:38.640] those 10,000 participants should be able to take the mic anytime. Scenarios for this, [04:38.640 --> 04:45.840] earnings calls on public to traded companies. Just because we can, you name it. [04:45.840 --> 04:51.120] So step two, how do we test it? Because if we build it, we need to be able to know we [04:51.120 --> 04:55.520] have a complete store goal. And in order to test 10,000 participants, you need, well, [04:55.520 --> 05:02.880] 10,000 participants. So we use a big ass linear grid and we created some lightweight clients [05:02.880 --> 05:08.120] so that we could have a lot of chunks that join the call. They've got no UI. We spawn [05:08.120 --> 05:13.920] multiple browser windows with multiple tabs, with multiple of these clients. And a recent [05:13.920 --> 05:19.080] trick is we use insertable streams to drop all media. One thing you can do is modify. [05:19.080 --> 05:23.600] Another thing you can do is drop it. So it's nothing. And then there are a lot more lightweight [05:23.600 --> 05:27.240] in our Selenium grid. Otherwise, it would take millions just to test what you're doing [05:27.240 --> 05:34.160] is right. There's a PR by Philip Hankey actually to do something like Chrome would said, Black [05:34.160 --> 05:41.200] Franks, very tiny ones. So maybe that's where we go in the future as well. And we also delay [05:41.200 --> 05:46.040] track creation so that we don't create tracks. If you join muted, we don't need to do the [05:46.040 --> 05:51.080] whole create a video track that is useless and things like this. [05:51.080 --> 05:56.480] The next thing is we scale the signaling. And the way we do it is we ended up having [05:56.480 --> 06:02.000] multiple processes servers. This is one node, but it could be spread to multiple nodes. [06:02.000 --> 06:07.680] So we have a main process server, which is where the active participants join the meeting. [06:07.680 --> 06:14.720] And then we have up to five extra nodes, which we call visitor nodes, where people join in [06:14.720 --> 06:20.280] this visitor role. So the presence is not broadcasted. Jigofa will decide which one [06:20.280 --> 06:25.720] you join, usually depending on the capacity. And the trick to actually become an active [06:25.720 --> 06:31.520] participant is to just join this one, join the main one afterwards. And we can do that [06:31.520 --> 06:37.440] very fast because you don't need to recreate the XMPP connection. [06:37.440 --> 06:44.520] So now, in order to establish this sort of mesh, we ended up using Federation, even though [06:44.520 --> 06:48.840] it's like within a single server, but still. So there's server to server bidirectional [06:48.840 --> 06:54.800] connections to avoid having duplicated connections. So custom modules that's where process shines [06:54.800 --> 06:59.680] because it allows us to do all these customizations to mirror like chat messages that have been [06:59.680 --> 07:05.920] typed in a visitor node to the main node and back. So to kind of fake it that they are [07:05.920 --> 07:12.080] in separate instances, actually. And as I said, becoming active is fast because you [07:12.080 --> 07:16.840] don't need to recreate the XMPP connection. You just need to join a different mock. [07:16.840 --> 07:23.000] Our step number four is to have an improved topology for media routing. Currently, we have [07:23.000 --> 07:28.120] Octo, which allows us to spread the load across multiple bridges. But this doesn't work very [07:28.120 --> 07:33.720] well for such a large load. You need a tree-style topology where some people are just receiving [07:33.720 --> 07:41.200] and a full mesh for those who are actively participating. So both loads can be spread. [07:41.200 --> 07:47.920] And last, we need to fix up the UI, let's say. So we don't need to render the visitors. [07:47.920 --> 07:55.000] We just need to know that there is 100 people and then 9,000 visitors. And that's it. So [07:55.000 --> 07:59.920] we want to refine the UI a little bit. We're thinking of using the raised-hand functionality [07:59.920 --> 08:03.760] to become an active participant. So you raise your hand, you are approved and then you become [08:03.760 --> 08:10.560] active. That's how we're thinking about it. Now, some of it is in the present, some of [08:10.560 --> 08:18.600] it is in the future. So how is it going? We got there with 51 bridges. We got 10,009 [08:18.600 --> 08:27.640] participants. So it worked out. There's still some work to do. So the UI is not yet final. [08:27.640 --> 08:33.520] We're polishing up a little bit. And we're still going to add some more modules to mirror [08:33.520 --> 08:38.400] all the data we want, like the polls and other stuff. And we're thinking that maybe we don't [08:38.400 --> 08:44.400] really need to support 500 active participants because that's a weird conference, really. [08:44.400 --> 08:48.960] So that number could actually be lower or pretty much configurable. So you can say, [08:48.960 --> 08:53.800] I want these very many active participants and the rest, it will be visitors. And that's [08:53.800 --> 08:57.880] that. And of course, we need to make it easy to deploy for everyone. Right now, this is [08:57.880 --> 09:03.200] a bit held together with that tip. Before I go, I'd like to give a shout out to the [09:03.200 --> 09:09.600] heroes that worked on the guts of this. You may know their names from our community, Boy [09:09.600 --> 09:15.920] is Domenico and Jonathan, incredible characters. And I'm relaying the message. I know they [09:15.920 --> 09:23.200] knew words, but they did the work. And I like to share the love we have for Prosody. We [09:23.200 --> 09:28.920] wouldn't have been able to do it, I think, without such a flexible piece of software. [09:28.920 --> 09:32.960] They help us. We help them. It's a very nice relationship we have with the project. We [09:32.960 --> 09:40.600] love Matt and team. So shout out to them. And since that's all I got, you can follow [09:40.600 --> 09:45.480] the progress there. We have documentation actually how to deploy the existing way of [09:45.480 --> 09:51.760] doing things. Again, early stages, but it's there. And if you have any questions, well, [09:51.760 --> 10:02.400] I'm around here. Or find me online. Thank you very much.