[00:00.000 --> 00:16.160] And so, now we have two presenters, Benjamin De Kosnik and Morgan Reschenberger, so Benjamin [00:16.160 --> 00:21.760] is a member of the Mozilla performance engineering team and Morgan is a senior software engineer [00:21.760 --> 00:26.400] working on platform accessibility and Morgan if you want to repeat your name probably with [00:26.400 --> 00:33.000] my pronunciation. No, you got it, you got it. Morgan Reschenberger, that's me. Yeah, we're [00:33.000 --> 00:38.360] going to talk to you about an accessibility project called Cache the World and the way [00:38.360 --> 00:47.120] that we're monitoring and measuring performance. So, I'm Benjamin, I am on the performance [00:47.120 --> 00:54.160] team and I'm going to be talking about the collaboration from the performance side. And [00:54.160 --> 00:57.360] I'm Morgan, I'm on the accessibility team. I'm going to talk about the accessibility [00:57.360 --> 01:02.120] side. We put the Matrix rooms for both of our teams here, so if you have topic-related [01:02.120 --> 01:08.640] questions after this, you can follow up there. So, here's the agenda. We're going to just [01:08.640 --> 01:15.360] talk a little bit about accessibility in Firefox. Morgan is going to go through an intro [01:15.360 --> 01:21.360] to the rendering and accessibility architecture and some of the changes that happened with [01:21.360 --> 01:25.800] Cache the World. I'm going to talk a little bit about how we're measuring performance [01:25.800 --> 01:31.600] and some of those questions and current problems. We're going to go through our future work [01:31.600 --> 01:39.200] plans and then we're going to open it up for questions. So, the first thing is scoping [01:39.200 --> 01:46.000] context for accessibility in Firefox. The goal is, of course, a faster accessibility [01:46.000 --> 01:52.840] engine and more performant web use for users, all users and especially users using accessible [01:52.840 --> 01:58.160] technologies. 
We also want to try to create a performance testing infrastructure that [01:58.160 --> 02:05.920] will be able to prove these things and test. The more we change our internal infrastructures, [02:05.920 --> 02:13.960] we want to be able to make sure that we can catch problems. We also wanted to establish [02:13.960 --> 02:19.400] some accessibility metrics and we want to work in public with public dashboards that [02:19.400 --> 02:28.560] show the kind of performance that we're getting. We want to improve our documentation. We want [02:28.560 --> 02:36.480] to improve the debug experience. And as such, we're going to talk a little bit later about [02:36.480 --> 02:41.160] the profile markers that Nazim talked about earlier, but specifically the accessibility [02:41.160 --> 02:50.320] problems, and we want to set up infrastructure for collaboration. So, scope on this is we're [02:50.320 --> 02:54.560] going to be talking about screen readers pretty much only, and we're not going to be talking [02:54.560 --> 02:59.840] about any of these other accessibility technologies like screen magnification, contrast modes, [02:59.840 --> 03:06.280] on-screen keyboards, subtitles, any of that. That's all deferred till later in this work. [03:06.280 --> 03:14.680] So context for Firefox and accessible technologies is not great from the free software perspective. [03:14.680 --> 03:21.600] Almost all our users are on Windows, and then you have a very small sliver of Mac and Linux, [03:21.600 --> 03:27.680] and Linux is like under a percent. We just have to know where we are, and that's where [03:27.680 --> 03:36.360] we are. In general, 5.5 percent of all Firefox page loads for the month of January had some [03:36.360 --> 03:42.720] accessible technology built in, and that's not evenly distributed across the OSs. We [03:42.720 --> 03:52.440] see a much higher use on Windows, and Linux isn't bad, Orca, yay. And then Mac is far [03:52.440 --> 03:57.520] below that. 
But for the most part, if we were talking about who is touching this work and [03:57.520 --> 04:02.400] who do we have to care about, it's these Windows users. [04:02.400 --> 04:09.800] And then here, just for a little bit more context about, like, in that 5.5 percent of page loads [04:09.800 --> 04:14.040] that use accessible technologies, like, what accessible technologies are they using? They're [04:14.040 --> 04:24.000] using mostly screen magnifiers, which is the black line, and then the purple line is speech [04:24.000 --> 04:29.920] rec in general, and then underneath that is NVDA, which is the Windows screen reader. [04:29.920 --> 04:37.000] So those are the top three that we really have to care about. [04:37.000 --> 04:38.000] Morgan? [04:38.000 --> 04:43.080] And so before we get into all the details about the performance work, I want to give you some [04:43.080 --> 04:48.440] background on how rendering works in web browsers and how it translates to the accessibility [04:48.440 --> 04:51.080] architecture that we're going to be talking about today. [04:51.080 --> 04:56.800] So the general job of a web browser is to convert HTML and CSS written by web authors [04:56.800 --> 05:02.280] into visual navigable content, right? And we do this through a rendering engine. In Firefox, [05:02.280 --> 05:07.800] this is called Gecko. It has five different phases and stages that produce artifacts that [05:07.800 --> 05:12.600] are used in the following phases and stages. So first we parse the HTML document. This [05:12.600 --> 05:17.920] creates the DOM or document object model, which is a hierarchical view of the web page. [05:17.920 --> 05:22.840] Then we look at the CSS and figure out the style information for each node, what visual [05:22.840 --> 05:28.080] changes we need to make when we render. Then we do layout, which computes positional and [05:28.080 --> 05:32.880] size information for each of these nodes. 
It also constructs an artifact with that information [05:32.880 --> 05:38.040] called the frame tree, which becomes useful later. And then we do painting and compositing [05:38.040 --> 05:43.120] and rendering, which is the visual part of rendering. [05:43.120 --> 05:48.240] But this process is all extremely visual, right? And what if you do not navigate the [05:48.240 --> 05:52.960] web visually? What if you navigate it with technology like a screen reader, which turns [05:52.960 --> 05:58.400] visual content into audio? What do you do then? And how does a screen reader figure out what [05:58.400 --> 06:03.000] it should be telling you? Well, that's the job of the accessibility [06:03.000 --> 06:07.680] engine. So like we have a rendering engine, we also have an accessibility engine in Firefox. [06:07.680 --> 06:12.000] It doesn't have a fun name. So if you can come up with a fun name, you should let me [06:12.000 --> 06:17.640] know on Matrix. But what it does is it takes in those artifacts we talked about before, [06:17.640 --> 06:22.360] the DOM, the frame tree, style structs, et cetera, and it marshals them into a new kind [06:22.360 --> 06:26.600] of tree, which we call the accessibility tree, or I like to call it the a11y tree [06:26.600 --> 06:32.200] because that's more fun. But it takes all of those and computes accessibility-relevant [06:32.200 --> 06:38.560] information. So this is stuff like semantic role, name, the kinds of actions you can perform [06:38.560 --> 06:43.520] on an element, things like that. This is not necessarily one-to-one, like there is not [06:43.520 --> 06:48.120] a single accessible for every node in the DOM tree or a single accessible for every frame [06:48.120 --> 06:52.600] in the frame tree. We care about different things, which is why we have to build a new [06:52.600 --> 06:57.680] structure. And building the structure happens in the content process. 
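
As a rough illustration of why the accessibility tree is not one-to-one with the DOM, here is a minimal Python sketch. It is not Gecko's implementation: the `DomNode` and `Accessible` classes, the `ROLES` table, and the action names are all hypothetical stand-ins for the role, name, and action information described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DomNode:
    """Hypothetical, heavily simplified DOM node."""
    tag: str
    text: str = ""
    hidden: bool = False
    children: List["DomNode"] = field(default_factory=list)


@dataclass
class Accessible:
    """Hypothetical accessible: semantic role, name, and available actions."""
    role: str
    name: str
    actions: List[str]
    children: List["Accessible"] = field(default_factory=list)


# Assumed mapping from tag to (role, actions); real engines use far richer rules.
ROLES = {
    "body": ("document", []),
    "h1": ("heading", []),
    "p": ("paragraph", []),
    "a": ("link", ["jump"]),
    "button": ("button", ["press"]),
}


def build(node: DomNode) -> List[Accessible]:
    """Return the accessibles produced by one DOM subtree (possibly none)."""
    if node.hidden:
        return []  # hidden subtrees produce no accessibles at all
    kids = [acc for child in node.children for acc in build(child)]
    if node.tag not in ROLES:
        return kids  # generic wrappers (div, span) are skipped; children are hoisted
    role, actions = ROLES[node.tag]
    return [Accessible(role, node.text, list(actions), kids)]


def count_dom(node: DomNode) -> int:
    return 1 + sum(count_dom(c) for c in node.children)


def count_acc(acc: Accessible) -> int:
    return 1 + sum(count_acc(c) for c in acc.children)


page = DomNode("body", children=[
    DomNode("div", children=[
        DomNode("h1", "World War II"),
        DomNode("span", "decorative", hidden=True),
        DomNode("button", "Edit"),
    ]),
])
tree = build(page)[0]
print(count_dom(page), "DOM nodes ->", count_acc(tree), "accessibles")
```

In this toy page the wrapper `div` and the hidden `span` produce no accessible, so the two trees differ in size and shape.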
We have one accessibility [06:57.680 --> 07:03.520] tree per web page. So let's take a look at how these queries [07:03.520 --> 07:09.040] happen from an assistive technology standpoint. So at the bottom here, I've got a couple different [07:09.040 --> 07:13.240] kinds of assistive technologies. These are ones that Benjamin mentioned on that graph [07:13.240 --> 07:17.640] from before. So we have screen readers, voice control, window managers, et cetera. These [07:17.640 --> 07:25.480] clients or ATs make requests to Firefox for web content information. So if you are navigating [07:25.480 --> 07:30.000] with a screen reader, the screen reader needs to ask what node is focused and what should [07:30.000 --> 07:35.400] I say about it to the end user. The way that those requests happen are through platform [07:35.400 --> 07:40.880] specific APIs, but they all hit the parent process in Firefox. The assistive technologies [07:40.880 --> 07:47.640] are separate applications. So they're communicating with Firefox through the parent process. Each [07:47.640 --> 07:54.140] web page lives in one or more other processes, one or more content processes, and is not [07:54.140 --> 08:00.200] reachable by the assistive technology directly. So we can't inject the screen reader into [08:00.200 --> 08:04.760] web content for a lot of reasons, security being one of them. All these calls go through [08:04.760 --> 08:10.040] the parent process. And there are some problems with this architecture [08:10.040 --> 08:17.080] that motivate what we're going to talk about next. So let's get into it. Like I said, computation [08:17.080 --> 08:22.320] of the relevant properties that the assistive technologies are requesting, that all happens [08:22.320 --> 08:28.280] using the accessibility tree in the content process. The result gets sent to the parent [08:28.280 --> 08:34.600] process from content via IPC, inter-process communication. 
This is slow and it's also [08:34.600 --> 08:40.840] synchronous. So if a call gets blocked or is taking a really long time in content, you [08:40.840 --> 08:45.240] can't do anything. The parent process just hangs. And because the parent process includes [08:45.240 --> 08:51.040] all of the browser UI as well, it just looks like Firefox is not responding, which isn't [08:51.040 --> 08:55.880] great. So what can we do about that? Well, our solution [08:55.880 --> 09:00.880] is this project we call Cache the World, which introduces a cache in the parent process that [09:00.880 --> 09:06.880] keeps track of snippets of content information that we need to compute and respond to those [09:06.880 --> 09:13.360] API calls. So we're trying to offload as much work as we can from content into parent. And [09:13.360 --> 09:17.720] this cache gets updated asynchronously based on content mutations. So we no longer have [09:17.720 --> 09:25.520] this problem of synchronous blocking IPC. Cool. So now I'm back and I'm going to talk [09:25.520 --> 09:31.360] a little bit about, like, how do we see if this stuff is working? So the first thing [09:31.360 --> 09:36.640] we did is actually not at all metric or measurement based, but it was more about helping debug [09:36.640 --> 09:43.200] in the profiler. So one of my great colleagues, Michael Comella, added some accessibility markers [09:43.200 --> 09:48.320] in the profiler to kind of, like, get us an idea of, like, what's going on, where? You [09:48.320 --> 09:53.360] can see the specific calls here. And then I'm going to show you what it looks like kind [09:53.360 --> 09:59.960] of in the profiler. So the red circle is where we start to drop into some of the accessibility [09:59.960 --> 10:07.600] calls. So watch this space because we're going to be adding more markers here. 
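
The Cache the World idea described above, a cache in the parent process that content updates asynchronously, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a thread stands in for a content process, a `queue.Queue` stands in for asynchronous IPC, and `ParentCache`, `content_process`, and `drain_updates` are hypothetical names, not Firefox code.

```python
import queue
import threading


class ParentCache:
    """Hypothetical parent-side cache keyed by (document id, accessible id)."""

    def __init__(self):
        self._fields = {}

    def apply_update(self, doc_id, acc_id, fields):
        # Merge the pushed fields into whatever is already cached.
        self._fields.setdefault((doc_id, acc_id), {}).update(fields)

    def query(self, doc_id, acc_id, key):
        # Answered entirely from parent-side state: no blocking call to content.
        return self._fields.get((doc_id, acc_id), {}).get(key)


def content_process(doc_id, updates):
    """Stand-in for a content process pushing cache updates."""
    # Initial push after page load.
    updates.put((doc_id, 1, {"role": "button", "name": "Save"}))
    # Later, a content mutation changes the name; only that field is pushed.
    updates.put((doc_id, 1, {"name": "Saved"}))
    updates.put(None)  # sentinel: no more updates in this toy run


def drain_updates(cache, updates):
    """Apply queued updates to the parent-side cache until the sentinel."""
    while (msg := updates.get()) is not None:
        cache.apply_update(*msg)


cache = ParentCache()
updates = queue.Queue()
worker = threading.Thread(target=content_process, args=("doc-1", updates))
worker.start()
drain_updates(cache, updates)
worker.join()

# An assistive-technology query is now a dictionary lookup in the parent.
print(cache.query("doc-1", 1, "role"), cache.query("doc-1", 1, "name"))
```

The design point the sketch tries to capture is that the query path never waits on content; the cost moves to keeping the cache up to date.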
The second [10:07.600 --> 10:12.040] thing we had to do is really come up with, like, how do we test accessibility and what's [10:12.040 --> 10:18.360] going on here? There's a huge amount of screen readers. There's just, like, a whole bunch [10:18.360 --> 10:23.520] of different screen readers, and they're all different, and each OS has a different strategy [10:23.520 --> 10:29.480] for dealing with this. So we have, like, a huge complex testing matrix here. In addition, [10:29.480 --> 10:34.320] we had to, like, in terms of testing, we had to, like, run a large number of variations [10:34.320 --> 10:40.280] to kind of verify our results. We have five different variations starting with the baseline, [10:40.280 --> 10:46.600] and then we kind of, like, have caches on and off with the accessibility implicitly [10:46.600 --> 10:52.160] on by just plugging in a screen reader, and also with accessibility forced on with preferences. [10:52.160 --> 10:57.000] So we have a really large matrix of five on our task here, and then we were looking for [10:57.000 --> 11:04.680] specific problematic web content that would really trigger kind of the worst case scenarios [11:04.680 --> 11:11.960] here. And they are, in general, the worst case web content for this are really large [11:11.960 --> 11:17.920] static web pages. So what do we do? We added three specific sites. Actually, I think we [11:17.920 --> 11:22.680] have, like, five sites. But in general, it's like Wikipedia World War II is a great test [11:22.680 --> 11:30.840] page for testing accessibility. We have some Searchfox links because we're Firefox engineers, [11:30.840 --> 11:35.880] and then the WHATWG HTML spec. So these kind of, like, really large static pages, which [11:35.880 --> 11:40.720] is not necessarily how a lot of the web is built right now. But these are, like, specific [11:40.720 --> 11:47.800] problem points that we wanted to be aware of and address. 
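
The test matrix described above (five variations run against a handful of large static pages) can be written down as data. The Python sketch below is only an illustration: the variation labels, the dictionary keys, and the site identifiers are hypothetical names chosen here, and only the three sites named in the talk are listed.

```python
from itertools import product

# Five variations: a baseline, then cache off/on with accessibility turned on
# implicitly by a running screen reader, and cache off/on with accessibility
# forced on through a preference. All labels are hypothetical.
VARIATIONS = [
    {"label": "baseline", "a11y": "off", "cache": None},
    {"label": "screen-reader/cache-off", "a11y": "screen-reader", "cache": False},
    {"label": "screen-reader/cache-on", "a11y": "screen-reader", "cache": True},
    {"label": "forced-pref/cache-off", "a11y": "forced-pref", "cache": False},
    {"label": "forced-pref/cache-on", "a11y": "forced-pref", "cache": True},
]

# Large static pages used as worst cases (identifiers are illustrative).
SITES = ["wikipedia-world-war-ii", "searchfox", "whatwg-html-spec"]

# Every variation is run against every site.
jobs = [(variation["label"], site) for variation, site in product(VARIATIONS, SITES)]

for label, site in jobs:
    print(f"{label:28} {site}")
```

Multiplying this by each operating system and screen reader is what makes the full matrix large.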
And then comes the question [11:47.800 --> 11:53.680] of, like, well, what are we measuring? What's important? And we have, like, three general [11:53.680 --> 12:02.000] choices here. We have, like, W3C Navigation Timing kind of page load metrics, like OOG [12:02.000 --> 12:08.520] performance metrics, that segment browser page load into distinct phases, DNS redirects, [12:08.520 --> 12:15.400] DOM parsing, and then, like, content-ready pages loaded. We usually traditionally use [12:15.400 --> 12:21.240] visual metrics, but because of the nature of this, nope, can't do that. And then we [12:21.240 --> 12:25.360] have some kind of internal benchmarks that are not really publicly accessible where we [12:25.360 --> 12:30.680] just try to look at specific code flows and time and measure. And, like, that's really [12:30.680 --> 12:38.880] showing the most promise, frankly, and what we're going to be using more of in the future. [12:38.880 --> 12:44.560] And so what we have, we're trying to work in public, and we have some public dashboards [12:44.560 --> 12:55.800] for this work, which are at the end here. Whoops. Sorry. So this is, like, some preliminary [12:55.800 --> 13:01.840] results. This graph is a little hard to understand, and I'm sorry about that. We have the blue [13:01.840 --> 13:11.040] baseline performance. We have these dotted lines with the caches turned off. And then [13:11.040 --> 13:17.920] we have the caches turned on. And so we're seeing, like, yeah, not great performance [13:17.920 --> 13:24.800] for these static web pages right now, at least on Linux. I think that actually varies on [13:24.800 --> 13:33.240] Windows. But we're seeing some wins and some more even performance on things like IMDb web [13:33.240 --> 13:40.880] pages, which aren't, like, these pathological test cases. 
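
For readers unfamiliar with the Navigation Timing style of metric mentioned above, the following Python sketch shows how one timing entry can be segmented into phases. The attribute names come from the W3C Navigation Timing specification, but the millisecond values are invented for illustration, and the phase boundaries in `PHASES` are a rough assumption, not the segmentation the team uses.

```python
# Attribute names follow the W3C Navigation Timing specification.
# The values (milliseconds since navigation start) are made up for this sketch.
sample_entry = {
    "redirectStart": 0.0,
    "redirectEnd": 0.0,
    "domainLookupStart": 5.0,
    "domainLookupEnd": 25.0,
    "responseEnd": 180.0,
    "domInteractive": 620.0,
    "domContentLoadedEventEnd": 700.0,
    "loadEventEnd": 1450.0,
}

# Assumed, approximate phase boundaries: each phase is (start attribute, end attribute).
PHASES = {
    "redirects": ("redirectStart", "redirectEnd"),
    "dns": ("domainLookupStart", "domainLookupEnd"),
    "dom_parsing": ("responseEnd", "domInteractive"),
    "content_ready": ("domInteractive", "domContentLoadedEventEnd"),
    "page_loaded": ("domContentLoadedEventEnd", "loadEventEnd"),
}


def phase_durations(entry):
    """Return the duration of each phase in milliseconds."""
    return {phase: entry[end] - entry[start] for phase, (start, end) in PHASES.items()}


durations = phase_durations(sample_entry)
for phase, ms in durations.items():
    print(f"{phase:>14}: {ms:7.1f} ms")
```

None of these phases isolates accessibility work, which is one reason generic page load numbers are a blunt tool for this project.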
So in general, what we're going [13:40.880 --> 13:46.360] to be doing is we're going to be trying to align the profile markers that were put in [13:46.360 --> 13:51.880] to performance metrics using our internal tools at first. And we're just going to try [13:51.880 --> 13:59.240] to start measuring, like, the actual cache creation time. And we also want to start paying [13:59.240 --> 14:05.160] attention to not just straight, classic page load, but we want to start thinking about [14:05.160 --> 14:14.880] page reload, tab switching. And one of the other leads on this project, Jamie, has a great [14:14.880 --> 14:20.440] blog post about those kind of, like, anecdotal performance measurements. We definitely want [14:20.440 --> 14:25.960] accessibility first metrics. And we don't, we would like to get away from generic page [14:25.960 --> 14:35.200] load, tab metrics on this. We have a public dashboard, work in progress. It will continue [14:35.200 --> 14:41.680] to evolve as this work evolves. And then really quickly, future work. [14:41.680 --> 14:46.640] Yeah, so the accessibility team at Mozilla is responsible for a lot more than just the [14:46.640 --> 14:53.360] accessibility engine. We're also responsible for high contrast mode, zoom, Firefox front [14:53.360 --> 14:58.320] end usability and accessibility. So we've got a lot of projects apart from this that [14:58.320 --> 15:02.400] we're working on. But our main goal for this half is to ship the cache to release. We're currently [15:02.400 --> 15:08.200] in beta and we have a lot of promising results. So we're really optimistic about getting this [15:08.200 --> 15:13.040] to all of our users. We're also planning on working on optimizations based on the performance [15:13.040 --> 15:17.080] work that you're seeing here. We have a couple of optimizations in mind. Like, we know we [15:17.080 --> 15:22.720] can improve on cache granularity. 
But this work will inform the kind of work that we're [15:22.720 --> 15:31.240] doing next. And then the performance team is going to really try to get these Windows [15:31.240 --> 15:36.600] results in since we know it's so important. At the same time, we want to make sure that [15:36.600 --> 15:43.320] Linux performance doesn't degrade. Also, we would like to kind of like put this into standard [15:43.320 --> 15:48.600] continuous integration test infrastructure. Kind of tune our markers, make sure we're [15:48.600 --> 15:55.320] measuring what we think we're measuring. And then things that we deem successful in a wide [15:55.320 --> 16:01.320] variety of web content, we want to try to push out to public telemetry so that we can [16:01.320 --> 16:10.280] actually measure much larger environments and users. And then, of course, all of the [16:10.280 --> 16:15.560] internal collaborations inside of Mozilla with Perftools and ETL and DevOps to try and [16:15.560 --> 16:23.120] make all the magic happen. We have some questions. If we have time for questions, we have time [16:23.120 --> 16:29.080] for questions. We have time for questions. And if you have other thoughts, you can email [16:29.080 --> 16:55.000] us or, you know, Twitter. Are there any questions? All right. So complete. Yeah. [16:55.000 --> 17:09.640] We actually, on the slide deck, but not in our presentation, we did have some additional [17:09.640 --> 17:14.240] resources and notes for people who are trying to work with accessibility, maybe new to it, [17:14.240 --> 17:22.840] and things that, here are some resources for you to use. Again, Jamie's blog post, really [17:22.840 --> 17:29.480] I'm going to really hype that again. Please read it. Morgan is going to put a video up [17:29.480 --> 17:36.120] that has to be done because there is some internal stuff that can't be shown. But she [17:36.120 --> 17:42.040] has a great walk-through about how to debug CSS for accessibility. 
And then I have a [17:42.040 --> 17:49.880] web page on color and contrast for accessibility and how you can compute colors that work for [17:49.880 --> 17:56.480] a wide variety of people. And also I want to shamelessly plug that you can contribute [17:56.480 --> 18:04.840] to Firefox. And if you are interested in working on platform-specific bugs or front-end bugs [18:04.840 --> 18:09.360] or whatever, accessibility is a great place to get involved because we span a lot of components [18:09.360 --> 18:12.760] and we could always use your help. So if you are interested, we have an accessibility [18:12.760 --> 18:20.120] room on Matrix at the Mozilla domain and you should reach out and we are there. So. [18:20.120 --> 18:27.760] We will take a question. You mentioned it is not safe to embed the screen [18:27.760 --> 18:32.720] reader directly into the web page because of security concerns. But now we are caching, you are [18:32.720 --> 18:38.280] providing a little bit more information to this parent process. Are there any security [18:38.280 --> 18:42.840] considerations you have to look at or address doing this work? [18:42.840 --> 18:46.600] We are paying attention to the kind of information that we are caching. We don't want to give [18:46.600 --> 18:50.920] any private user information away. Largely, the information we are caching is already [18:50.920 --> 18:56.040] represented in the parent process in some form. But the way that we compute things is [18:56.040 --> 19:02.240] different than how DOM or layout or other parts of the browser compute them. We are caching [19:02.240 --> 19:10.280] really, really granular information as well. So, yeah, we are not currently concerned about [19:10.280 --> 19:12.120] security risk but that is a consideration. [19:12.120 --> 19:24.120] Maybe you already said, do you have performance tests with accessibility enabled right now? [19:24.120 --> 19:29.280] Yeah, that's what that website is. Oh, sorry. 
The question was do we have [19:29.280 --> 19:34.120] performance testing for accessibility? Yes, we are starting to do that. [19:34.120 --> 19:40.480] Is it just a matter of enabling accessibility and running exactly the same tests or are [19:40.480 --> 19:43.400] you doing something different for accessibility? [19:43.400 --> 19:51.960] Yeah, so the question is, what is the method there? You can contact me offline if we are [19:51.960 --> 19:56.920] running close. But we are using a standard framework for performance testing called [19:56.920 --> 20:04.120] browsertime, which is open source. And, yes, what we are doing is we have OS-specific handlers [20:04.120 --> 20:08.880] that basically start screen readers before we start running that and then stop them when [20:08.880 --> 20:14.960] we are done. So it is just straight RAII style on that, yeah. And then porting that to Windows [20:14.960 --> 20:15.960] too. [20:15.960 --> 20:20.400] One of the difficulties with that approach that we are running into is that we are most [20:20.400 --> 20:25.320] interested in perceived performance. So we want to know how does the user feel about [20:25.320 --> 20:29.880] this? Like, is it perceivably faster? And that is really hard to do because screen readers [20:29.880 --> 20:37.480] are difficult to automate from that perspective. Speech rate is extremely variable. You can [20:37.480 --> 20:41.640] do key presses and stuff, but it is really hard to get the kinds of measurements we want. [20:41.640 --> 20:45.280] So we are aware that the performance testing we are doing right now is a number and it [20:45.280 --> 20:49.040] is something that we can track consistently, but it isn't entirely what we would like to [20:49.040 --> 20:50.040] be. [20:50.040 --> 20:54.000] And there are different strategies on the Windows screen readers about having to have the full [20:54.000 --> 21:01.400] page ready before we actually start in with the speech. 
And that is like configurable [21:01.400 --> 21:06.640] and that is not the default setting on Linux, for instance. So Orca, I think, is [21:06.640 --> 21:11.480] actually pretty smart about this. And they can do partial reads and start the speech [21:11.480 --> 21:14.480] earlier. So we are not getting quite. [21:14.480 --> 21:16.480] We have a comment on that. [21:16.480 --> 21:17.480] Oh, sure. [21:17.480 --> 21:18.480] There is a question. [21:18.480 --> 21:24.760] Oh, here it is. Note that the caching of the parent moves information into a process that [21:24.760 --> 21:26.760] is not exposed to web content. [21:26.760 --> 21:31.760] There is nothing before that. [21:31.760 --> 21:38.760] It is not appearing here. Maybe. Yes, here. [21:38.760 --> 21:43.000] Oh, can you talk about how the cache is populated and invalidated? [21:43.000 --> 21:48.000] Oh, sure. How much time do we have? [21:48.000 --> 21:49.000] Two minutes. [21:49.000 --> 21:50.000] Okay. [21:50.000 --> 21:51.000] Go. [21:51.000 --> 21:52.000] Go. [21:52.000 --> 21:58.640] So the cache is populated from content. So it is a push-based cache. We aren't invalidating [21:58.640 --> 22:03.560] from parent because we can't observe content mutations from parent effectively. Each content [22:03.560 --> 22:08.080] process is responsible for monitoring its own mutations and pushing or invalidating stuff [22:08.080 --> 22:10.640] in the parent process as needed. [22:10.640 --> 22:19.800] We have an initial cache push that... Oh, no, sorry. On page load, we collect a bunch [22:19.800 --> 22:24.240] of information and push it always so there isn't any sort of mutation that we're responding [22:24.240 --> 22:28.800] to there. That is one of our big performance concerns: the initial cache push [22:28.800 --> 22:33.120] scales by page size, and that's really costly. [22:33.120 --> 22:34.120] But... [22:34.120 --> 22:36.120] That's why you put all those big tests in there. 
[22:36.120 --> 22:37.120] Yes. [22:37.120 --> 22:44.120] So from initial cache push onward, we're responding to mutations in content from content. [22:44.120 --> 22:45.120] Yeah. [22:45.120 --> 22:47.120] Are there any other questions? [22:47.120 --> 22:48.120] Oh, yeah. [22:48.120 --> 22:56.320] Go into the limit. [22:56.320 --> 23:05.480] On the web app side, what may impact negatively the performance of the accessibility? [23:05.480 --> 23:11.400] Like how could you design web content such that it's optimal for accessibility? [23:11.400 --> 23:18.800] That's a great question, and we'll come back at you later with an answer. [23:18.800 --> 23:23.120] Yeah, we're still kind of early in phase on this, but I feel it would be a great idea [23:23.120 --> 23:29.800] to do some kind of web content help to get people to know the performance choices they're [23:29.800 --> 23:30.800] making for accessibility. [23:30.800 --> 23:31.800] Yeah. [23:31.800 --> 23:40.960] Oh, yeah. Could we come up with some guidelines for performance learning and general guidelines [23:40.960 --> 23:47.960] for how to do performance accessibility? Request submitted. Thank you. [23:47.960 --> 23:56.960] So thank you very much. We are done. [23:56.960 --> 24:03.960] Thank you.