[00:00.000 --> 00:16.120] So we are at our penultimate talk for this dev room, and we have Matjaž Horvat, which [00:16.120 --> 00:21.160] I hope I pronounced right, who is a staff software engineer on the Mozilla localization [00:21.160 --> 00:26.360] team, and he's going to talk about how you can localize your open-source project with [00:26.360 --> 00:27.360] Pontoon. [00:28.120 --> 00:29.120] Thank you. [00:29.120 --> 00:31.120] Hello, everyone. [00:31.120 --> 00:36.880] First of all, I would like to thank you all for coming all the way to Brussels to listen [00:36.880 --> 00:37.880] to me. [00:37.880 --> 00:40.280] I really, really appreciate it. [00:40.280 --> 00:46.520] I hope you're having a good day today, and you're going to be having a good day tomorrow. [00:46.520 --> 00:52.080] As Francesca said, I've been an engineer with Mozilla for some time now, and today I wanted to talk [00:52.080 --> 01:00.800] to you about localization, specifically how we do localization at Mozilla, and hopefully [01:00.800 --> 01:07.840] how it can benefit you as well, be it within Mozilla or some Mozilla-related project or [01:07.840 --> 01:10.200] not. [01:10.200 --> 01:15.080] But first things first, I have to mention something very important. [01:15.080 --> 01:16.080] This is Inti. [01:16.080 --> 01:24.480] She's my oldest daughter, and she just turned seven today, and her dad is at some conference [01:24.480 --> 01:29.920] with geeks, spending time away from her. [01:29.920 --> 01:34.440] But can somebody take a picture of... [01:34.440 --> 01:38.200] Thank you for that. [01:38.200 --> 01:41.920] I actually brought her to Brussels, so she's here. [01:41.920 --> 01:47.200] So we spent like an hour today together, and by the time I get back home, she's going [01:47.200 --> 01:48.200] to go to bed. [01:48.200 --> 01:50.440] No, I'm kidding. 
[01:50.440 --> 01:55.440] But I actually wanted to make this talk shorter, because I want to spend more time with her, [01:55.440 --> 01:56.440] so sorry about that. [01:56.440 --> 01:58.440] It's going to be a pretty short talk. [01:58.440 --> 02:02.400] And then if you're going to have any questions, Emily, you'll answer them, okay? [02:02.400 --> 02:03.400] Is that fine? [02:03.400 --> 02:04.400] Thank you. [02:04.400 --> 02:07.680] Emily's my colleague over there who's going to do the last talk today. [02:07.680 --> 02:14.600] So big round of applause for Emily for being the last to speak today. [02:14.600 --> 02:16.600] She really appreciates that. [02:16.600 --> 02:25.640] Okay, back to localization and to some serious business. [02:25.640 --> 02:26.760] This is actual data. [02:26.760 --> 02:31.440] Just 13% of Firefox users are based in the U.S. [02:31.440 --> 02:33.600] That's maybe not very surprising. [02:33.600 --> 02:42.280] What could be a little bit more surprising is that 60% of all Firefox users use a non-default [02:42.280 --> 02:47.920] locale, that is, something other than en-US, American English. [02:47.920 --> 02:51.880] In case it's not obvious, what I'm trying to say is that localization matters. [02:51.880 --> 02:54.200] It's actually very important. [02:54.200 --> 03:01.800] We all, me included, often think of localization as an obstacle or something that we're going [03:01.800 --> 03:05.280] to do later or we're going to do it one day. [03:05.280 --> 03:17.240] But it actually really matters because apparently it keeps the door shut if you don't do localization [03:17.240 --> 03:21.680] of your software. [03:21.680 --> 03:28.680] I want to say a few things first about how localization actually works at Mozilla. 
[03:28.680 --> 03:38.920] It's driven by hundreds if not thousands of contributors, volunteers, who spend their [03:38.920 --> 03:43.840] free time contributing to Mozilla because they like it or because they like the products [03:43.840 --> 03:50.840] that Mozilla develops or they like the mission or they care about their language. [03:50.840 --> 03:58.320] We're truly grateful that we have such an, as we call it, army of awesome people who [03:58.320 --> 04:05.360] are, as you saw earlier, basically responsible for 60% of the Firefox market share. [04:05.360 --> 04:06.560] It's not just Firefox. [04:06.560 --> 04:14.400] As you'll see later, there are many, many more projects that Mozilla localizes. [04:14.400 --> 04:20.720] The platform that we use for localization is called Pontoon. [04:20.720 --> 04:26.800] It's like a classic translation management system through which localizers interact. [04:27.800 --> 04:30.920] But it's basically, as I mentioned, just an interface. [04:30.920 --> 04:38.640] The actual strings, the actual English strings and translations are stored in repositories. [04:38.640 --> 04:43.520] So usually that's GitHub, I think also GitLab. [04:43.520 --> 04:47.440] Sometimes there's also hg.mozilla.org. [04:47.440 --> 04:49.720] That's what we call a single source of truth. [04:49.720 --> 04:54.760] And then Pontoon is basically just an interface because many of our localizers are, surprise, [04:54.760 --> 05:00.440] not really developers and don't really want to work with repositories directly. [05:00.440 --> 05:07.520] So it's much easier for them to make contributions through a tool that is hopefully not much [05:07.520 --> 05:13.240] more complicated to use than, say, an email client or Facebook. [05:13.240 --> 05:19.440] As you can see from this page, this is a profile page of one of our active localizers. 
[05:19.440 --> 05:26.200] We really like version control systems, in particular GitHub, as you can tell by a particular [05:26.200 --> 05:29.800] widget on this page. [05:29.800 --> 05:37.360] And the way things work is that a localizer would log in, they start by picking their team, [05:37.360 --> 05:44.760] their locale, say they localize software to French. [05:44.760 --> 05:50.320] And they start on the French page, in this case, which has some basic stats, some basic [05:50.320 --> 05:52.640] information about the locale in general. [05:52.640 --> 05:59.120] And more importantly, it lists all the projects that this community localizes. [05:59.120 --> 06:01.800] This is a screenshot, so I can't really scroll. [06:01.800 --> 06:07.560] There are 35 projects in total that the French community localizes. [06:07.560 --> 06:16.600] I think in total we have 36, and they are being translated to over 200 different locales. [06:16.600 --> 06:20.400] For those of you who are not familiar, the difference between a language and a locale [06:20.400 --> 06:25.840] is that Spanish is one language, but then you have several variants of Spanish, for [06:25.840 --> 06:34.240] example, like Spanish Spanish or Argentine Spanish or Mexican Spanish, those are locales. [06:34.360 --> 06:37.960] All specific variants. [06:37.960 --> 06:43.880] So a localizer would go to this page, pick one project, for example, AMO front-end, which [06:43.880 --> 06:48.000] is not fully translated yet. [06:48.000 --> 06:54.560] And then the translate view opens up, which is again a pretty straightforward page. [06:54.560 --> 06:59.680] On the left you see the list of strings, and in the middle you have on top, the source string, [06:59.680 --> 07:06.360] and then the text field into which you enter translations. [07:06.360 --> 07:14.400] And then in the bottom right corner you see two tabs that translators get some inspiration [07:14.400 --> 07:15.560] from. 
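The language versus locale distinction mentioned above (one language, several regional variants) can be illustrated with a small sketch. It assumes simple two-part codes of the form language-REGION such as es-AR; the names `Locale`, `parse_locale` and `group_by_language` are hypothetical and are not taken from Pontoon.

```python
from typing import Dict, List, NamedTuple, Optional


class Locale(NamedTuple):
    """A language plus an optional regional variant (hypothetical helper type)."""
    language: str
    region: Optional[str]


def parse_locale(code: str) -> Locale:
    """Split a simple 'language-REGION' code such as 'es-AR'.

    Assumption: only two-part codes are handled; scripts and other
    subtags are out of scope for this sketch.
    """
    language, _, region = code.partition("-")
    return Locale(language.lower(), region.upper() or None)


def group_by_language(codes: List[str]) -> Dict[str, List[str]]:
    """Group locale codes under the language they are variants of."""
    groups: Dict[str, List[str]] = {}
    for code in codes:
        groups.setdefault(parse_locale(code).language, []).append(code)
    return groups


if __name__ == "__main__":
    # Spanish is one language; es-ES, es-AR and es-MX are separate locales.
    print(group_by_language(["es-ES", "es-AR", "es-MX", "fr"]))
```

Grouping by language in this way is only meant to show why a project can target more locales than languages.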
[07:15.560 --> 07:21.920] You get suggestions from several machine translation engines, translation memory, and you can also [07:21.920 --> 07:27.560] look into how other locales might have translated the same string. [07:27.600 --> 07:32.960] There are two ways most of our teams operate. [07:32.960 --> 07:39.280] One is some localizers submit translations directly, which means as soon as they are [07:39.280 --> 07:46.640] submitted to Pontoon they end up in the version control system and can be used in the product. [07:46.640 --> 07:52.800] The alternative and more common way is that localizers just submit suggestions, and those [07:52.840 --> 08:01.200] suggestions then need to be approved by our trusted localizers who have worked with localization [08:01.200 --> 08:06.520] for some time and have a proven track record of submitting quality translations, and then [08:06.520 --> 08:08.880] they get into the repository. [08:08.880 --> 08:14.360] So here in this case, on the left, we're seeing strings with corresponding [08:14.360 --> 08:20.680] suggestions, which are then approved by a reviewer. [08:23.800 --> 08:26.520] Maybe one more detail around this. [08:26.520 --> 08:34.160] Since you see the source string and the translation also in the sidebar on the left, the status [08:34.160 --> 08:39.040] boxes on the left are actually check boxes, so you can select multiple strings and approve [08:39.040 --> 08:46.520] them at the same time or reject them all at once. [08:46.520 --> 08:53.920] One last thing before I stop with the presentation of Pontoon. 
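The two submission paths and the batch review described above can be modelled roughly as follows. This is a minimal sketch, not Pontoon's data model: the `Status` values, the `Translation` class and the functions `submit`, `review` and `ready_for_repository` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Set


class Status(Enum):
    SUGGESTED = "suggested"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Translation:
    string_id: str
    text: str
    status: Status = Status.SUGGESTED


def submit(string_id: str, text: str, can_translate_directly: bool) -> Translation:
    """Direct submissions are approved at once; others wait for review."""
    status = Status.APPROVED if can_translate_directly else Status.SUGGESTED
    return Translation(string_id, text, status)


def review(translations: Iterable[Translation], selected_ids: Set[str], approve: bool) -> None:
    """Approve or reject every selected pending suggestion in one batch."""
    new_status = Status.APPROVED if approve else Status.REJECTED
    for translation in translations:
        if translation.string_id in selected_ids and translation.status is Status.SUGGESTED:
            translation.status = new_status


def ready_for_repository(translations: Iterable[Translation]) -> List[Translation]:
    """Only approved translations would be written to version control."""
    return [t for t in translations if t.status is Status.APPROVED]


if __name__ == "__main__":
    items = [
        submit("title", "Titre", can_translate_directly=True),
        submit("save", "Enregistrer", can_translate_directly=False),
        submit("quit", "Quiter", can_translate_directly=False),
    ]
    review(items, {"save"}, approve=True)   # reviewer approves one suggestion
    review(items, {"quit"}, approve=False)  # and rejects another
    print([t.string_id for t in ready_for_repository(items)])
```

Passing a set of selected ids to `review` mirrors the check boxes in the sidebar: one action applies to every selected string.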
[08:53.920 --> 09:02.000] We're currently working on a pre-translation feature, which is essentially engaging machine [09:02.000 --> 09:08.760] translation and translation memory: as soon as source strings get exposed in the [09:08.760 --> 09:14.280] repository to be translated, and as soon as they are served to localizers and localizers [09:14.320 --> 09:22.200] get notifications, hey, new strings are available, these strings get pre-translated using a combination [09:22.200 --> 09:26.320] of translation memory and machine translation. [09:26.320 --> 09:29.760] So if we find a perfect match, we would use translation memory. [09:29.760 --> 09:37.320] If we don't find anything usable in translation memory, we fall back to machine translation. [09:37.360 --> 09:48.360] This is a pretty controversial topic, because pre-translation can yield interesting results. [09:48.360 --> 09:52.200] Thank you. [09:52.200 --> 10:00.160] That means that we're really slowly rolling this out for particular project-locale combinations, [10:00.160 --> 10:04.240] where there's an actual need, where, for example, locales are a little bit falling behind, but [10:04.240 --> 10:10.840] at the same time, they have reviewers who are active enough to hop in and correct potential [10:10.840 --> 10:15.440] errors that the pre-translation produces. [10:15.440 --> 10:20.080] Pontoon is open source, it's freely available, so there are actually other users of Pontoon [10:20.080 --> 10:23.200] outside Mozilla. [10:23.200 --> 10:31.760] We're not aware of many, maybe a dozen, but we also don't know in case there are more. [10:31.760 --> 10:35.440] It's relatively easy to set it up. [10:35.440 --> 10:41.960] We sadly don't offer any official support, but if you do come to our Discourse, I'm [10:41.960 --> 10:51.840] going to show the links on the last slide, or to our chat, chat.mozilla.org. [10:51.840 --> 11:00.960] We try to help, but like I said, we don't offer any official support. 
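The pre-translation fallback described earlier (use translation memory on a perfect match, otherwise fall back to machine translation) can be sketched as below. This is an illustration under stated assumptions, not Pontoon's implementation: the translation memory is modelled as a plain dictionary keyed by source string, and `machine_translate` is a hypothetical callable standing in for whichever engine is configured.

```python
from typing import Callable, Dict, Tuple


def pretranslate(
    source: str,
    translation_memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (translation, origin) for one source string.

    Assumption: a "perfect match" means the exact source string is
    present in translation memory; fuzzy matches are ignored here.
    """
    exact_match = translation_memory.get(source)
    if exact_match is not None:
        return exact_match, "translation-memory"
    # Nothing usable in translation memory: fall back to machine translation.
    return machine_translate(source), "machine-translation"


if __name__ == "__main__":
    memory = {"Save": "Enregistrer"}

    def fake_engine(text: str) -> str:
        # Stand-in for a real engine, only for demonstration.
        return "[MT] " + text

    print(pretranslate("Save", memory, fake_engine))
    print(pretranslate("Open file", memory, fake_engine))
```

Returning the origin alongside the text makes it easy for a reviewer, or a later report, to tell which pre-translations came from which source.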
[11:00.960 --> 11:07.920] There are some requirements that need to be met in order for a project to be localized [11:07.920 --> 11:08.920] with Pontoon. [11:08.920 --> 11:17.600] Obviously, you need to use GitHub or some other VCS backend as storage for translations. [11:17.600 --> 11:22.800] Then you have two options for organizing the files, either you follow a predefined folder [11:22.800 --> 11:30.480] structure or you use our l10n.toml specification, which is then read by Pontoon to detect [11:31.200 --> 11:40.800] where the source files are and where the translations are submitted. [11:40.800 --> 11:43.760] Obviously, you need to store your translations in one of [11:43.760 --> 11:45.600] the supported file formats. [11:45.600 --> 11:49.360] Here are some of them. [11:49.360 --> 11:51.080] You might be familiar with Fluent. [11:51.080 --> 11:57.200] This is one of the formats that Mozilla developed. [11:57.200 --> 12:05.520] It's now basically, and Emily is going to talk about it in the next talk, [12:05.520 --> 12:11.280] transitioning slowly towards MessageFormat 2, which is the format that is being [12:11.280 --> 12:12.280] developed. [12:12.280 --> 12:13.640] That's why there's an asterisk at the end. [12:13.640 --> 12:19.600] We don't technically have full-blown support for it yet, but we're working on that. [12:19.600 --> 12:27.160] Most of the other common file formats are also supported by Pontoon. [12:27.160 --> 12:33.040] And once your project meets those requirements, you just need to create it on your Pontoon [12:33.040 --> 12:37.280] instance, which is typically a very simple step. [12:37.280 --> 12:43.880] You need to add a project name, select target locales, and add a link to your repository, [12:43.880 --> 12:45.160] and that's basically it. [12:45.160 --> 12:48.240] You save it, you sync it, and you have strings ready. 
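To make the file organization requirement more concrete, here is a rough sketch of how a project configuration can map source files to per-locale translation files. The dictionary stands in for the TOML project configuration mentioned above; the key names (`locales`, `paths`, `reference`, `l10n`) and the `{locale}` placeholder are modelled on that format but should be treated as assumptions, and the file paths and the function `translation_paths` are purely illustrative.

```python
from typing import Dict, List

# Hypothetical project configuration, written as a Python dict so the
# sketch stays self-contained. In a real project this would live in the
# TOML configuration file in the repository.
PROJECT_CONFIG = {
    "locales": ["fr", "es-AR"],
    "paths": [
        {
            "reference": "locales/en-US/app.ftl",  # where source strings live
            "l10n": "locales/{locale}/app.ftl",    # where translations go
        },
    ],
}


def translation_paths(config: dict) -> Dict[str, List[str]]:
    """Expand the {locale} placeholder for every configured locale."""
    return {
        locale: [entry["l10n"].format(locale=locale) for entry in config["paths"]]
        for locale in config["locales"]
    }


if __name__ == "__main__":
    for locale, paths in translation_paths(PROJECT_CONFIG).items():
        print(locale, paths)
```

The point of such a mapping is that the tool never has to guess the layout: the repository itself states where source strings are read from and where translations are written.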
[12:48.240 --> 12:53.640] Now the tricky part here is that you need your own instance, and that's a little bit [12:53.640 --> 12:57.760] more work than filling out this form. [12:57.760 --> 13:04.680] Like I said, there is documentation on how to do that in our repository. [13:04.680 --> 13:10.240] It has, however, been on our minds for some time now. [13:10.240 --> 13:17.280] We're testing the waters whether there's an interest for us to create something like a multi-tenant [13:17.280 --> 13:22.520] Pontoon instance where you wouldn't need to maintain your own instance. [13:22.520 --> 13:30.240] You would just come and create your own project there and use that instance. [13:30.240 --> 13:35.480] Yeah, that's pretty much it. [13:35.480 --> 13:38.600] I would like to end here. [13:38.600 --> 13:43.680] This is the link to the repository, obviously, and all the links to Discourse and to chat [13:43.680 --> 13:47.680] that I mentioned and the documentation are there. [13:47.680 --> 13:54.280] You can also find me on Matrix or Twitter, sorry, as mathjazz, or you can send me an email, [13:54.280 --> 13:57.400] and I'd be also happy to answer any questions here. [13:57.400 --> 13:59.400] Thank you. [13:59.400 --> 14:06.440] Thank you very much. [14:06.440 --> 14:13.040] So we already have two questions in the Matrix room. [14:14.000 --> 14:19.600] Does it support more complex translation like full articles, for example, what we [14:19.600 --> 14:23.240] can find on support.mozilla.org? [14:23.240 --> 14:24.960] Short answer, no. [14:24.960 --> 14:31.000] Pontoon is designed to be a software localization translation system, and we currently don't [14:31.000 --> 14:40.400] have any support for, yeah, I don't know how to call it, articles, longer blocks of text. [14:40.400 --> 14:50.520] We sometimes abuse that, basically, and split some of the articles or some of our web pages [14:50.520 --> 14:55.200] by paragraphs into multiple strings, but that's not really it. 
[14:55.200 --> 15:02.440] That's not really the same as how Wikipedia localization works or how MDN localization used to work [15:02.440 --> 15:05.800] in the past. [15:05.800 --> 15:11.600] We have a ticket on file for that probably since the first week, since the Pontoon repository [15:11.600 --> 15:17.880] was created, but there has been basically no work on that. [15:17.880 --> 15:26.000] Not only do we try to help you if you want to set up your instance, we're very happy [15:26.000 --> 15:27.720] to take patches. [15:27.720 --> 15:30.480] This one would be obviously huge. [15:30.480 --> 15:38.280] But anything that doesn't interfere with Mozilla's needs, we would be definitely happy [15:38.280 --> 15:40.200] to support. [15:40.200 --> 15:44.400] The reason why we haven't implemented that feature is because at Mozilla there simply [15:44.400 --> 15:50.440] was no real need for that, apart from the exceptions that I mentioned earlier. [15:50.440 --> 15:53.440] I hope that answers the question. [15:53.440 --> 15:56.320] We have another question from Sylvia. [15:56.320 --> 16:02.640] I wonder, why does Pontoon exist when other FOSS translation projects like Weblate exist? [16:02.640 --> 16:06.200] Was Weblate not yet around when the project started? [16:06.200 --> 16:11.880] Was there any specific feature or design decision you were missing that didn't work with Weblate? [16:11.880 --> 16:18.720] Not to say that Pontoon shouldn't exist, I'm just wondering what its unique selling feature is. [16:18.720 --> 16:20.520] That's a great question. [16:20.520 --> 16:25.480] I think it's good that people have options when they go to the store and they can choose [16:25.520 --> 16:29.080] different types of milk or different types of cars. [16:29.080 --> 16:35.200] So it's sort of like the same question as why does BMW exist if there's Mercedes? [16:35.200 --> 16:40.840] I think Pontoon, I don't know Weblate too well, I have to admit that. 
[16:40.840 --> 16:49.160] I was at the presentation today and from what I heard I think it's an amazing piece of software. [16:49.160 --> 16:55.800] I know that, for example, Mozilla is very eager about supporting natural-sounding translations [16:55.800 --> 16:58.440] through Fluent and MessageFormat. [16:58.440 --> 17:01.280] We have special UI for that. [17:01.280 --> 17:06.240] Maybe that also exists in Weblate, I don't know, but I would guess that no, because Fluent [17:06.240 --> 17:12.120] never really passed the borders of Mozilla very intensively. [17:12.120 --> 17:17.400] So that would be one of the things that, and the MessageFormat support which is related [17:17.440 --> 17:20.800] to that would be one of the things that comes to my mind. [17:20.800 --> 17:26.440] But other than that, I think it's mostly, there's probably a bunch of other tools. [17:26.440 --> 17:29.600] I don't know if Pootle is still in development. [17:29.600 --> 17:33.360] There are also closed-source systems. [17:33.360 --> 17:38.200] I think it's good that people have different choices and somebody likes [17:38.200 --> 17:41.640] that type of UI, somebody likes other types of UI. [17:41.640 --> 17:58.520] So, can we add support for Firefox Translations in addition to Google and SYSTRAN? [17:58.520 --> 18:00.320] Is it easy to do? [18:00.320 --> 18:02.480] It's very easy to do. [18:02.480 --> 18:11.600] Actually, when we started working on pre-translation support, we wanted to [18:12.560 --> 18:20.440] only use machine translation engines that could be customized and trained with our own data. [18:20.440 --> 18:28.240] And when we were evaluating several engines, obviously Firefox Translations was the first [18:28.240 --> 18:30.040] on the list. 
[18:30.040 --> 18:36.160] The challenge at that point, and that was maybe half a year ago, things might have changed, [18:36.160 --> 18:44.240] was that the quality was a little bit lower, at least from our experience. [18:44.240 --> 18:48.960] We were using, I think, the BLEU score metric, and I think the BLEU score was about five to ten [18:48.960 --> 18:53.840] percent lower for the locales that were supported by Firefox Translations. [18:53.840 --> 18:58.920] And it's killing us because we would like to support Firefox Translations, and I'm sure [18:58.920 --> 19:02.520] that one day we will. [19:03.440 --> 19:07.440] The other issue was that, at least at that point, there were maybe a dozen locales [19:07.440 --> 19:12.600] that Firefox Translations supported, whereas with AutoML, it's around 50, and then there are [19:12.600 --> 19:18.800] 50 additional ones supported by the generic engine of Google. [19:18.800 --> 19:23.880] So yeah, hopefully we're going to extend support to Firefox Translations soon. [19:23.880 --> 19:30.640] And it's actually a good point, since adding an engine itself is quite trivial, we [19:30.640 --> 19:34.960] should probably just add it, not to pre-translation, but at least to that machinery tab where you [19:34.960 --> 19:36.760] could get suggestions from. [19:36.760 --> 19:39.680] Shit, why haven't we done that? [19:39.680 --> 19:40.680] Thank you. [19:40.680 --> 19:48.080] We do collect that, yes. [19:48.080 --> 19:50.320] Oh, sorry, sorry. [19:50.320 --> 19:58.720] So the suggestion was that it would be nice to also collect telemetry to see which engine [19:58.720 --> 20:01.560] is preferred by users. [20:01.560 --> 20:07.920] We actually do that already for each translation that's submitted by just copying it over from [20:07.920 --> 20:11.480] translation memory or any of the machine translation engines. 
[20:11.480 --> 20:17.920] We keep track of that, and we can see that, okay, this engine is more likely to be used [20:17.920 --> 20:19.920] than the other. [20:19.920 --> 20:39.760] So one thing I was wondering regarding, like, Fluent, for example, like other libraries, [20:39.760 --> 20:46.400] for example, the Translate Toolkit does not have support for Fluent yet, and I was wondering [20:46.400 --> 20:58.000] if Mozilla was planning to help on the development of Fluent support in the Translate Toolkit. [20:58.000 --> 21:05.000] And another and related thing is whether there is any way of doing, like, validations, verifications, [21:05.000 --> 21:13.040] because in our project we have a lot of very beautiful translators, but many [21:13.040 --> 21:16.920] times, it's the first time they translate, so, like, they make a lot of mistakes with [21:16.920 --> 21:22.760] the HTML, Markdown syntax, and if you have any kind of validation. [21:22.760 --> 21:24.480] Okay, thank you. [21:24.480 --> 21:30.680] So maybe I can split my answer into two pieces, one piece around Fluent support in the Translate [21:30.680 --> 21:35.320] Toolkit or maybe some other libraries, and the other question is about whether Pontoon [21:35.320 --> 21:38.440] has any sort of quality checks. [21:38.440 --> 21:41.040] So the first question. [21:41.040 --> 21:46.600] I think Emily will have a much better answer to that in the next talk, which is going to [21:46.600 --> 21:53.080] be about the MessageFormat 2.0 standard, which I see, maybe I don't see clearly, Emily is [21:53.080 --> 22:03.080] going to correct me, which I see as Fluent 2.0, it's developed under the standardization [22:03.080 --> 22:10.480] bodies, and that, I think, means that the wider support in multiple tools is going to [22:10.480 --> 22:11.480] come. 
[22:11.480 --> 22:17.600] If you're specifically interested in Fluent and adding Fluent support to the Translate Toolkit, [22:17.600 --> 22:23.040] then I think we should definitely talk and see if there's an opportunity for that. [22:23.040 --> 22:26.680] It's already supported, so it's not going to be a question. [22:26.680 --> 22:28.680] Okay, apparently it's already supported. [22:28.680 --> 22:48.680] So, the Translate Toolkit already supports Fluent. [22:48.680 --> 22:50.200] That's the answer to the first question. [22:50.200 --> 22:51.200] Thank you. [22:51.200 --> 22:56.680] The second question is about quality checks, and that's actually related to the Translate Toolkit. [22:56.680 --> 23:02.600] Pontoon uses three different libraries for quality checks. [23:02.600 --> 23:09.360] Actually, two are internal Mozilla libraries, and another one is the Translate Toolkit library, [23:09.360 --> 23:11.240] which also has its own checks. [23:11.240 --> 23:17.600] So yes, if there are any obvious errors that can be automatically detected, we will most [23:17.600 --> 23:18.720] likely detect them. [23:18.720 --> 23:23.680] There are probably errors that we could detect, but we don't, but I think most of them, most [23:23.680 --> 23:25.600] of them we do. [23:25.600 --> 23:32.080] We work on improvements to our check system through developers telling us, oh, you broke [23:32.080 --> 23:33.080] our product. [23:33.080 --> 23:35.760] Okay, apparently our checks are not good enough. [23:35.760 --> 23:40.520] So over the years, I think our check system became quite bulletproof. [23:40.520 --> 23:41.520] Thank you. [23:41.520 --> 23:48.360] We have time for one last question, if someone has one. [23:48.360 --> 23:51.720] I don't see anyone. [23:51.720 --> 23:57.160] So, thank you very much, everyone, and thank you very much. [23:57.160 --> 24:01.000] There's a cake under the seat, just check it out. [24:01.000 --> 24:02.000] Okay. [24:02.000 --> 24:03.000] Thank you very much. 
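As an illustration of the kind of automated quality check discussed in the Q&A (catching broken HTML in a translation), here is a minimal sketch that compares the tags in a source string with those in its translation. It is not one of the check libraries mentioned in the talk; the regular expression and the function names `tag_names` and `check_html_tags` are hypothetical and deliberately simplistic.

```python
import re
from collections import Counter
from typing import List

# Matches the name of an opening or closing tag, e.g. "b" or "/b".
# Assumption: tags are simple; attributes are ignored, comments unsupported.
TAG_NAME_RE = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*)")


def tag_names(text: str) -> Counter:
    """Count each opening and closing tag name found in the text."""
    return Counter(TAG_NAME_RE.findall(text))


def check_html_tags(source: str, translation: str) -> List[str]:
    """Report tags present in the source but not the translation, and vice versa."""
    source_tags = tag_names(source)
    translated_tags = tag_names(translation)
    errors = []
    # Counter subtraction keeps only tags with a positive surplus.
    for tag in sorted(source_tags - translated_tags):
        errors.append(f"missing <{tag}>")
    for tag in sorted(translated_tags - source_tags):
        errors.append(f"unexpected <{tag}>")
    return errors


if __name__ == "__main__":
    print(check_html_tags("Click <b>here</b>", "Cliquez <b>ici</b>"))
    print(check_html_tags("Click <b>here</b>", "Cliquez <b>ici"))
```

A check like this can only flag structural mismatches; it says nothing about whether the translated wording is correct, which is why human review remains part of the workflow.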