[00:00.000 --> 00:13.880] Okay. So, good afternoon. I'm Nilo Menezes, and I would like to share some scripts, and [00:13.880 --> 00:25.600] after all these presentations we had this afternoon, quick hacks that I needed to implement [00:25.600 --> 00:38.040] while translating a Nim document, how Python developers could use Nim, on the wiki. Okay, [00:38.040 --> 00:42.880] so I will explain it better later on. So, translating documentation with cloud tools [00:42.880 --> 00:50.480] and scripts. So, how it started: we had a document called Nim for Python programmers [00:50.480 --> 00:57.680] that was written in English and Spanish by the same author. And this document is simply [00:57.680 --> 01:06.400] a translation table from Python to Nim. So, if you already know how to write Python programs, [01:06.400 --> 01:11.680] you can read this document and it will explain, okay, this is a structure you have to write [01:11.680 --> 01:19.360] that way and so on and so on. But it was written by the same developer in English and Spanish [01:19.360 --> 01:28.400] in a wiki format using Markdown, and it's the GitHub wiki. So, in the Brazilian Nim [01:28.400 --> 01:35.240] group, people started to say, hey, it would be useful to get more users to the Nim [01:35.240 --> 01:40.520] language, if we had this document, Nim for Python programmers, also translated into Brazilian [01:40.520 --> 01:46.960] Portuguese. And then I said, okay, I did some translations before, I think I can help on [01:46.960 --> 01:53.200] this. So, I started checking how I could do the contribution. So, I went to the GitHub [01:53.200 --> 02:02.560] wiki, I saw that the source code was in Markdown format. I just checked it out with Git, and [02:02.560 --> 02:08.440] I started translating. Just after the table of contents, I said, this will not end very [02:08.440 --> 02:12.320] well. 
I will have lots of problems later on because this document will be changed, it [02:12.320 --> 02:18.840] will be updated, and nobody will tell me that the original was updated. And in a document, [02:18.840 --> 02:25.440] you can also move sections and do different things. So, I said, this is not a good start. [02:25.440 --> 02:33.800] So, the idea was, how can I help with translating but also build an initial infrastructure [02:33.800 --> 02:41.440] to help translate this document to other languages, okay? So, how they were doing it: [02:41.440 --> 02:46.440] they just cloned the English page, and they started writing, overwriting the text in the [02:46.440 --> 02:52.560] native language, okay? If you are the same author, it's fine. So, he wrote the English [02:52.560 --> 02:58.800] version, he also translated to Spanish. Three or four languages, I think it's still workable, [02:58.800 --> 03:04.120] but if you start to have as many languages as we saw in very big projects, you know [03:04.120 --> 03:11.520] that it's impossible to keep up to date. So, I started to think about how I can update [03:11.520 --> 03:17.440] this. And I had the previous experience of working with translations in PO files, using [03:17.440 --> 03:24.560] PO files for LinCity, and also translations like December how to, and in very old document [03:24.560 --> 03:33.800] formats where we really had to copy and translate. But the PO format started [03:33.800 --> 03:39.960] to be very interesting, because I said, if I manage to convert this Markdown to PO [03:39.960 --> 03:45.480] and create a process around it, then we'll be back to the standard translation process [03:45.480 --> 03:50.800] where we can use Poedit, for example, to do the translation and move on. So, I will [03:50.800 --> 03:59.760] report what I did and the tools I selected for doing this job. As I said, it looked very [03:59.760 --> 04:04.760] much like an old problem. 
So, how to translate software to another language? The PO and gettext [04:04.760 --> 04:10.960] combination is very, very good. It's easy to use. Even if it's the first time you [04:10.960 --> 04:18.280] are translating software, to just tag the text you want to be translated using gettext, [04:18.280 --> 04:25.040] it's relatively easy. But I wasn't working with source code, I was working with Markdown. [04:25.040 --> 04:33.880] And in Markdown, you also have some extra markers regarding the text formatting and very [04:33.880 --> 04:39.640] specific items, and I didn't want to spend time creating a converter from Markdown to [04:39.640 --> 04:48.280] the PO file format. Of course, I started searching, I found some tools. But not all tools support [04:48.280 --> 04:56.440] the same kind of Markdown as the GitHub Markdown. And also, they create PO files with different [04:56.440 --> 05:02.600] qualities. So, I spent some time tweaking. So, if you have never worked with a PO file, [05:02.600 --> 05:07.760] it looks like this. You have a message ID that is usually the original string and you [05:07.760 --> 05:12.800] have the message string that is the translated one. And you create a new file for each language [05:12.800 --> 05:19.480] you are working on. So, usually English is the base language. And then you create a PO [05:19.480 --> 05:26.480] file for Portuguese, another for Spanish, and so on. Pretty standard, I think most of the [05:26.480 --> 05:34.680] afternoon we heard about PO. So, I think it was a good idea to keep this format. So, this [05:34.680 --> 05:41.760] was the process if we were translating standard source code. So, we have the source file, we [05:41.760 --> 05:51.040] extract the PO file. Using the PO file, we use a translation tool or an editor to do [05:51.040 --> 05:56.880] the translation manually, string by string. 
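As a rough illustration of the PO layout described above (a message ID followed by a message string), here is a minimal sketch in Python that renders one entry by hand. The helper name `format_po_entry` and the sample strings are made up for this example; real PO files also carry a header, comments, and source references, which are left out here.

```python
def _quote(text: str) -> str:
    """Escape a string the way PO files expect and wrap it in double quotes."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def format_po_entry(msgid: str, msgstr: str = "") -> str:
    """Render one PO entry: the original string, then its translation."""
    return f"msgid {_quote(msgid)}\nmsgstr {_quote(msgstr)}\n"


# Hypothetical sample strings; an empty msgstr marks an untranslated entry.
print(format_po_entry("Variables", "Variáveis"))
print(format_po_entry("Not translated yet"))
```

Each language gets its own file made of entries like these, so the Portuguese file and the Spanish file share the same message IDs and differ only in the message strings.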
After that, we compile the MO file and the [05:56.880 --> 06:02.360] executables or the script can use the MO file and present the text translated to the end [06:02.360 --> 06:08.680] user. So, I just needed to adapt this for Markdown. And there is also a very special [06:08.680 --> 06:17.640] point regarding how the wiki is kept on GitHub that I will explain later. So, how to convert [06:17.640 --> 06:27.000] Markdown, specifically GitHub Markdown, to the PO file. And also, I started to test [06:27.000 --> 06:32.280] multiple packages because as I said, if you Google, you find many converters from Markdown [06:32.280 --> 06:39.720] to PO file, I think I tested two or three. I didn't write down each one, but the final [06:39.720 --> 06:48.520] one is mdpo. It is a library, a series of Python scripts, so it is much [06:48.520 --> 06:54.600] easier for me to use packages in the same language because I could just write a [06:54.600 --> 07:02.240] pyproject and put all these libraries in the same pyproject. The previous one was in Java, [07:02.240 --> 07:06.480] JavaScript. And you know, if you want to run something in JavaScript, especially if you [07:06.480 --> 07:12.360] are not using Linux, you need to install a lot of other software. So, this would enable [07:12.360 --> 07:17.640] the translation from Markdown to PO and vice versa because this is the hardest part. Once [07:17.640 --> 07:25.160] you transform the Markdown file into a PO file, you do the translation, but the main [07:25.160 --> 07:31.240] objective is to get a translated Markdown file back so you can use it in GitHub and [07:31.240 --> 07:36.800] present the page in your native language with all the formatting that the original author [07:36.800 --> 07:47.160] did before. Okay. And I also started to think, okay, maybe I can write another script that [07:47.160 --> 07:56.680] would manipulate the PO file and help me do the initial translation. Why? 
We have a series [07:56.680 --> 08:05.040] of tools for automatic translation, but of course, this was something I just started. [08:05.040 --> 08:10.240] I didn't have the integrations or anything like this, but I also didn't want to mess [08:10.240 --> 08:17.560] with the PO file itself. So, I found a Python library called polib that does exactly this. [08:17.560 --> 08:22.920] I can open a PO file. I can, for example, do some filtering, like, okay, give me just [08:22.920 --> 08:29.240] the strings that are not yet translated, and so it's very, very easy to build and manipulate [08:29.240 --> 08:37.240] the PO files with it. Okay. So, this is the example problem, program. You simply open [08:37.240 --> 08:43.040] the file, we start the translation, and with the help of AWS Translate that I was using [08:43.040 --> 08:50.240] because most of the time I work with AWS, I could easily send string by string to the [08:50.240 --> 08:56.800] cloud and get an initial translation that I would just review later on. Okay. Because [08:56.800 --> 09:03.760] you can never trust the automatic translation, especially if you are working phrase by phrase, [09:03.760 --> 09:10.120] it's very easy to miss your target. But it's very good nowadays. So, I would say at least [09:10.120 --> 09:16.680] 80% of everything you do in the automatic translation, you can keep as it is, but you [09:16.680 --> 09:21.440] still have to fix the 20%, and most of the times the 20% is quite embarrassing. So, you [09:21.440 --> 09:27.320] really need to review and double check before you publish anything. And as it's a paid service, [09:27.320 --> 09:34.680] as many translation services are, and as our colleague said (I didn't know the DeepL translator that [09:34.680 --> 09:40.680] was presented this afternoon), you don't want to pay every time you do this. 
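The idea of sending only the still-empty strings to a paid service can be sketched roughly as follows. This is not the script from the talk: the `Entry` class is a stand-in for a PO entry (polib entries expose the same `msgid` and `msgstr` fields), and `fake_translate` is a made-up placeholder for whatever cloud call is used, so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Entry:
    """Stand-in for a PO entry with its original and translated string."""
    msgid: str
    msgstr: str = ""


def fill_untranslated(entries: Iterable[Entry], translate: Callable[[str], str]) -> int:
    """Translate only entries whose msgstr is still empty.

    Returns how many times `translate` was called, i.e. the billable requests.
    """
    cache: Dict[str, str] = {}
    calls = 0
    for entry in entries:
        if entry.msgstr:
            continue  # already translated or reviewed: never send it again
        if entry.msgid not in cache:
            cache[entry.msgid] = translate(entry.msgid)
            calls += 1
        entry.msgstr = cache[entry.msgid]
    return calls


def fake_translate(text: str) -> str:
    """Hypothetical placeholder for a real machine-translation request."""
    return f"[pt] {text}"


entries = [Entry("Hello", "Olá"), Entry("Variables"), Entry("Variables"), Entry("Loops")]
calls = fill_untranslated(entries, fake_translate)
print(calls, [e.msgstr for e in entries])
```

Running it a second time on the same entries makes no calls at all, which is the property that keeps repeated runs from costing money.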
So, I use the [09:40.680 --> 09:47.760] previous script to create a list of the strings that were never translated, that were still [09:47.760 --> 09:53.840] empty. So, I know that if I run it multiple times, I will not be bugging AWS Translate [09:53.840 --> 10:01.000] and paying for the translation of strings that are already translated or reviewed. Okay. [10:01.000 --> 10:07.160] And as you can see, the script is very simple. You open the PO file, you send the text to [10:07.160 --> 10:13.520] the cloud, in this case, AWS Translate. You replace the string and you save [10:13.520 --> 10:19.880] it in the new PO file, it's done. Okay. So, the script is almost all of this. You have [10:19.880 --> 10:31.080] it complete on GitHub. And this is the main job. Another advantage of these tools is that [10:31.080 --> 10:37.800] you can create a list of words that should not be translated. And this helps a lot, especially [10:37.800 --> 10:43.920] if you have a product or a document with a common name that you don't want [10:43.920 --> 10:52.680] to be translated. So, you can pass these special lists, you can create some exceptions. But [10:52.680 --> 10:58.480] as was explained in the previous presentation, it's also a problem because, as most of [10:58.480 --> 11:04.320] the time we select English as the main or the source translation language, you cannot [11:04.320 --> 11:16.680] create exclusion lists using programming language keywords. So, in this document, Nim [11:16.680 --> 11:21.640] for Python programmers, of course, there is lots of source code, and words that should not be translated [11:21.640 --> 11:28.640] to Portuguese, for example, keywords like for, in, were all translated to Portuguese. [11:28.640 --> 11:36.120] So, by using the automatic translation engine, you have to review and revert this translation, [11:36.120 --> 11:42.760] so the translated program will continue to be valid. 
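One way to make that review less error-prone is a small check that flags code strings whose keywords disappeared in translation. The following is only a sketch: `lost_keywords` is a hypothetical helper, it relies on the Python keyword list from the standard library, and it should only be applied to entries known to contain code, since ordinary English prose also uses words such as "for" and "in".

```python
import keyword
import re
from typing import Set


def _words(text: str) -> Set[str]:
    """Split a string into bare identifier-like words."""
    return set(re.findall(r"[A-Za-z_]+", text))


def lost_keywords(msgid: str, msgstr: str) -> Set[str]:
    """Return Python keywords present in the original but missing from the translation."""
    original = {word for word in _words(msgid) if keyword.iskeyword(word)}
    return original - _words(msgstr)


# Hypothetical code string and a machine translation that rewrote its keywords.
source = "for x in range(3): print(x)"
translated = "para x em range(3): print(x)"
print(sorted(lost_keywords(source, translated)))
```

An empty result does not prove the code is still valid (variable names can be translated inconsistently too), so this only narrows down where to look during the manual pass.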
And you have to pay very close attention, [11:42.760 --> 11:47.480] because of course, you also translate variable names if you have source code in the document. [11:47.480 --> 11:53.400] So, you need to pay attention that the output is still coherent, okay? And of course, as [11:53.400 --> 11:59.680] you need to do this manual work, you can use Poedit or any other tool that you [11:59.680 --> 12:05.920] are used to using to work with PO files. So, here we have the English version and there [12:05.920 --> 12:11.560] we have the Brazilian Portuguese version. I could just step item by item and review [12:11.560 --> 12:19.680] these translations until I was satisfied with the result and then you can simply regenerate [12:19.680 --> 12:26.600] the Markdown file from the PO file, okay? So, we started with the Markdown source code, [12:26.600 --> 12:34.800] we extracted using mdpo to create the initial PO file. I ran the script that sends the untranslated [12:34.800 --> 12:41.080] strings to AWS Translate, but you can use any provider you want. You review with Poedit [12:41.080 --> 12:47.800] and you do the opposite conversion from PO file to Markdown and publish the wiki, okay? [12:47.800 --> 12:54.480] So, this is the workflow I tried to implement using my collection of scripts or hacks. It [12:54.480 --> 13:04.560] is not really a tool, but it is meant to facilitate the translation of a single Markdown file, [13:04.560 --> 13:11.840] okay? So, the document looks like this. This is the English version. Yes, I put the English, [13:11.840 --> 13:18.040] no, this is the Portuguese one, okay? So, in the end, I could publish this document in [13:18.040 --> 13:24.480] GitHub. It's not yet fully integrated with the GitHub wiki because ideally, I should [13:24.480 --> 13:32.000] put the GitHub wiki of this documentation as a submodule of my project. 
So, when I [13:32.000 --> 13:38.800] update it, I also get the newest version of the Markdown, and if I do this kind of integration [13:38.800 --> 13:43.680] using GitHub, you'll be able to publish the Markdown file also using GitHub. For this [13:43.680 --> 13:48.800] initial version, I just went to the editor and I pasted the Markdown file, but it's possible [13:48.800 --> 13:56.640] to do the integration. It's the next step that I still have to do. So, I published the scripts [13:56.640 --> 14:03.320] in this GitHub. So, it's Nim for Python programmers. It's useful for any Markdown file. You may [14:03.320 --> 14:12.680] want to apply the same workflow. And the page is in beta because I asked all the translators, [14:12.680 --> 14:18.480] all the people that can read Brazilian Portuguese, to check if everything is fine. Because the [14:18.480 --> 14:25.600] main reason to have a process is that usually, you never do the translation a single time. [14:25.600 --> 14:30.480] The translation is something that you need to keep alive. As soon as the English version [14:30.480 --> 14:36.480] is extended, translated, updated, you have to do the same thing in Portuguese. If you [14:36.480 --> 14:42.360] don't have an automatic process able to analyze this document and present a subset of changes, [14:42.360 --> 14:48.040] you'll be obliged to review the full document and this can be very, very cumbersome over [14:48.040 --> 14:56.480] time if the document has 15, 20 pages. So, it's not ideal. And another advantage is that [14:56.480 --> 15:03.200] the tool is smart enough to detect repetitions of the same string. So, you also don't [15:03.200 --> 15:09.360] have the boring work of re-translating the same text multiple times. This also saves a lot [15:09.360 --> 15:20.160] of time. Yes. So, these are the main findings and the main problems I tried to solve. And [15:20.160 --> 15:23.160] that's it if you have any questions. [15:23.160 --> 15:45.080] You can say. My phrase. Yes. 
So, it's much easier, and especially if your text has source [15:45.080 --> 15:50.200] code, because the challenge was the source code. And sometimes you have to keep the indentation [15:50.200 --> 15:57.040] and so on and so on. You don't want to pass the indentation mess to the translator. So, [15:57.040 --> 16:02.200] if you use a tool like this, it will extract just the part of the program with text. And [16:02.200 --> 16:08.400] it keeps the white space, which is very important in Python, so you can translate. [16:08.400 --> 16:12.880] But you still have to pay attention because the automatic translation tends to translate everything [16:12.880 --> 16:20.800] to the target language. So, you have to revert the keywords. But at least the generation [16:20.800 --> 16:25.360] is quite strong. So, when you have generated this, the program is still a correct program [16:25.360 --> 16:30.360] in the end. So, it's good. [16:30.360 --> 16:46.000] Okay. I think it's the last one. Okay. Thank you.