Language barrier and open source.
Here's my bio. I'm an open source advocate, educator, entrepreneur, and e-commerce
practitioner. Last year, I switched my direction to e-commerce. I'm also an Apache
OpenMeetings contributor. I joined the project in 2012, so I have many years of experience.
I'm also an interpreter and translator for some of China's largest open source communities.
I'm a member of Microsoft for Startups Founders Hub in China. Also, I'm a founder of
omoforce.com and www.omoforce.com. Last year, I was very honored to be an Apache Community
Over Code and FOSSASIA conference speaker.
So today, first, I'm going to introduce a little bit about my translation experience.
Next, I'll share some experience and lessons we learned from the Linux Foundation's
Open Source Congress Report 2023 and the Open Source Initiative's Deep Dive: AI series,
two projects we worked on last year. Then I'll talk a little bit about our team
building strategy. Finally, I want to show you the Apache OpenMeetings website translation.
So here comes my translation experience. Back in 2002, when I worked at a college, I taught
courses such as Information System Analysis and Design and also Data Structures. We used
English textbooks, but to help students better understand the concepts, we used Chinese
translations of the course materials.
Later, in 2011, I was an Oracle offshore outsourcing project manager in China for
an oncology clinical trial study management platform. We translated source, emails,
testing, packages, delivery, everything. That lasted one year. Then I joined the Apache
OpenMeetings team. I translated the website into Chinese
and promoted the project in China. In recent years, most of my translation
experience is related to Kaiyuanshe. That's probably something you've never heard of, but
in China it is a famous open source community. Each year, we hold an open source conference
in China called COSCon. And we have a lot of communication and cooperation with
Apache and the Linux Foundation. So in 2018, I was an interpreter for the Apache
Software Foundation speakers at COSCon in China. And in 2021, I gave my first talk, on
how to integrate LDAP and AD with Apache OpenMeetings. In 2022, I was a translator
for the China Open Source Report. And last year, as I mentioned before, I spoke at FOSSASIA
and Community Over Code in China on a role-based access control mechanism unifying AD, Linux,
and Apache OpenMeetings, in both an English session and a Chinese session. For the 2023 China
Open Source Report, I was a translator. And last year, around November and December, I
was a reviewer for the Open Source Congress Report and also a translator and reviewer
for the OSI Deep Dive: AI sessions, which is a video captioning project. So here I just have
some photos. This is my first visit to Europe. This is my second time. And this is
COSCon 2018 in China, with the speakers from Apache. The interesting thing is, last year
in Beijing, I was fortunate to meet these two gentlemen again, five years later.
We became friends, right? But you can see the change after five years: all the hair turned gray.
Okay, this is the virtual talk, my first talk in the open source community,
on how to integrate LDAP and Active Directory with Apache OpenMeetings. At that time, I was
a formal member of Kaiyuanshe. This is last year's FOSSASIA. We had a booth here,
right on the second floor, and that's in Singapore. We had a group of Chinese developers from
different cities in China, like Shanghai, Beijing, Chengdu, and Changsha, and also a guy
from Japan. We had a second group at the meeting. This is last year's Community Over
Code in China, in Beijing. You can see many Chinese developers and committers. And I also gave a talk
in the integration track. For the Open Source Congress Report 2023, the title of the report is
Standing Together on Shared Challenges: Report on the Open Source Congress. Probably some of
you here have read the English or other versions of the report, right? In China, we were
very honored to be part of the group doing the translation work. The project was created on
November 17th and completed on January 12th, 2024. The whole report is 34 pages long. We had six
translators and two reviewers. We divided the work into six subtasks, including the
infographics. We used AI assistants and tools such as DeepL, Crowdin, ChatGPT, Copilot, and
Google Docs to facilitate the translation. Usually, we would use Google Translate to do
translation. But when we compared the results, we found that Google Translate is not as
accurate as DeepL. So we just used the AI tools to help us. We labeled the translation
process with different stages, like in process, translated, and reviewed. So each team member
can look at the labels and see: if the previous step is finished, then I can start my step.
The initial translation finished on December 2nd. Then we did our first round of QA, the
second round of QA, and the final review. On the left side is the front page of the English
version, and on the right side is the front page of the Chinese version. Here are the six
subtasks, and here is my nickname. So I reviewed these two parts.
So for the infographics: text translation is comparatively easy, right? But image
translation takes more time. For example, we split the infographic into 12 images, took one
of them, and used Google Translate to get the Chinese version. But you can see the title and
the content are unreadable, right? It's too small. So we cleaned everything out and used
Photoshop to add the content.
Finally, we combined the 12 images back together to get the infographic.
Can you see? Sorry, it's not very clear. The left side is the table of contents of the
English version, and the right side is the table of contents of the Chinese version. You can
see the page numbers here. The English version is 28 pages. The Chinese version is 20 pages,
which means that after translation, the file size shrinks. I only got about two thirds of
the source file.
So here come the lessons I learned. The original paper has some pull quotes, where some
experts and leaders say something, right? But the translation had some discrepancies, so
we needed a final review. After we sent the paper to the Linux Foundation, they said, okay,
we found some discrepancies. You guys need to pay attention. You need to have a final review.
So when we check these discrepancies, we need to be consistent about how we locate the
content discrepancies. We need to keep to one consistent format, otherwise it causes
confusion. We assigned tasks based on the source file in Google Docs, but the discrepancies
existed in the target PDF files we sent to the Linux Foundation, right? So we identified
how many pull quotes there were and in which part of the paper, so the corresponding
translators would be able to check the quotes. Because of the file size shrinkage, and
because the PDF has a two-column format, the target file page numbers are different from the
source file page numbers. So if we don't consistently refer to the source file, we get
confused. This happened in the communication process. Okay, we found this pull quote. Who
did this? Nobody knows, because when we checked the PDF, it was: I didn't do this, I didn't
do this, right? So the final solution was that the leader of the project took all the
responsibility. He checked all the pull quotes and translated them by himself. After we sent
the paper to the Linux Foundation, they issued us some badges, for just four of us. I'm very
happy to get this recognition.
The 2023 OSI Deep Dive: AI sessions are a video captioning project. The project was created
on November 7th and will be completed on February 14th, which means we're still in
progress. In total, there are 17 sessions. We had 10 translators in two groups, and only two
reviewers. We followed these steps. First, we used the raw video material to get the
scripts. Then we checked each word and sentence by ear. We also used tools like DeepL,
ChatGPT, and GNY, which is a Chinese automatic translation platform, and YouTube to
help us. We also used the statuses in translation, translated, in review, and reviewed to
track the progress of each step. The initial translation finished on December 2nd. Then we
immediately started the first round of QA, the second round of QA, and the final review.
Then we have the second group to
help us publish on different social media, like WeChat and Weibo in China, and also on
Facebook, Twitter, and YouTube. Sorry, it's too small. You can see the 17 sessions and where
we are in the publishing process. We divided the 17 sessions into three groups. The first
one is open risk and challenge, the second one is governance, and the last one
is the fireside track. Each group also has some subgroups. The yellow bars mark the sessions
where I was either a reviewer or a translator. You can see it's
almost half of the sessions. So this is the project management system, in Chinese. That's my role.
You can see the yellow bar is my role. And here the arrows point to the different statuses.
The first one means reviewed, and this one means in translation. So we can track the
progress, because the team members work together from different cities.
Okay, here come the lessons I learned.
We need to check the subtitles against the audio to make sure every word matches. It is very
time consuming. For a video session, the check usually takes 10 or more times the session
length. For a 30-minute session, we need about 6 or 7 hours for the
check. Subtitles have a 120-character length restriction, for a better reading experience.
So we need to split every sentence longer than 120 characters into smaller parts and also
split the subtitle time frame. The split caused some subtitles not to show up, because the
time frames didn't match. So we need to listen to every word very carefully and adjust the
corresponding time frame down to hundredths of a second. So it's time consuming. Especially
when some adjectives and longer sentences come up, we need to adjust the time frames of the
words before and after to get a perfect effect.
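
The splitting step described above can be sketched in Python. This is only an illustration under stated assumptions: the `Cue` class and `split_cue` function are hypothetical names, and dividing the time frame in proportion to text length is a simplification chosen here, not the team's method (they adjusted the timing by ear). Times are rounded to hundredths of a second, as mentioned in the talk.

```python
from dataclasses import dataclass
import textwrap

MAX_CHARS = 120  # subtitle length limit mentioned in the talk


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure for this sketch)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def split_cue(cue: Cue, max_chars: int = MAX_CHARS) -> list[Cue]:
    """Split a cue into pieces of at most max_chars characters.

    The time frame is divided in proportion to the length of each piece,
    which is only a first guess; a human still needs to check it by ear.
    """
    pieces = textwrap.wrap(cue.text, width=max_chars)
    if len(pieces) <= 1:
        return [cue]

    total_chars = sum(len(piece) for piece in pieces)
    duration = cue.end - cue.start
    result = []
    cursor = cue.start
    for index, piece in enumerate(pieces):
        if index == len(pieces) - 1:
            piece_end = cue.end  # last piece ends exactly where the cue ended
        else:
            piece_end = round(cursor + duration * len(piece) / total_chars, 2)
        result.append(Cue(cursor, piece_end, piece))
        cursor = piece_end  # next piece starts where this one ends, no gap
    return result
```

Because each piece starts exactly where the previous one ends, no subtitle falls into a gap between time frames, which is the kind of mismatch that made some subtitles disappear.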
Here are some links to published sessions: Facebook, Twitter, and WeChat.
Team building strategy. Right now we have more than 50 translators. Most of them come from
universities, including some students in Europe. Team leaders recruit new translators from
various sources, such as companies and universities. Then the new translators are introduced
to the team members. Every member takes on workload by picking tasks voluntarily. So when
a project comes up, you just pick whatever task you want to get involved with. Then we have
a basic benchmark scoring system and a trial period to record every member's performance.
As for benefits, we issue tokens to technical and non-technical
contributors. For team members who perform well, we grant them the
privilege to use community resources such as DeepL and the Crowdin platform.
The community also gives team members the chance to volunteer at each year's
conference. And students can add the volunteer experience to their resumes.
So here is the code K token system. But that's in Chinese.
The upper part is for technical translators. The lower part is for non-technical
translators. We have three different kinds of tokens: A, B, and C. With the tokens, you can
take part when we cooperate with the open source community on different kinds of events,
like hackathons. You can join the events.
Here is the basic benchmark system. We have different contribution types, like
principal, helper, consultant, and informer. Each role has a basic score weight.
The benchmark score value is 10. So you multiply the score value of 10 by the
score weight to get the basic score for each role. Then here is the task management system
for the team members who join the project. We use the system to calculate the contribution
scores. Finally, we sum up all the scores of your contributions. You can see the top one is
my name. I got 51. So they granted me some privileges. I can use Crowdin, I can use DeepL,
and some AI translation tools to help me.
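
The scoring rule just described (benchmark value times role weight, summed over contributions) can be written out as a small Python sketch. The role names and the benchmark value of 10 come from the talk; the weights below are hypothetical placeholders, since the actual weights were only shown on the slide.

```python
BENCHMARK_SCORE = 10  # benchmark score value stated in the talk

# Hypothetical weights for illustration only; the real values are not
# given in the talk.
ROLE_WEIGHTS = {
    "principal": 1.0,
    "helper": 0.75,
    "consultant": 0.5,
    "informer": 0.25,
}


def basic_score(role: str) -> float:
    """Basic score for one contribution: benchmark value times role weight."""
    return BENCHMARK_SCORE * ROLE_WEIGHTS[role]


def total_score(roles: list[str]) -> float:
    """Sum the basic scores of all contributions made by one member."""
    return sum(basic_score(role) for role in roles)
```

With these placeholder weights, a member who was principal on one task, helper on another, and informer on a third would have `total_score(["principal", "helper", "informer"])` equal to 10 + 7.5 + 2.5.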
Okay, here's my Apache OpenMeetings translation.
Apache OpenMeetings: may I ask, has anybody heard about the project or used it before?
Never heard of it? Okay, this is an online video conference system. Originally, it is
from Europe. It is an Apache project. It is a purely web-browser-based open source video
conference system. No need to download apps, no need for client-side installation. You can
set up a server for remote collaboration. The server can be installed either locally or
via a container. But I recommend installing locally, because if you're not very familiar
with containers, you will probably run into trouble in the installation or configuration
process, right? Sometimes, you know, if you do not do the commit, you will probably lose the
original configuration data. Usually, we need to install the server behind a TURN server for
NAT configuration to get the full functionality. Otherwise, you cannot use the system. The
system supports multiple languages.
The latest version now is 7.2.0. It supports 39 different languages.
So, to translate the website, we cannot just use something like Google Translate. You can
use Google Translate for a static website. But this is interactive, right? It's an
interactive website. It has actions. It has scripts. So you can only use the built-in
framework to change the language. You need to change the labels and text strings. All
language strings should be localized and stored in the language section. You have a
full-featured language editor with every installation of OpenMeetings. So you need to check
out the language editor. To look up the label IDs in the GUI, you need to run the
OpenMeetings client in debug mode. You cannot use deployment mode. This way every text
string has its label ID shown additionally in the text field. Later on, I'm going to show
you the difference between deployment mode and debug mode. Sorry, this one. So, the upper
image is the original language configuration file in Chinese. So, okay, when I first started,
this is the one issue I found. I sent the issue to the Apache JIRA. I said, okay, this
configuration file didn't work. Some labels didn't get translated. They said, okay, you can
do the translation and send the file to JIRA, and we're going to update the source.
See, can you see the difference between the first image and the second image? There is one
extra space, right? You can see here it's brace, number, brace, but the upper one is brace,
space, brace. So they used a file compare tool, found this problem, and sent an email to the
group, to the mailing list: there is an issue in the Chinese translation, extra space
characters. So that's the lesson I learned: in a translation file, you cannot change anything
except the translated content. You cannot even add one extra space.
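
A check like this can be automated before a translation file is submitted. The following Python sketch is an illustration, not the project's tooling: it assumes placeholders are written as a value inside one pair of braces, such as `{0}`, and the function names are hypothetical. It flags a translated string whose placeholders or surrounding whitespace differ from the source string.

```python
import re

# Assumed placeholder syntax: anything between one pair of braces, e.g. {0}.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def edge_whitespace(text: str) -> tuple[str, str]:
    """Return the leading and trailing whitespace of a string."""
    leading = text[: len(text) - len(text.lstrip())]
    trailing = text[len(text.rstrip()):]
    return leading, trailing


def check_entry(source: str, translation: str) -> list[str]:
    """List the formatting problems found in one translated string."""
    problems = []
    source_placeholders = sorted(PLACEHOLDER.findall(source))
    translated_placeholders = sorted(PLACEHOLDER.findall(translation))
    if source_placeholders != translated_placeholders:
        problems.append(
            f"placeholders differ: {source_placeholders} vs {translated_placeholders}"
        )
    if edge_whitespace(source) != edge_whitespace(translation):
        problems.append("leading or trailing whitespace differs")
    return problems
```

For example, `check_entry("Page {0}", "第 { 0 } 页")` reports a placeholder mismatch, because `{ 0 }` with extra spaces is not the same token as `{0}`.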
Okay, I've almost finished my presentation. There is some time left, so I'm going to show
very quickly the debug mode and the deployment mode. So, here is my contact information and
my email. This is my website. Okay, excuse me.
I use admin to log in to the system.
This is already the Chinese version, because in edit profile here, I already set the
language to Chinese and the time zone to Asia/Singapore. But here you cannot see any
label ID, right? No label ID. So, how can I bring out the label ID? Because
if you want to change... okay, let me switch back to English so you can read it.
So,
The current password is required.
Okay, we'll change.
Every time you change the language configuration,
it takes effect after you log out and log in again.
So now you can see the interface has already changed to English, right?
But for each label, if I want to do translation
and I don't know the label ID, I need to go to the Language Editor
and find the corresponding language file,
which is...
We have the language configuration files here.
So you can see there are 39 different language files in total, right?
For example, Chinese is number 11.
And you can also change it to French or to Dutch.
So for example, if I want to change something like the project website label, right?
If you want to do rebranding or something,
if you want to change this one and you see no label ID,
you have to remember the text ID.
So go to the language editor, go to the...
Then you need to remember.
I remember because I tested many times, I remember the label ID.
It's 282, so you can change it here, right?
Okay, then let's see the deployment mode.
Oh, so...
There is a configuration file, web.xml, here,
so you need to go to your web.xml,
then you search for the deployment mode.
You can see that's deployment mode.
You need to change deployment to development mode.
To development mode, now the edit command...
Oh, sorry, lowercase.
Okay.
Now you can see we changed it, right?
Change it and then...
Exit, then restart the server.
Okay.
We restart the server, then...
I need to...
I need to make sure the server has started up.
Okay, so we start.
Then I use admin to log in again.
It's going to be slow.
I installed this in a virtual machine,
so the memory is almost exhausted,
so it's going to be slow, sorry.
Do the thing.
Sorry.
Some bad things happened.
Okay, now you can see this is in development mode.
See, from here you can see each text label
has a label ID, right?
So if you want to change a label,
you just go to your language editor,
find the label ID number, change it,
then you can get a different version.
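
The mode switch done by hand in the demo could also be scripted. The Python sketch below is an assumption-heavy illustration: it assumes the mode is stored as a Wicket-style `configuration` parameter (a `param-name` and `param-value` pair) in the web application's XML configuration file, and the sample XML and the `set_configuration` name are made up for the example. The layout in a real OpenMeetings installation may differ, so check your own file first.

```python
import xml.etree.ElementTree as ET

# Made-up sample that mimics the assumed layout of the configuration file.
SAMPLE = """<web-app>
  <context-param>
    <param-name>configuration</param-name>
    <param-value>deployment</param-value>
  </context-param>
</web-app>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}param-name'."""
    return tag.rsplit("}", 1)[-1]


def set_configuration(xml_text: str, mode: str) -> str:
    """Return the XML with the 'configuration' parameter set to mode."""
    root = ET.fromstring(xml_text)
    changed = False
    for parent in root.iter():
        name = next((c for c in parent if local_name(c.tag) == "param-name"), None)
        value = next((c for c in parent if local_name(c.tag) == "param-value"), None)
        if name is None or value is None:
            continue
        if (name.text or "").strip() == "configuration":
            value.text = mode
            changed = True
    if not changed:
        raise ValueError("no 'configuration' parameter found")
    # Note: ElementTree drops XML comments and may rewrite namespace
    # prefixes, so review the output before replacing a real file.
    return ET.tostring(root, encoding="unicode")


development_xml = set_configuration(SAMPLE, "development")
```

As in the demo, the server has to be restarted after the file changes, and the value is written in lowercase.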
Okay, thanks.
I finished my presentation.
If you have any questions, I would be more than happy to answer.
Thank you.
So, translating the open source report,
what was the format of the files that you used as a translator?
Was it Google Docs, was it PO,
what did you actually use, in which format?
For the source file, we uploaded the file to Google Docs,
and each team member, if you want to identify which part
you want to translate, you just label it:
select the text and say, okay, this part I'm going to do.
So the translators work directly in Google Docs?
Yeah, yeah, yeah.
For translating, we had no problem accessing Google Docs,
because almost everybody uses a VPN, you know.
Good question.
Thanks.
Do you have some way to detect space issues,
or maybe the use of a double-width character
instead of the normal character, for the spaces?
Yeah, for the spaces,
I'm not 100% sure how the project leader found the spaces,
but I know some guys use file comparison.
Like in Visual Studio, there is a file comparison function.
You can use file comparison to compare the source file
and the translation file to find the extra space.
Yeah, I appreciate it.
Do you use translation memory to speed up your translation process?
Excuse me?
Can you repeat the question again?
Do you use translation memory to speed up your translation process?
Well, that's a good question.
I guess the translation memory is the Crowdin function.
Crowdin has that kind of translation memory, right?
Yeah, we use translation memory.
That's a built-in function.
And we also use machine translation,
which is AI tools such as ChatGPT or Microsoft Translator,
which are integrated with the platform.
Thank you.
Thank you very much.
Thanks.
Thank you.
Thank you.