So our next speaker is a very good friend of mine.
I call him the big boss because he's a co-chair of the TC39, which is the committee that manages
the JavaScript specs.
So he's going to talk about message format, the future of internationalization.
I hope I will say it right.
And big round of applause for Rojoa.
Let's go.
Thank you, Aiman, for the gracious introduction.
Major thanks to Leo for setting up the stage for me.
There's no pressure there at all.
Well, I thought a lot about this. It's very anxiety inducing to follow up to a talk like that.
But I realized that it's nothing to be anxious about because why not follow up that very intense technical talk with something simpler?
So I try to talk to you.
It's mostly propaganda. It's a program that I need to induce you to.
But yeah, I hope you're ready.
Everybody ready? Let's start.
Okay, great. So welcome.
A little bit about me before we begin because that's obviously the most important part of the presentation.
I am Ujwal. You might remember me from my username on the internet.
It's not easier, but it's fine.
I'm from New Delhi in India.
And I live in Akharunya.
It's a beautiful little city.
I would describe myself as well.
Zellet may be a strong word, but trust me, I care a lot about open source software and believe in the web,
which is what I'm here to sort of make you fanatisized about as well.
But I suppose, given that this is the JavaScript room, you all share a lot of these ideas as well.
I love dogs and massage video games that hurt me psychologically.
And I work at Agalia.
So quick show of hands. How many of you know about Agalia?
Wow. Only at this conference. Thank you.
At Agalia, well, this is me trying to read a newspaper.
Agalia is an open source consultancy.
If you don't know about us, we are also a worker-owned cooperative.
So that's neat.
We've worked across a lot of open source projects and ecosystems,
mostly around low-level software like the Linux kernel,
different things in the Linux user space,
and in the multimedia space and graphics and so on.
You might know about some of our contributions in the web platform.
A lot of the browser projects have a lot of interesting work that has been put in by my colleagues,
which I'm very proud about.
And lastly, but not the least, I suppose, we work in the compiler space,
which is all about programming language design, right?
It includes WebAssembly and JavaScript, but also stuff on LVM.
And that's what I need to talk to you about, about JavaScript, about TC39, as he said,
which is a very descriptive name for an organization.
Do any of you know about TC39? What does it do?
Wow. That is surprising to me.
But TC39, long story short, designs and develops this programming language called JavaScript that we all love, I hope,
to some extent. It's a complicated subject.
What would you describe JavaScript? What do you think of JavaScript?
Do you have a definition for JavaScript?
We know what JavaScript is. I would like to describe it, hopefully, non-controversially,
as a general purpose programming language, and now general purpose programming language,
that is designed primarily for scripting web interfaces.
Are you web developers?
Are there any non-web developers that use JavaScript?
No. See?
Like, while JavaScript is a growing language ecosystem, it is still very heavily influenced by the web ecosystem,
and it owes a lot to the web ecosystem for making it what it is, and therefore,
as the primary language for making web interfaces, it needs a lot of tools that are required for web interfaces.
But what is the web?
I couldn't even find a logo or an image to describe the web, and I feel that the web is such a weird concept
because we all know what the web is, right?
But could any of you define what the web is?
So I do feel that the web is hard to define or really put a finger on because it's gone through so much.
I always rely on my good friend, Baudrillard, for explaining what the web is.
So we have, for example, web as we know it, and then it just keeps what?
And now it's everywhere. The web is in your gaming consoles, it's in your cars, in your toasters, probably.
Does it need the web? Probably not, but I don't know. I don't trust toasters.
But the point is that over the last couple of decades, the web has more or less emerged as the main platform
for designing interfaces for people.
This is not a controversial statement, I hope.
But what is the web platform that I'm talking about?
So to define the web platform, I would say that it's an interactive, decentralized communication platform at the scale,
but that makes no sense.
So the web is a standard platform for making widely accessible.
They're deployed at the scale of everyone and rich user interfaces.
We are not in the early days of the web. The interfaces that we use on the web have changed substantially.
But this ambition of universality, universe, yeah, that's the, okay.
Yeah, this ambition is built into the web and it's ethos, right?
It is supposed to be for everyone.
It is supposed to be accessible by a vast group of people.
And any platform with such ambitions is not needed, but I would argue required to be accessible and internationalizable
and localizable because how else do you reach the anywhere near like the target audience of the web, right?
So the web has a responsibility as this platform.
A quick note before we start, I've already started using these terms,
but if you're unfamiliar in a nationalization, basically is the process of making an interface in a way that it can be localized
into various different languages, cultures to suit your users.
And as I mentioned, localization.
Localization is the specific act of, you know, modifying your interface to suit a target audience.
This is just a primer to, because I'm going to use this a lot.
I hope that was obvious.
But let's talk about the early web, the early interfaces that started thinking about internationalization.
Because as much as I would like to talk about how the web has revolutionized internationalization, it doesn't start there.
The story starts in the beginning because user interfaces were everywhere.
Like people who were writing text based video games and C were also very keen on things like internationalization.
UIs are composed of string content.
These content, like these strings are what we are referring to messages.
So when I say formatting a message, that is what I mean.
Manual localization is a way to do this.
I'm sure everybody is familiar with Wikipedia.
It's kind of possible to have a website work in that way.
But it was quickly proving to be unmanageable.
Not only because it's hard to do it manually, it is.
But also that it represents a very slim vision of what it means, what true internationalization means.
Changing from a language to another doesn't actually mean internationalization.
There's so many different axes within any given country, within any given language.
There's so many different ways of expressing things that to reduce internationalization to merely catering to languages or locales is, well, locales can be complicated.
But merely focused on languages, it's too simplistic.
So the actual diversity of locales can never be catered to using this simple approach.
Imagine having a different website, having a different version of your website for every currency that you support.
And basically to promote a better, cleaner, more modular approach for building interfaces that can be localized,
C-Hackers first came up with Get Text.
Have any of you ever used Get Text?
It's everywhere, right?
In all of your operating systems, it's actually part of Lib C, so basically everywhere.
But let's talk about Get Text. What was it?
And I know that we are in the JavaScript room, but bear with me.
It all makes sense, hopefully.
Get Text was one of the two main internationalization systems that the early hackers cooked up.
The other was CAD-GETs, but the presentation can only be so long.
So let's not get into that.
But it was the dominant system over time.
CAD-GETs, as you might know, now not used anymore.
And it was not standardized.
We're going to be talking a lot about standards here and how they enable people to build stuff in a way that is reliable across boundaries.
But Get Text was never standardized.
It was, however, standardized in a de-factor way by programmers using it across their tooling and basically through documentation and education and so on.
But its adoption by Sun and then GNU in like G-Lib C basically made it so popular that it was standardized through popularization, for instance.
It was not as powerful as the internationalization systems we would use today.
It mainly dealt with very static strings, and you could replace them with different strings and different languages.
So already it's not perfect, right?
But it was good for the time.
It was what we had.
It was better than nothing.
But what it did was it went on to inspire an entire generation of applications, of interfaces that were not only utilizing internationalization but were built with internationalization in mind.
And one topic that I'm going to, this is one topic that I'm going to keep returning to throughout the presentation.
Giving power to your users is by far the easiest way, in my opinion, to, well, people do wacky things.
That's the web for you.
But it's the most interesting way, in my opinion, to allow people to innovate and come up with completely new paradigms that were unimaginable.
And we'll see how Get Text inspired these things in its own right.
So there's Python's Get Text.
It's basically Get Text, but in Python, Python likes to keep things simple, I guess, in some way.
Java introduced, however, for the first time, the concept of message format.
So the context here, like you might also already remember, well, it's in the same slide.
And the micro systems already big in this idea of interfaces in internationalization, of basically deploying their products and their users products across the market, so to say,
was very keen on internationalization.
And therefore, they basically, through acquisition and innovation, created the first sort of beginnings of message format in Java.
At the time, to be this like, it's so funny to think about it.
Java was, to that generation, what the web is to us.
Java was a cross-platform way of building interfaces that can be then deployed to a massive audience.
Minecraft, I guess, if it's the bill.
ICU then, however, picked up message format.
ICU was also formed by these organizations working closely together, but it was a much bigger effort.
It was standardized, and it was developed in a way that was general enough for everyone to use and integrate in their apps.
So ICU creates message format.
It was originally a very close sort of copy, is maybe a bad word, but a very close imitation of the Java message format.
But in its own right, it added more features and more power.
And we can see how that affects things very soon.
But here's a quick and dirty example of how that works.
So you have this message, and it has an expression inside of it, right?
And this expression is basically selecting for a number.
So if there are no files, well, if there are zero files, you can say there are no files.
And there's one file, and then anything before above one is, you know, there are X files.
But this needs to be, this is just one message in one language.
It needs to be translated into various languages.
But finally, we had a format to express all of this, right?
If you're writing in Arabic, for example, you need a lot more.
In Japanese, you need just one.
But, yeah, so if you think about this, ICU's message format not only subsumed the original Java message format that it built out of,
but it added more, and it experimented in its own sample space.
But some important things was that ICU message format, because it was separated from Java,
it was able to rethink some of the design details about some of the processes that happened.
And it was designed for the first time almost with massive feedback from implementers and translators at the forefront.
Translators being a key word here, because I think it's not very controversial when I say,
and it's kind of obvious from the documentation even to this day, that tools like GetText were primarily designed keeping programmers in mind.
And finally, we have a lot of organizations who are very invested in translating their interfaces and their translators being part of this process of designing this format.
So things were really starting to pick up, at least I would feel so.
What it ended up with is a much better or, well, a much more powerful syntax,
and that, as I mentioned previously, just opens up the space for innovation as it did.
So here's a quick example. It looks huge, but please bear with me.
But basically, here we have a simple message.
Whoops.
Hello?
Is, okay.
But yeah, you're, one of the things, okay, I'm going to go back.
But one thing that I didn't focus on much was that ICU message format at the time allowed nesting.
So messages could be nested, and this, as you might see here, opens you up to new use cases that you didn't think of before.
So here we have like a nested select statement of sorts.
And what we're doing is we're setting for the pronouns of the host who's hosting a party and the number of guests that we have to display a simple message.
Well, simple at the end of it, but that's the amount of work that goes into making sure that you have a message that works for all the combinations.
Right?
And this was great.
People were now doing more powerful things than they ever did.
But what was going on with the web?
So talking about the good old days of the web, whatever that means.
The early days of the web were very dominated by either absolutely static documents or very static crowd apps, basically.
What does that mean?
It's basically about some simple operations.
If you think about it early internet applications, so to say, even Twitter, for example, were built as very simple crowd apps.
There were very few operations that you could do there, right?
And if you compare that with anything we ever do now, it is a completely different landscape.
I mean, even just Twitter and how it works now.
So the early websites, because they didn't have a very powerful medium on the front end, they were like almost entirely on the server run times for their content, for whatever kind of dynamic things that they did.
And guess what?
For internationalization.
So while the early web was not as powerful as it is now, people were still, there was a lot of appetite for internationalization and people did what they could.
So they used Java's message format using Java text.
Java.tex has a number of other APIs as well.
PHP had an Intel object, if you think the JavaScript was the first to do internationalization.
That is not the case.
And Ruby on Rails also had this IETN.
I don't know why I keep mentioning this.
It's just one that is closest to my heart, I feel, because that's what introduced me personally to internationalization, how it's important, and how it can really bring life, spell life into your interfaces.
And this allowed the early web developers to tinker with things like message formatting and more complicated internationalization use cases.
So there was a lot of tinkering happening, but basically with the combination of this message formatting style that we talked about, the popularization of templating languages.
You have to remember this is the time when templating languages were really taking off and HTTP content negotiation, which meant that the client would tell basically what languages the client accepts to the server and then the server could localize based on that.
We reached not a great point, but a very important point for internationalization on the web because now things are starting to get serious.
So we have a platform for building interfaces.
It is being used by a vast community of people who are building websites, and they are utilizing internationalization techniques to make their content more accessible to a wide range of users.
What about the Mpesky.js developers, though?
What are they up to?
Up to no good, I hope.
But no, jokes apart, JavaScript also had its own parallel development during this time.
But basically, for reasons that are not worth getting into in this presentation, JavaScript remained mostly not very popular.
I mean, for a long time, people were hesitant to really do a lot in JavaScript, and it was not their fault.
JavaScript was not as powerful as it is now, but things, again, on the theme of somebody giving power to a bunch of users and them taking it across the board.
jQuery released in 2006.
So that's almost two decades now.
And how far have we come from there?
It just, it sparked a whole sort of space of people experimenting and building more and more interactive web pages, basically.
Fast forward to some years, we have React and the age of SPAs is dawning on us.
Some of the most annoying websites kind of were created during this time, but all in sort of good time, and it all led up to something important,
which is that now we have a very dynamic, a very expressive web that people interact with in very different ways than they did with the early web.
Some of the interactions that we have with websites today are just, they probably weren't something the early designers of the web had any sort of idea they were enabling, maybe for good reason.
But TC39, which, as we mentioned, is the standards body that is tasked with designing the language recognized at the time that internationalization is a growing concern for us,
and we need to enable JavaScript developers to make the best use of this area and the techniques with it.
They formalized a task group, task group two in this case, to work on internationalization.
So a lot of internationalization features were developed and deployed to the web.
As of this day, you may know about the Intel object in JavaScript and the various formatters and other things that it has.
But basically modern JavaScript interfaces use these like in line, well, not in line entirely in the context of interfaces, but basically they use these APIs to internationalize,
to localize for that context on the client and then sprinkle them across the interface, but it's not perfect, or is it?
Well, the state of internationalization on the web is that outside of what was being done in JavaScript, most of the work and effort in terms of internationalization was directed at very specific things,
like supporting more writing systems, which is great. We need to support all the writing systems, but it was very limited in its scope and ambition.
On the other hand, internationalization grew and have now so many different features just to talk about them formatters.
So you can format all these different types of information like numbers, including currencies and different kinds of numbers in different formats.
You have dates and times that can be formatted in various different ways in so many things.
You have collation, segmentation of text, which is something that is now supported by the interl object, and then there is some selection.
So there's not a whole lot of selectors that are available to this date, but we have plural and ordinal selection.
So if you have a number, you can basically select for the plural rules of any given language, which was cool, which has a lot of the building blocks that we need, but we're still missing important pieces.
So talking about the timeline of where we end up with message format 2, the JavaScript group that was designing the internationalization API essentially realized they needed something of the sort.
It was discussed over the course of many years, as you can see. It was iterated upon, but there was a general agreement among this group that message formatting on the way.
The web is something that is both needed and necessary step to enable further use cases.
Do you remember this slide? Like, this is not a great format, but that's what we had.
So message format and the syntax and all the details of it were standardized 20 years ago at this point, and it is not sufficient for the modern dynamic interfaces that we built that JavaScript has enabled now.
So why do we need a new thing? Well, first of all, as I just mentioned, we have so many more ways of interacting with our interfaces that just require
new tooling, rethinking how the fundamentals work and the sort of outdated, outmoded tools don't exactly fit the bill.
They are too imperative. They are too static to actually support our new use cases.
There's also a sad lack of modularity and extensibility in the existing message format a bit more on that later.
And because it's a standard that is designed for being accessible to basically everyone who uses ICU, which by the way is a great thing,
the unfortunate side effect of that is that you can't really deprecate anything.
You can't just clean it up and move on without making a breaking change that would annoy every user.
So there needed to be kind of like there was Python 3, I don't know, that's controversial, right?
But we needed a heartbreak, a message format 2 that would be designed from the ground up to deal with some of these things.
And the diversity of locales that we know now makes it basically impossible to map any localization structures one to one.
A great example of that is one that we just used in the past, like the number, the plural selection example.
Do you remember that? Yeah.
English has two plural rules or modes, which is that either things are singular or plural. Zero is plural, for example.
And that's simple and that works for us.
Well, Welsh, a language that is not so far, you would assume, has five.
Arabic has six, I think. I'm sorry if I'm...
But you get the point, right? Japanese has one.
When it comes to any of these things, you cannot map messages one to one.
You need something that is more expressive than that.
And basically, like the design constraints of the old API made it very limited for the modern JavaScript ecosystem that we have.
So not only do we need to take everything that the original message format did right, do it better, if possible.
We need to also accommodate a lot of the innovation that was now happening outside of the standard space, right?
In proprietary tooling or elsewhere.
So we needed to really get our act together and that was message format two.
So I am going to start on a quick and clumsy, as I said, intro of message format two.
But are you ready? Okay.
Context setting done? Yeah.
So for context... Well, okay, more context.
But a dynamic message string is the string that is not just a static string,
but as we perceive and use strings on the modern web, they can change.
They can morph around each other.
It's ridiculous, honestly.
But the goal of message format two is to enable a lot of these complex use cases that we haven't covered now while keeping the basics simple.
Because at the same time, a very important goal is to lower the bar to make it easier for anyone who at the moment,
and I'm not saying incorrectly, that they rightly consider message formatting to be a complex thing to integrate.
But lowering the bar would allow more people to experiment with it and to basically think of new solutions for their new innovative problems.
So, for example, it's text mode first, which means that for very simple messages, it's very simple to get into it.
At the same time, it allows a lot of complex messages and expressions, which we'll get to.
It also allows you to have declarations and annotations.
So basically, we're getting dangerously into programming territory here.
You can have variables and functions being called.
And it's great because you have this great deal of expressivity that you didn't have before, right?
And finally, talking of functions, there is the idea of extensibility and modularity, right?
So now we have the concept of a function registry, which you can add to and a bunch of built-in functions that you can easily utilize for common use cases.
So yeah, quickly talking about the kind of messages that we have, we have simple messages, which is, well, a message.
We can have expressions where you can interpolate variables kind of like you might be familiar with in JavaScript, but yeah, different.
Same but different.
And then we have complex messages like selectors here where you can actually match for a particular value, kind of like a switch case statement.
So this is the various kind of messages we have.
Then as I mentioned, you can call functions.
So in this case, we're calling a formatter function to format a date, and then it's part of a larger message.
There's support for markup elements.
So if you have some messages that have markup built into them, simple example would be text sort of base markup elements like bold and italic.
More complicated things could be like this.
And then there's support for declarations.
So you can have variables and play around with them.
But there's more.
There is an extensible function registry apparently, which is a great thing.
You can have your own functions.
You can have private use annotations as well.
And there's support for popular built-in formatters and selectors, kind of like we did already in the JavaScript API, but specifically built into message format.
So this means date and time, possibly durations.
It's kind of not settled yet.
There's support for formatting numbers and integers and selecting for them and matching them.
And plural and ordinal rules are on the table and possibly lists, formatting complex heterogeneous lists of objects.
But yeah, as I mentioned, this is still something that is up in the air.
This is one of the final pieces of the puzzle that we are yet to completely sort of put in the puzzle, I guess.
How do you do that?
But the point is that this needs to be settled.
This needs feedback.
This needs more data than we already have because we know what is generally useful.
But what mostly helps is getting actual data from people who use these things or might want to use these things, but things that matter to them.
So that was part of the shtick.
But basically, do you remember that slide?
It's still here.
This is how it looks now.
So you can have a more complex matcher.
And now it's matching for both the guest count and the pronouns of the host.
And yeah, you have an expression.
It's basically achieving the same thing, but with a cleaner syntax and hopefully something more manageable than what we had before.
Why does any of this matter, you may ask?
Where did I start with?
I kind of lost context.
But as we talked in the beginning, UI design has evolved a lot since the beginning of UIs.
And the web platform was developed for something but ended up becoming essentially the most reliable standardized way for deploying user interfaces.
But a lot has happened since then.
For example, the UI space has done a lot of innovation around internationalization about making more localizable UIs in a cleaner way,
in a way that helps programmers and translators and everyone else in essentially message formatting.
On the other hand, the web platform has evolved substantially with JavaScript.
Things are very different from what they were.
To bridge this gap, however, to fill the final piece of the puzzle, as I mentioned before, we need Intel message format.
Because we have developed a lot in the web and interfaces are more complex and more dynamic than ever.
At the same time, we have better tooling in every way when it concerns internationalization.
But these two spaces have not yet benefited completely from the innovation in each other.
So this is the idea.
Not only is message format to built on top of a lot of the innovation that has happened within JavaScript,
JavaScript is now sort of importing years of work that has gone into the internationalization space.
So what is the message format from internationalization?
That's where we were supposedly starting but it got lost somehow.
After talking about it in Unicode and sort of coming to this format,
we finally brought Intel message format back to the committee and it is now a proper proposal.
So it's at stage one and hopefully it would reach stage two soon and could get deployed to various browsing engines and non-browser engines.
But it is built on top of the things that we know and that we have discovered around familiar patterns that internationalization built-ins use.
For instance, format two parts is this whole thing that we do in internationalization in JavaScript to allow people to essentially have more control over their formatters,
which is not really a concern outside, right?
But that is a major design point for the proposal, among other things.
This is how it looks like.
So you have Intel constructor like any other.
Instead of what we do with most Intel constructors where we have the message at the beginning, sorry, we have the locale at the beginning.
Here we have the message and then the locale and the options follow.
And yeah, it works.
This is very simple, but you get the point.
But I hope that this was convincing enough for you to feel that this is something important happening here.
And if it is, then how can you get involved?
Well, one thing I mentioned is the actual message format two syntax and data model and everything that is being standardized under Unicode.
You can go to this repo and read all the issues, see what's been done and give your feedback there.
Help us out in any way you'd like.
And then there is the JavaScript proposal.
It's early stage.
It needs a lot of work.
It needs a lot of feedback and it needs people to be motivated about it, to write tests, to help us figure out the spec, to help us figure out the design details.
And we'd really appreciate your help.
You can find us or me on GitHub and Matrix and start from there.
I'd be more than happy to guide you with this.
And that's all.