So, I'm Arnt, I work at ICANN with email and DNS.
I wanted to interrupt about 20 times during the previous talk because I
have been working with IMAP, email and DNS for many, many years.
I wrote about half of the RFCs that Gmail supports as IMAP extensions.
Alexei here wrote the other ones.
My work at ICANN involves fixing bugs in any open source libraries that deal with email
or the DNS, and we're hiring someone, so please will somebody come talk to me.
I also need to talk to people, explain to them how to set up Dovecot, for example, and
explain to people about Unicode email, which is the main focus.
I am Norwegian, I live in Germany and today is the first time that I've actually been
in the city where my office is located.
No, yesterday was the first day.
Tomorrow I'm going into the office, I'm going to get a badge.
I should, well, ICANN rules say I have to explain what ICANN does.
It doesn't really matter; the important thing is we do boring things to do with the Domain
Name System.
Somebody has to do the backups for the main DNS, we do that.
Somebody has to look after uptime.
I have colleagues who grow seriously stressed when some domain has been down for two minutes.
In the summer the domain administrator in Lebanon died in some accident.
Things happened really quickly to get the DNS back up.
My baby is email address internationalization, a set of RFCs that were written 15 years ago
and 10 years ago, in two rounds.
We had a testbed in China where we tried one architecture, then we simplified it, made
it better and that's what we have now.
It's the first email change that simplifies something.
We have Unicode everywhere and that's why we have at least some support.
This is a valid message.
You may see there's no RFC 2047 encoding, there's no quoted printable, there's pretty
much nothing.
How many of you can see the syntax errors in that?
Both?
No, it's right.
The syntax errors according to current RFCs are that there is no date field and there
is no from field.
Apart from that, everything is optional.
Message-ID is optional.
It is recommended.
Technically it's optional.
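A minimal sketch, in Python, of what such a message looks like when built with the standard `email` package. The address, subject and body are made up for illustration; `SMTPUTF8` and `SMTP` are the policies that ship with the library. The first function writes raw UTF-8 into the header fields, the second shows the older RFC 2047 encoded-word form for comparison.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.policy import SMTP, SMTPUTF8
from email.utils import format_datetime


def eai_message() -> bytes:
    """Serialize a message whose header fields carry raw UTF-8."""
    msg = EmailMessage(policy=SMTPUTF8)
    # Date and From are the two fields the talk names as mandatory.
    msg["Date"] = format_datetime(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
    msg["From"] = "grå@grå.example"      # hypothetical address
    msg["Subject"] = "Blåbærsyltetøy"    # hypothetical subject
    msg.set_content("Hei!\n")
    return msg.as_bytes()


def legacy_subject() -> bytes:
    """Serialize the same subject under the ASCII-only policy."""
    msg = EmailMessage(policy=SMTP)
    msg["Subject"] = "Blåbærsyltetøy"
    return msg.as_bytes()


if __name__ == "__main__":
    print(eai_message().decode("utf-8"))
    print(legacy_subject().decode("ascii"))
```

Under the UTF-8 policy the header fields contain no `=?...?=` encoded words at all, which is the simplification being described.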
This one, that's a real-life message with a couple of extra header fields, pretty much
like the one I showed.
It was actually written by a forked K-9 Mail; there is an Indian company that forks that and sells
it to the government.
It was sent to Gmail. I'm supposed to blank out all the personal information, but you can see that
this is Devanagari, an Indian writing system. That domain works in the sense that it works
with Microsoft and Google, and the rest of us, well, we don't matter so much, right?
The changes necessary to make this domain work in SMTP are fairly simple.
The server has to say, yes, I support SMTPUTF8, and then the sender says SMTPUTF8 at
the end there.
If you do that with an unextended server, you will provoke a syntax error and the mail
will be returned as an error.
This is a feature.
It simplifies debugging.
We tried it the other way.
That was bad.
This simplifies debugging and simplifying debugging is great.
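A rough sketch of the client side of that negotiation, in Python. The function names `parse_ehlo_keywords` and `mail_from_command`, the host name and the addresses are invented for illustration, and the check is simplified: a real client would also ask for SMTPUTF8 when a recipient address or a header field is non-ASCII.

```python
def parse_ehlo_keywords(reply_lines):
    """Return the upper-cased extension keywords from the lines of an EHLO reply."""
    keywords = set()
    for line in reply_lines[1:]:      # the first line is only the greeting
        text = line[4:].strip()       # drop the "250-" or "250 " prefix
        if text:
            keywords.add(text.split()[0].upper())
    return keywords


def mail_from_command(sender, ehlo_reply):
    """Build MAIL FROM, adding the SMTPUTF8 parameter only when it is needed."""
    if sender.isascii():
        return f"MAIL FROM:<{sender}>"
    if "SMTPUTF8" not in parse_ehlo_keywords(ehlo_reply):
        # Fail early and loudly instead of trying to downgrade the address.
        raise ValueError("server did not offer SMTPUTF8")
    return f"MAIL FROM:<{sender}> SMTPUTF8"


if __name__ == "__main__":
    reply = ["250-mx.example Hello", "250-8BITMIME", "250 SMTPUTF8"]
    print(mail_from_command("grå@grå.example", reply))
```

Python's own `smtplib.SMTP.send_message` performs a comparable check and raises `SMTPNotSupportedError` when the server lacks the extension, so application code rarely needs to do this by hand.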
That domain existed for a while but it was removed.
It's a test domain.
I think I'm not going to go into the meaning of that.
Once you have declared that you want to use SMTPUTF8, all this is legal, including that
domain.
If you do not declare that support, then that domain is illegal and will provoke a syntax error.
However, most servers do declare support for it these days.
IMAP is much the same.
The client says ENABLE UTF8=ACCEPT, meaning the client will accept UTF-8.
The server says yes, I've enabled that.
After that, the client can just use UTF-8 in ordinary quoted strings, which has the
side benefit of eliminating most of those literals that so plague people.
If you have ever had a mail client that couldn't handle a password containing a non-ASCII character,
that was probably due to lack of literal support for passwords.
We eliminate that now.
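A small sketch, in Python, of how a client might encode one string argument before and after UTF8=ACCEPT has been enabled. The function name `imap_string` and the sample password are invented for illustration, and the literal is only built, not sent; a real client would also wait for the server's continuation request before sending the literal's bytes.

```python
def imap_string(value: str, utf8_enabled: bool) -> bytes:
    """Encode one IMAP string argument as a quoted string or as a literal."""
    data = value.encode("utf-8")
    # NUL, CR and LF can never appear inside a quoted string.
    has_forbidden = any(byte in (0x00, 0x0A, 0x0D) for byte in data)
    # Without UTF8=ACCEPT, a quoted string is limited to ASCII.
    quotable = not has_forbidden and (utf8_enabled or data.isascii())
    if quotable:
        escaped = data.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
        return b'"' + escaped + b'"'
    # Fall back to a literal: byte count in braces, CRLF, then the raw bytes.
    return b"{%d}\r\n" % len(data) + data


if __name__ == "__main__":
    print(imap_string("pässword", utf8_enabled=True))
    print(imap_string("pässword", utf8_enabled=False))
```

In Python's standard library, `imaplib.IMAP4.enable("UTF8=ACCEPT")` sends the ENABLE command itself; the sketch only illustrates why literals become unnecessary afterwards.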
IMAP is not the only way that people read mail today.
There are five main protocols.
Three of them just support all of this.
If you use Exchange or Office 365, your app will probably speak a protocol called EWS,
a Microsoft-specific thing; find a client library on GitHub, and so on.
Everyone just supports it in the core.
IMAP is sort of the laggard there.
POP has a defined extension, but I don't know anyone who has written any code for that.
Has anyone here used POP in the past five years?
Thanks.
The nice thing about this architecture, where we just use Unicode everywhere, is that code
like this works.
People recognize strings by saying, well, there's ASCII 34, then they just go on to find the next ASCII 34,
and that works without change.
People today use Unicode for all the strings in their program.
Okay, here comes another Unicode string.
It works.
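A minimal sketch of that kind of scanner, in Python. The function name `quoted_spans` and the sample command are invented for illustration, and backslash escapes are deliberately ignored. It works on UTF-8 without change because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so byte 34 can only ever be a real double quote.

```python
def quoted_spans(buffer: bytes):
    """Yield the contents of each double-quoted run, looking only for byte 34."""
    start = buffer.find(b'"')
    while start != -1:
        end = buffer.find(b'"', start + 1)
        if end == -1:
            break                      # unterminated quote: stop scanning
        yield buffer[start + 1:end]
        start = buffer.find(b'"', end + 1)


if __name__ == "__main__":
    line = 'A1 LOGIN "grå" "blåbær"'.encode("utf-8")
    for span in quoted_spans(line):
        print(span.decode("utf-8"))
```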
This is why, when I patched the Ruby standard library, the change to support ENABLE was
as big as the change to support actual Unicode email.
ENABLE needed an actual new command.
The Unicode stuff pretty much only needed testing.
The biggest program that I have patched was Postfix, which needed well over a thousand
lines of code.
The smallest one was procmail, with zero lines.
Written in 1991, it needed no changes from me.
This says that it's a good extension.
Most of the modern languages have support for this in the standard library
already.
Rust should have, right?
Yours should become the standard library.
If you want to support it, I have a set of EAI test messages.
That's currently seven messages.
If you manage to render those seven, then you have support for it.
Unfortunately, one of them will not be as simple as you wish, but that's life.
If you want to see whether something does support it, I have an autoresponder.
Grå is the Norwegian word for gray.
Also it's pronounced like, or except the other way.
It demonstrates a really, very, very nice bug, which I'm not going to explain.
It becomes a very rude word if you push it through a certain bug.
Most of the servers support it, at least the six here.
Postfix and Exim are the two biggest open source ones now.
Something like the next four also support it.
Halon and Momentum are the servers that send you mail like "your package has shipped".
Amavis as well. So there is pretty much support on the server side.
Client side is not so good.
In my work, I run into a lot of bugs and speak to a lot of people who implement this.
And these are the common bugs, things people tend to run into.
The worst of them is the third. Gmail uses all UTF-8.
This is nice.
It works smoothly with a lot of code.
Exchange sends these awful things with xn-- that you see there, which I hate.
You can see why they do it, much does the same.
It's something that needs to be handled.
To my mind, it's a bug.
There's a lot of code that looks at strings, like in the header fields, and says,
"Oh, do I need to do RFC 2047 encoding here?"
That can be modified.
You need extra tests to check things.
In December, I went to China to do interop testing with various software companies
and providers there.
Two of them had bugs where they accidentally encoded the local part, which is not legal.
Absolutely not legal, will not work.
It's complicated, but the short version is it won't work.
It can't be made to work.
You're the first audience that hasn't asked about downgrading either by rule or by interrupting.
People always ask, what if people with a Chinese address want to send e-mail to someone with an ASCII address?
It's a fascinating corner case for people like us.
It doesn't really matter to the people who want Chinese e-mail addresses because they only write Chinese.
If you only write Chinese, you're not going to want to send e-mail to somebody in India.
So in real life, it doesn't really matter, and supporting it, which we did in the testbeds,
made debugging really complicated.
It was difficult to find out where a bug was.
Better to kill the feature, have simplicity, have something that's simple to implement like those
100 lines in Ruby.
And these are the RFCs that need implementation.
I'll be happy to talk to anyone about using libraries, whether JavaMail, net-imap or Net::IMAP;
there are several things called that.
I'll also talk about this on the wire.
I'll also talk about any older RFC.
And I'll talk to you about the job that I have open:
one more person to do my kind of work.
Yes, Alex wants a new job.
That's the official slide that I have to have at the end.
ICANN policy.
Thank you.
I didn't see which of you was first.
Very short question about the UTF-8.
What is the currently proposed way to do comparison in server-side software, for example
of usernames, passwords and addresses?
Right.
The question was, how do you do comparison of usernames, passwords, addresses?
There is a set of RFCs for that called PRECIS, P-R-E-C-I-S,
which I understand is only modestly supported.
At present, I think that passing on bytes unchanged is best if you can do that.
If you do that, some Arabic passwords won't work, but in most of the world, it works.
On a somewhat longer term, we need a shareable open-source implementation of those RFCs.
It will be a problem for more people than part of the Arab world.
Still, some implementations have done this for years and not suffered.
They happen to be outside the Arab world.
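A small illustration, in Python, of how comparing bytes unchanged can fail. It uses a Latin-script example rather than an Arabic one: the same visible word can arrive precomposed or as a base letter plus a combining mark, depending on the input method. The function name `same_bytes` and the sample word are invented for illustration, and the NFC normalization shown at the end is only a stand-in, not an implementation of the PRECIS RFCs.

```python
import hmac
import unicodedata


def same_bytes(stored: bytes, supplied: bytes) -> bool:
    """Compare two byte strings unchanged, in constant time."""
    return hmac.compare_digest(stored, supplied)


if __name__ == "__main__":
    composed = unicodedata.normalize("NFC", "blåbær")    # å as one code point
    decomposed = unicodedata.normalize("NFD", "blåbær")  # a + combining ring
    print(same_bytes(composed.encode("utf-8"), decomposed.encode("utf-8")))
    # Normalizing both sides first makes them equal again.
    renormalized = unicodedata.normalize("NFC", decomposed)
    print(same_bytes(composed.encode("utf-8"), renormalized.encode("utf-8")))
```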
Next, Eniets.
So, if people have addresses with non-ASCII local parts, do they also often have in practice
a backup address in ASCII only so that they can use that if they get a failure to deliver?
It depends. In China, people have both.
In India, the people who have Indic email addresses have only that.
So, that's culturally dependent.
I think it has a lot to do with what kind of input methods people use on the keyboard.
In front.
For IMAP, has UTF8=ACCEPT been completely folded into IMAP4 revision 2?
Or are there some subtle differences?
Yes.
I think there are subtle differences.
There is one subtle difference, and you should do what IMAP4 revision 2 does.
Alexei got it right. His RFC is right. The other one is bad.
So, I should ignore what's said in the old one?
Even if I implement revision 1 only, I should do the thing that revision 2 says?
You should do what revision 2 says.
Okay, I have three questions. Maybe take it quick.
Okay, quick. Andrei here.
I like to use my accented character, which is just a Latin extended character.
I see that this email address internationalization is mostly for different scripts,
but with this extended Latin, I very often have the problem that my email address is invalid for many websites
because there's this accented character.
And I'm just doing the easy part. I'm just doing the IDN part.
The local part is almost never working.
The IDN part behind the at sign is working better, but still I have to put it there encoded.
So, for extended Latin, wouldn't it actually be better for the sake of compatibility
to have some standard way of, let's say, removing all the diacritics and making it ASCII,
so it would be backwards compatible?
Quick answer, Arnt.
Suppose you do that using the standard library that exists.
What do you think grå becomes?
Okay, it's not in the dictionary.
When you change somebody's name to something not in the dictionary,
you have a user experience problem.
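A quick sketch of that experiment in Python, using the Norwegian word grå (gray) as the sample. The function name `strip_diacritics` is invented for illustration; decomposing and then dropping combining marks is one common way to do this, not necessarily the one the speaker had in mind.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose the text, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("grå"))
    print(strip_diacritics("Ærø"))
```

The first word loses its ring and becomes a string that is not a Norwegian word. The second comes out unchanged, because Æ and ø have no decomposition, so the result is not even reliably ASCII.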
The larger answer is that people have suggested and tried various kinds of downgrading.
They are overall, each of them, more work than just adding support for Unicode.
Nice in intention, not in practice.
Really, when you compare it to writing 100 lines of code, almost everything is bad.
Okay, so I think you're trying to show that
the negotiation happens between one point and another point.
It's not end to end, right?
So if any intermediate doesn't support this extension,
how will it affect the actual message being composed?
The second question is, does this have any impact on DKIM?
Okay, first question.
Yes, the entire chain of mail servers has to support it,
which is the reason that I was happy that Amavis, Postfix, Exim, blah, blah,
supported it so early.
The chain is, in practice, not a big problem today.
The user agent is a big problem today.
And particularly user agents like contact forms and such things.
Second, DKIM could be a problem.
I still haven't seen any bug related to that,
and believe me, I see all the bugs.
Bless the DKIM people, because they must have done something right.
All right, thank you very much, Arnt.
I think we've had all the questions.
If there are more questions, send them to the chat,
and I think Arnt is also happy to join.
And we switch over to the next session, so we...
Thank you.