One Internet, Many Languages: An Introduction To Internationalized Domain Names (IDNs)
Many of us in the English-speaking world take our alphabet for granted. The Latin alphabet as used in English is relatively straightforward: 26 letters, a through z. Conveniently, these 26 letters, the hyphen “-” and the Arabic numerals 0 through 9 constitute the acceptable characters for a domain name.
To much of the world this is not nearly as intuitive. While the Latin alphabet is the most widely used, three other alphabets are used in large portions of the world. The Cyrillic alphabet is used in Russia, parts of Eastern Europe and former republics of the Soviet Union. The Arabic alphabet spans Northern Africa and the Middle East, and the Brahmic-derived alphabets are used across Southeast Asia. Throw in the accents, diacritics and ligatures seen in various languages using the Latin alphabet, and the possible combinations become staggering.
So how could this problem be addressed? The simplest solution would be to dictate that all domain names consist of only the 26 letters, ten numerals and the hyphen. However, that narrow view limits and complicates access to the internet for large swaths of the world’s population.
Enter Internationalized Domain Names. Proposed by Martin Duerst in 1996 and implemented in 1998, the system was eventually adopted (with additions and revisions) as the Internationalized Domain Names in Applications (IDNA) system. Within the IDNA system, an internationalized domain name is a name whose labels can each be successfully converted by the ToASCII algorithm.
Internationalizing a domain name works like this. The ToASCII algorithm is applied individually to each label within a domain name. If a label contains at least one non-ASCII character, further steps are taken. The label is first "normalized" using the Nameprep algorithm. The normalized label is then converted to ASCII via the Punycode algorithm. Finally, the four-character ASCII Compatible Encoding (ACE) prefix "xn--" is added. If, for any reason, the ToASCII algorithm fails (e.g. the resulting string exceeds 63 characters), the name cannot be internationalized at this time.
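The steps above can be sketched roughly in Python. This is a simplified illustration, not a complete ToASCII implementation: it leans on the `nameprep` helper and the `punycode` codec that ship with Python's standard library, and the function names `to_ascii_label` and `to_ascii_name` are made up for this example.

```python
from encodings import idna  # standard-library IDNA helpers

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63


def to_ascii_label(label: str) -> str:
    """Simplified per-label sketch of ToASCII (hypothetical helper)."""
    # Labels that are already pure ASCII need no encoding.
    if all(ord(ch) < 128 for ch in label):
        if not 0 < len(label) <= MAX_LABEL_LENGTH:
            raise ValueError("label is empty or too long")
        return label
    # Step 1: normalize the label with Nameprep.
    normalized = idna.nameprep(label)
    # Step 2: convert the normalized label to ASCII with Punycode.
    encoded = normalized.encode("punycode").decode("ascii")
    # Step 3: add the ACE prefix.
    result = ACE_PREFIX + encoded
    # The conversion fails if the result exceeds 63 characters.
    if len(result) > MAX_LABEL_LENGTH:
        raise ValueError("encoded label exceeds 63 characters")
    return result


def to_ascii_name(name: str) -> str:
    """Apply the label conversion to every dot-separated label."""
    return ".".join(to_ascii_label(label) for label in name.split("."))


print(to_ascii_name("www.bücher.example"))
```

For everyday use, Python's built-in codec does the same job in one call: `"bücher.example".encode("idna")`.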
To "de-internationalize" a domain name, the ToUnicode algorithm is applied to each label, yielding the originally entered domain name, except that any "normalization" is not undone. The ToUnicode algorithm will always succeed on a properly internationalized domain name because it is simply "undoing" the work of the ToASCII algorithm.
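A minimal Python sketch of the reverse direction follows. It is a simplification that skips the verification steps of the full algorithm, the function names are hypothetical, and it relies only on the `punycode` and `idna` codecs from Python's standard library. The last lines illustrate that normalization is not undone: a capital "B" entered by the user comes back in lower case.

```python
ACE_PREFIX = "xn--"


def to_unicode_label(label: str) -> str:
    """Simplified per-label sketch of ToUnicode (hypothetical helper)."""
    # Labels without the ACE prefix were never encoded; return them as-is.
    if label[:len(ACE_PREFIX)].lower() != ACE_PREFIX:
        return label
    # Strip the prefix and reverse the Punycode step.
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def to_unicode_name(name: str) -> str:
    """Decode every dot-separated label of an encoded name."""
    return ".".join(to_unicode_label(label) for label in name.split("."))


# Round trip through the standard-library codec: Nameprep lower-cased the
# name on the way in, and decoding does not restore the capital letter.
encoded = "Bücher.example".encode("idna").decode("ascii")
print(encoded)
print(to_unicode_name(encoded))
```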
In theory, the shift into and out of internationalized domain names could occur seamlessly and invisibly to the user. This is a useful feature for users but can also expose them to a dangerous spoof. In essence, the idea behind the IDN spoof is to register a domain name visually very similar to a trademarked name, for example Paypal. Because the Latin "a" and the Cyrillic "а" look alike, a domain name consisting of mixed alphabets can be registered and, when presented as a link (like this, http://www.pаypal.com/ where the first "a" is actually a Cyrillic "а"), can easily fool users into thinking they are at the genuine Paypal website. This, of course, would be a great opportunity for phishing scams - or bogus domain auctions.
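One rough way to spot this kind of mixed-alphabet label is sketched below in Python. It is only a heuristic, not how any particular browser or registry does it: it treats the first word of each letter's Unicode character name (for example "LATIN" or "CYRILLIC") as a stand-in for its alphabet, and the function names are invented for this illustration.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Collect an approximate alphabet name for every letter in the label.

    Heuristic assumption: the first word of the Unicode character name
    identifies the alphabet. Digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True if the letters in the label come from more than one alphabet."""
    return len(scripts_in_label(label)) > 1


genuine = "paypal"
spoofed = "p\u0430ypal"  # the second character is CYRILLIC SMALL LETTER A

print(genuine == spoofed)
print(scripts_in_label(spoofed))
print(is_mixed_script(spoofed))
```

The two strings look the same on screen but compare as different, and only the spoofed one mixes alphabets. Some legitimate names also mix scripts, so a check like this can flag harmless labels as well.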
This was foreseen, and guidelines addressing the spoof were issued to registries before they implemented IDNs. Of course, not all registries fully embraced these guidelines and, as the link above shows, the spoof can be run today. This is now being addressed by browsers. Internet Explorer 7 lets users choose which languages are decoded for display in the address bar. Mozilla and Opera have chosen to display the Punycode version of the IDN unless the registry is on a "whitelist" of registries effectively implementing IDN anti-fraud guidelines (such as prohibiting the use of mixed character sets within a name). Safari displays the Punycode translation of the domain name unless the setting in Preferences is altered to allow display of the decoded name.
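The whitelist approach can be illustrated with a short Python sketch. This is not any browser's actual code: the `TRUSTED_TLDS` set and the `display_form` function are hypothetical, and real browsers maintain their own lists and rules.

```python
# Hypothetical whitelist of top-level domains whose registries are assumed
# to enforce IDN anti-fraud guidelines. The entries are placeholders.
TRUSTED_TLDS = {"de", "jp"}


def display_form(ace_name: str, trusted_tlds=TRUSTED_TLDS) -> str:
    """Decide what an address bar would show for an encoded domain name."""
    tld = ace_name.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        # Registry is trusted: show the decoded, human-readable name.
        return ace_name.encode("ascii").decode("idna")
    # Otherwise fall back to the raw Punycode form.
    return ace_name


print(display_form("xn--bcher-kva.de"))
print(display_form("xn--bcher-kva.com"))
```

Under these assumptions the first name is shown decoded, while the second stays in its "xn--" form, which makes a spoofed name much easier to notice.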
So what will the impact of Internationalized Domain Names be on the Internet as a whole? More important to domainers, how does this impact opportunities in domain name investing, and is it already too late to get in on this?
We'll answer these questions, and more, in a future article.
This topic is a HOT one, with many players and a lot of inside information bubbling to the top. IDN domains may be the new landrush, even more than they already are. The DD article is one of the best, if not the best, descriptions of the IDN domain space and the benefits, problems and issues that confront it. However, the IDN promise looms larger than expected because of the continuing technological growth in the regions where IDN domains would flourish.
The Domain Roundtable Conference 2007 in August has a specific and exclusive Session on this topic scheduled, with some of the top experts in this field invited on the panel, including DD experts. If you're involved already or interested in the unique power of IDN domains, this is an event you do not want to miss.
Sign up for more info — schedule and agenda to be announced the week of 5/25/07. Stay tuned to the DailyDomainer.com for our updates and ads for the DRT2007, both on the DailyDomainer.com website and the DD e-newsletter.
http://www.domainroundtable.com
Stephen Douglas
Executive Producer
Domain Roundtable Conference
NameIntelligence.com
Interested in seeing the agenda for this conference.
regards
Rob
Interested in seeing the agenda for this conference.
So great!
regards
Jaycn
I was wondering how it really pertained to investing. I wonder if these domains will be worth placing a decent value on?
It might be a good idea to invest in some of them, but be sure of what you are buying if you don't know the alphabet!
It's good that there are more domain name options available, but I think that in terms of marketing and reach, these kinds of domains are not that good. The whole western world will never have a keyboard for these symbols or learn them!