One Internet, Many Languages: An Introduction To Internationalized Domain Names (IDNs)
Many in the English-speaking world take our alphabet for granted. The Latin alphabet as used in English is relatively straightforward: 26 letters, a through z. Conveniently, these 26 letters, the hyphen "-" and the Arabic numerals 0 through 9 are exactly the characters permitted in a domain name.
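That traditional rule is simple enough to check mechanically. The sketch below is in Python; the helper names `is_ldh_label` and `is_ldh_name` are hypothetical. Beyond the character set described above, it also assumes two standard hostname rules: a label may be at most 63 characters long and may not begin or end with a hyphen.

```python
import string

# Letters, digits and the hyphen: the only characters a traditional domain label may use.
# DNS compares letters case-insensitively, so A-Z is accepted alongside a-z.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Hypothetical helper: True if `label` is a valid letters-digits-hyphen label."""
    return (
        0 < len(label) <= 63              # DNS labels are 1 to 63 characters long
        and not label.startswith("-")     # hostname rules forbid a leading hyphen
        and not label.endswith("-")       # ... and a trailing one
        and all(ch in LDH_CHARS for ch in label)
    )


def is_ldh_name(name: str) -> bool:
    """Hypothetical helper: True if every dot-separated label passes is_ldh_label."""
    return all(is_ldh_label(label) for label in name.split("."))


print(is_ldh_name("www.paypal.com"))   # True
print(is_ldh_name("www.bücher.com"))   # False: "ü" is outside a-z
```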
To much of the world, this is far less intuitive. While the Latin alphabet is the most widely used, three other alphabets are used across large portions of the world. The Cyrillic alphabet is used in Russia, parts of Eastern Europe and former republics of the Soviet Union. The Arabic alphabet spans from Northern Africa through the Middle East, and Brahmic-derived alphabets are used across South and Southeast Asia. Add the accents, diacritics and ligatures found in the many languages written with the Latin alphabet, and the number of possible characters becomes staggering.
So how could this problem be addressed? The simplest solution would be to dictate that all domain names consist only of the 26 letters, ten numerals and the hyphen. However, that narrow view limits access to the Internet for large swaths of the world's population.
Enter Internationalized Domain Names. Introduced by Martin Duerst in 1996 and implemented in 1998, the system was eventually standardized in 2003 (with additions and revisions) as Internationalizing Domain Names in Applications (IDNA), RFC 3490. Within IDNA, an internationalized domain name is a name in which every label can be successfully converted by the ToASCII algorithm.
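Python's standard library happens to ship the IDNA 2003 (RFC 3490) operations in its `encodings.idna` module, which makes this definition easy to try out. A minimal sketch, assuming that module's `ToASCII` (which raises `UnicodeError` when a label cannot be converted) stands in for the ToASCII algorithm. The helper `is_idn` is hypothetical, and splitting only on "." is a simplification (a trailing root dot, for instance, would produce an empty label and fail).

```python
import encodings.idna


def is_idn(name: str) -> bool:
    """Hypothetical helper: True if ToASCII succeeds on every label of `name`."""
    for label in name.split("."):
        try:
            encodings.idna.ToASCII(label)  # raises UnicodeError if the label cannot be converted
        except UnicodeError:
            return False
    return True


print(is_idn("bücher.example"))        # True
print(is_idn("a" * 64 + ".example"))   # False: a 64-character label is too long
```

Note that plain ASCII names such as "example.com" also pass, since RFC 3490 treats an ordinary ASCII label as an internationalized label with the same contents.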
Internationalizing a domain name works like this. The ToASCII algorithm is applied to each label of the domain name individually. A label consisting only of ASCII characters passes through unchanged. If a label contains at least one non-ASCII character, further steps are taken: the label is first "normalized" using the Nameprep algorithm, the normalized label is then converted to ASCII via the Punycode algorithm, and finally the four-character ASCII Compatible Encoding (ACE) prefix "xn--" is added. If, for any reason, ToASCII fails (for example, because the resulting label exceeds 63 characters), the name cannot be internationalized.
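The per-label steps can be sketched in Python. This is a simplified illustration, not a complete implementation: it borrows `encodings.idna.nameprep` and the built-in `punycode` codec from Python's standard library, follows RFC 3490 in skipping Punycode when Nameprep already yields pure ASCII (IDNA 2003 maps "ß" to "ss", for example), and leaves the optional STD3 ASCII rules off. The function names `label_to_ascii` and `name_to_ascii` are hypothetical.

```python
import encodings.idna

ACE_PREFIX = "xn--"


def label_to_ascii(label: str) -> str:
    """Hypothetical sketch of ToASCII for one label (IDNA 2003, STD3 rules off)."""
    if not label.isascii():
        # Normalize with Nameprep (case folding, NFKC, prohibited-character checks).
        label = encodings.idna.nameprep(label)
        if not label.isascii():
            # A label that already carries the ACE prefix must not be encoded again.
            if label.startswith(ACE_PREFIX):
                raise ValueError("label already starts with the ACE prefix")
            # Punycode-encode the normalized label, then add the "xn--" prefix.
            label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    # Final check: the ASCII label must be 1 to 63 characters, or ToASCII fails.
    if not 0 < len(label) < 64:
        raise ValueError(f"label length {len(label)} is outside 1..63")
    return label


def name_to_ascii(name: str) -> str:
    """Apply label_to_ascii to each label (splits on ASCII "." only, for simplicity)."""
    return ".".join(label_to_ascii(label) for label in name.split("."))


print(name_to_ascii("Bücher.example"))   # xn--bcher-kva.example
print(name_to_ascii("straße.example"))   # strasse.example
```

Because Nameprep failures surface as `UnicodeError`, a subclass of `ValueError`, a caller can catch `ValueError` for every way this sketch can fail.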
To "de-internationalize" a domain name, the ToUnicode algorithm is applied, returning the originally entered domain name, except that any "normalization" is not undone. ToUnicode always succeeds on a properly internationalized domain name because it simply "undoes" the work of the ToASCII algorithm.
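The reverse direction, and the fact that normalization is not undone, can be seen with Python's `encodings.idna.ToUnicode`. A minimal sketch, assuming that standard-library function stands in for the ToUnicode algorithm; the `round_trip` helper is hypothetical.

```python
import encodings.idna


def round_trip(label: str) -> str:
    """Hypothetical helper: ToASCII followed by ToUnicode on a single label."""
    ascii_form = encodings.idna.ToASCII(label)    # bytes, e.g. b"xn--bcher-kva"
    return encodings.idna.ToUnicode(ascii_form)   # decoded back to a str


print(round_trip("bücher"))   # bücher   (nothing to undo)
print(round_trip("Bücher"))   # bücher   (Nameprep's lower-casing is not undone)
print(round_trip("straße"))   # strasse  (Nameprep's ß -> ss mapping is not undone)

# The built-in "idna" codec applies the same operation to a whole name.
print(b"xn--bcher-kva.example".decode("idna"))   # bücher.example
```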
In theory, the conversion into and out of internationalized domain names can happen seamlessly and invisibly to the user. This is convenient for users, but it can also expose them to a dangerous spoof. In essence, the idea behind the IDN spoof is to register a domain name that looks nearly identical to a trademarked name, for example PayPal. Because the Latin "a" and the Cyrillic "a" look the same, a domain name mixing the two alphabets can be registered and, when presented as a link (like this: http://www.pаypal.com/, where the first "a" is actually a Cyrillic "a"), can easily fool users into thinking they are at the genuine PayPal website. This, of course, would be a great opportunity for phishing scams, or bogus domain auctions.
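The spoof works because the two letters are different code points that render alike. The short Python sketch below, using only the standard `unicodedata` module and the built-in `idna` codec, makes the difference visible; the variable names are illustrative.

```python
import unicodedata

genuine = "paypal"
spoof = "p\u0430ypal"   # U+0430 CYRILLIC SMALL LETTER A replaces the first "a"

# The two strings look identical but differ in their second character.
print(genuine == spoof)               # False
print(unicodedata.name(genuine[1]))   # LATIN SMALL LETTER A
print(unicodedata.name(spoof[1]))     # CYRILLIC SMALL LETTER A

# What DNS actually resolves is the ToASCII form, which looks nothing like "paypal".
print(spoof.encode("idna"))           # b'xn--pypal-4ve'
```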
This was foreseen, and guidelines addressing the spoof were issued to registries before IDNs were implemented. Of course, not all registries fully embraced these guidelines and, as the link above shows, the spoof can still be carried out today. Browsers are now addressing the problem. Internet Explorer 7 decodes IDNs for display in the address bar only for the languages a user has selected. Mozilla and Opera display the Punycode version of an IDN unless its registry is on a "whitelist" of registries that effectively implement IDN anti-fraud guidelines (such as prohibiting the use of mixed character sets within a name). Safari displays the Punycode form of the domain name unless a setting in Preferences is changed to allow display of the decoded name.
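A registry or client could approximate the "no mixed character sets" guideline with a check like the one below. This is a hypothetical heuristic in Python, not the actual algorithm of any browser or registry: it guesses each letter's script from the first word of its Unicode character name and falls back to the Punycode form when a label mixes scripts. The helper names are illustrative.

```python
import encodings.idna
import unicodedata


def scripts_in(label: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name (LATIN, CYRILLIC, ...)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def display_label(label: str) -> str:
    """Hypothetical display policy: decoded form for single-script labels, Punycode otherwise."""
    if len(scripts_in(label)) > 1:
        # Mixed scripts look suspicious, so show the ASCII (ACE) form instead.
        return encodings.idna.ToASCII(label).decode("ascii")
    return label


print(display_label("bücher"))        # bücher         (Latin only)
print(display_label("p\u0430ypal"))   # xn--pypal-4ve  (Latin mixed with Cyrillic)
```

The heuristic is deliberately crude: a lookalike spelled entirely in Cyrillic letters would pass it, while legitimate mixed-script names, such as Japanese names combining kanji and kana, would be flagged.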
So what will the impact of Internationalized Domain Names be on the Internet as a whole? More importantly for domainers, how does this affect opportunities in domain name investing, and is it already too late to get in on them?
We'll answer these questions, and more, in a future article.


