From: www.itworld.com
April 10, 2001
Previously, we have reported on the ability of several languages to process Unicode correctly. Recall that Unicode is the current standard for encoding most human written languages. This installment of Regular Expressions is a tutorial for English-speaking developers on how to get started with these capabilities.
There's a wealth of available literature on computer representations of human languages. The Resources below point to leading sites that will satisfy even the largest appetites for details on this subject. Our aim today is to distill that abundance into a few steps that should bring a Unicode newcomer quick successes.
In the United States, computer users conventionally use a keyboard that corresponds closely or exactly to the ASCII encoding chart. That character set does not include any accented characters. Even in Western Europe, a region so culturally close to the United States, keyboards are typically localized to facilitate writing with richer character sets. Nearly every language used in Europe other than English requires diacritics for correct spelling.
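To make the ASCII limitation concrete, here is a minimal Python sketch that picks out the characters of a word that fall outside the 7-bit ASCII range. The helper name `non_ascii_chars` is hypothetical and chosen only for illustration. The sample word is written with `\u` escapes so the example does not depend on how the source file itself is encoded.

```python
def non_ascii_chars(text):
    """Return the characters of text whose code points lie outside ASCII (0-127)."""
    return [ch for ch in text if ord(ch) > 127]


word = "d\u00e9j\u00e0"  # the word "deja" with its two accented letters

for ch in non_ascii_chars(word):
    # Show each offending character by its Unicode code point.
    print(f"U+{ord(ch):04X} is not in ASCII")

try:
    word.encode("ascii")
except UnicodeEncodeError as err:
    # ASCII has no byte for an accented letter, so encoding fails.
    print("cannot encode as ASCII:", err.reason)
```

A word spelled with unaccented letters only, such as `"deja"`, yields an empty list and encodes to ASCII without error.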
Most modern desktop computers are delivered with the ability to display at least the languages of Western Europe. The first problem for a US user is simply to enter accented characters. Netscape, especially during the heyday of its browser, did a commendable job of documenting this information. If you're sitting at a Windows desktop, for example, Netscape is right to recommend that you use ALT-integer combinations (holding ALT while typing a number) for character entry. Suppose you want to write:
déjà
With a keyboard from Western Europe, you can probably do this directly. In the United States, though, you'll likely need to press these keys, in succession:
At this point, you should see d
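The numbers used in ALT-integer entry are character codes, so they can be looked up programmatically. The following Python sketch rests on an assumption: a US Windows system whose ANSI code page is Windows-1252, where typing ALT plus a number with a leading zero produces the character with that byte value. The helper name `alt_code` is hypothetical, and other code pages or input methods will give different numbers.

```python
def alt_code(ch, codepage="cp1252"):
    """Return the four-digit ALT+0nnn number for ch, assuming the given code page."""
    # A single-byte code page encodes each supported character as exactly one byte.
    (byte,) = ch.encode(codepage)
    return f"{byte:04d}"


for ch in "d\u00e9j\u00e0":
    # Print the Unicode code point next to the assumed ALT number.
    print(f"U+{ord(ch):04X} -> ALT+{alt_code(ch)}")
```

For the first 256 code points, Windows-1252 and Unicode agree on the accented Western European letters, which is why the ALT number for each of these characters matches its Unicode code point in decimal.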
Unix Insider