
The emergence of the Internet and other distributed networks puts increasing importance on the ability to create global software - that is, software that can be developed independently of the countries or languages of intended users and then translated for multiple countries or regions. Java™ 1.1 takes a major step forward in creating global applications and applets. It provides frameworks based on the Unicode™ 2.0 character set - the standard for international text - and an architecture for developing global applications, which can present messages, numbers, dates and currency in any country's conventional formats.

However, when all is said and done, it may not be clear just how to go about making your program ready for localization with Java 1.1. This article outlines how to do that, what the strengths and limitations are of the current 1.1 support and what to expect in the future. The discussion also applies to new programs; it is generally much easier to build a global program from the ground up rather than to go back later and retrofit it!

In Part I, we discuss how to convert your application into a global application. In Part II (JDJ, Vol. 2, Iss. 6), we will look at the current limitations in the JDK 1.1 and what the future may hold in store for us.

The bulk of the international support in Java 1.1 was licensed from Taligent, with data supplied by IBM's Native Language Technical Center (NLTC) in Toronto. Taligent had developed an integrated set of object-oriented frameworks to support the creation of international software, providing a standard API to make handling the requirements of different countries and languages transparent to developers. Using experience developed building C++ frameworks, Taligent redesigned its frameworks in Java, which allows for a much simpler API and implementation. This was a cooperative effort with JavaSoft, which participated in reviewing and adapting the APIs we supplied.

In the text, we will be using the following terms:

  • display string: A string that may be shown to the user. These strings will need to be translated for different countries. Non-display strings, such as URLs, are used programmatically, and are not translated.
  • locale: A name for conventions shared among a large set of users for language, dates, times, numbers, etc. Typically, a single country is represented by a single locale (and loosely speaking, you may use "country" and "locale" interchangeably). However, some countries, such as Switzerland, have more than one official language and, therefore, multiple locales.
  • global application: An application that can be completely translated for use in different locales. All text shown to the user is in the native language, and user expectations are met for dates, times and other locale conventions. Also known as localizable application. To globalize is to convert your program into a localizable application. (Of course, with Java you can also have global applets.)
The JDK 1.1 implements the Unicode 2.0 character set with the Unicode character database version 2.0.14; for brevity, we will refer to this as Unicode 2.0.14. Also, our use of the term application should be understood to include applets, unless otherwise noted.

Converting your Application
First, we will take you through the step-by-step process of converting your application into a global application. This will also help you build an application that is global from the beginning. For more detailed information about each of the topics, you should definitely consult the Java 1.1 International API documentation. Covering all of the issues involved in developing global applications is beyond the scope of this article, but there are a number of resources available on the Web or in print; see the list at the end of this article.

In this section our examples are additive, with each successive one adding to code that may already have been converted to some extent. In those cases, the Old heading refers to the partially converted code, not your original.

When the user starts up a global application, the default locale will be used for the display text and other user interface elements. If the user changes the default locale (generally with some mechanism on the host system), then you can get a different user interface language. (If you wish, you can go further and support a multilingual application (or applet) which allows use of simultaneous multiple locales. We'll discuss this later.)

Translate Strings
The first step to take in preparing your program is to enable translation of display strings by separating them out from the rest of your code. With Java, this is done via ResourceBundles. These provide a general mechanism that allows you to access strings and other objects according to locale conventions. In principle, they are fairly simple: they just provide a mapping from <key,locale> to <value>. However, they also supply inheritance among locale resources that allows you to minimize duplication across countries and gives you a graceful degradation in case the exact locale does not have localized resources.

Unfortunately, for now you will have to change your code by hand to make your strings translatable. You can do the bulk of the work with a Perl script if you want, but you will still have to check over the results. The general problem with converting strings automatically is the difficulty of distinguishing display strings from non-display strings. Once the commercial Java development environments fully support resource bundles, this task should become easier, since they will make it easy to avoid mixing display strings into the code in the first place.

Resource bundles are very flexible; so flexible that it may be mystifying how to get started. You can use list resource bundles, property resource bundles or make your own; you can have fine-grained bundles or coarse-grained and so on. To give you some direction, we'll start by showing one particular way to use resource bundles, then discuss some of the other options.

Create a class called MyResources (shown in Listing 1).

Move each display string into that class (see Listing 1A).

You have now set up a series of resource pairs of the form {key, value}. A resource key (such as "CleanCartridge") can be any unique string you want, even the original string. However, you will be better off using short, clear names: remember that your translators will be seeing these too.

To then create a new French translation for your program:

  • Copy MyResources and rename it to MyResources_fr by appending the proper Java language ID. (You can see a list of the language IDs on the Unicode web site. By convention, language codes are lowercased.)
  • Make it extend its parent, MyResources.
  • Remove the static rb (you only want this on the root class).
  • Translate the resource values into French (but not the keys!).
  • If any of the values are unchanged - the same as the parent - remove the whole {key, value} pair. (You can do this because the {key, value} pairs are inherited from the parent.)
  • Do the same again for any other language you want to support (see Listing 2).

If you have special strings for a particular country - and not just language - then you do much the same as above.

  • Create a bundle in the same way for that country, such as MyResources_fr_BE for Belgium, by appending the proper Java country ID. (You can see a list of the country IDs on the Unicode web site. By convention, country codes are uppercased.)
  • Make it extend its parent; in this case, MyResources_fr.
  • If any of the values are unchanged - the same as the parent - remove the whole {key, value} pair.
The way this is set up, the static rb will be initialized with the right resource bundle according to the default Locale. This is a convenience to allow us to refer to that bundle. You can use locale variables instead to reference the resource bundle, if you want.
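
As a rough sketch of the steps above, the following shows a root bundle and a French bundle that carries only the values that differ. It is written against a current JDK rather than JDK 1.1, and it nests the bundles inside a hypothetical BundleSketch class only so that the example fits in one file; the "OK" key and the French wording are illustrative assumptions, not taken from the listings.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

final class BundleSketch {
    // Root bundle: holds every key, in the development language.
    public static class MyResources extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"CleanCartridge", "Clean ink cartridge before printing document"},
                {"OK", "OK"},  // hypothetical key whose value is the same in French
            };
        }
    }

    // French bundle: only the values that differ from the root bundle.
    public static class MyResources_fr extends MyResources {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"CleanCartridge", "Nettoyer la cartouche d'encre avant d'imprimer le document"},
            };
        }
    }

    // Looks a key up for an explicit locale; missing keys fall back to the parent bundle.
    static String lookup(Locale locale, String key) {
        ResourceBundle rb = ResourceBundle.getBundle(
            MyResources.class.getName(), locale, MyResources.class.getClassLoader());
        return rb.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(lookup(Locale.FRENCH, "CleanCartridge"));
        System.out.println(lookup(Locale.FRENCH, "OK"));                  // inherited from the root
        System.out.println(lookup(Locale.CANADA_FRENCH, "CleanCartridge")); // no fr_CA bundle, uses fr
    }
}
```

Because there is no bundle for fr_CA in this sketch, a lookup for Canadian French degrades gracefully to the plain French bundle, and any key missing there is taken from the root.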

If your resource bundle gets too large then you can subdivide it into other resource bundles, such as MyPrintingResources, following the same pattern. You can make this as fine-grained as you want so that you only load the resources for a part of your program when that part gets used. You can return arbitrary objects, not just strings, since it is often easier (and sometimes necessary, such as for graphics that need to be localized), as shown in Listing 3.

You can also use a PropertyResourceBundle instead of a ListResourceBundle. In this case, follow the same pattern, but instead of creating a class, you put the {key, value} pairs into a property file. The name of the property file is the same as the name of the ListResourceBundle class that you would have had. Since you don't have classes any more, put your static rb someplace convenient, such as in your applet, and use that name for references (e.g., MyApplet.rb.getString("CleanCartridge")). However, if you use a PropertyResourceBundle, be aware that you can only extract strings, not other objects.
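
To make the property-file form concrete, here is a minimal sketch that reads {key, value} pairs from text in the property-file syntax. The in-memory string stands in for a MyResources.properties file, and the Reader-based constructor it uses is an assumption about your JDK (it was added well after JDK 1.1).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class PropertyBundleSketch {
    // Stand-in for the contents of a MyResources.properties file.
    static final String PROPERTIES_TEXT =
        "# one key = value pair per line\n"
        + "CleanCartridge = Clean ink cartridge before printing document\n";

    // Builds a bundle directly from the text above; values are always Strings.
    static ResourceBundle load() {
        try {
            return new PropertyResourceBundle(new StringReader(PROPERTIES_TEXT));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load().getString("CleanCartridge"));
    }
}
```

In a real program you would normally let ResourceBundle.getBundle() find the property file by name, so that the locale fallback described above still applies.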

Resource bundles have a very simple interface. If you want to use other sources for your strings, you can always subclass to make your own resource bundle. For example, you could write one that accessed strings or serialized objects out of a database, or even over the Web. The basic requirements are to map keys to values, and provide for the inheritance of keys discussed above.
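
One possible shape for such a subclass is sketched below: a bundle backed by a Map, with an explicit parent for inherited keys. MapBundle, frenchOverRoot and the sample keys are hypothetical names for illustration; the map could just as well be filled from a database query.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.ResourceBundle;
import java.util.Set;

final class MapBundleSketch {
    static final class MapBundle extends ResourceBundle {
        private final Map<String, Object> values;

        MapBundle(Map<String, Object> values, ResourceBundle fallback) {
            this.values = new HashMap<>(values);
            if (fallback != null) {
                setParent(fallback);  // keys missing here are looked up in the fallback
            }
        }

        @Override
        protected Object handleGetObject(String key) {
            return values.get(key);  // null means "not here, ask the parent"
        }

        @Override
        public Enumeration<String> getKeys() {
            Set<String> keys = new HashSet<>(values.keySet());
            if (parent != null) {
                keys.addAll(Collections.list(parent.getKeys()));
            }
            return Collections.enumeration(keys);
        }
    }

    // A French bundle holding one override, layered over a root bundle.
    static ResourceBundle frenchOverRoot() {
        Map<String, Object> root = new HashMap<>();
        root.put("OK", "OK");
        root.put("Cancel", "Cancel");
        Map<String, Object> french = new HashMap<>();
        french.put("Cancel", "Annuler");
        return new MapBundle(french, new MapBundle(root, null));
    }

    public static void main(String[] args) {
        ResourceBundle rb = frenchOverRoot();
        System.out.println(rb.getString("Cancel"));
        System.out.println(rb.getString("OK"));
    }
}
```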

Remove Concatenation
If you haven't done much internationalization, then you may not have heard the mantra: Never concatenate display strings! Why is this a problem? Well, the order of the parts of a sentence differs from language to language, and concatenation may easily lead you into trouble. For example, if you write MyResources.rb.getString("DeleteBefore") + someDate, the localizer is limited to modifying just the string and not the position of the date. If the language requires verbs to be at the end of the sentence, the localizer is stuck.

You can replace concatenation by using MessageFormats, which allow the localizer to position the variable information appropriately (see Listing 4).

This new pattern string can then be localized, allowing rearranging of the position of the argument {0}. If you want to, you could combine this into a single statement, using the static MessageFormat.format() (see Listing 5).
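
The following small sketch shows why the pattern form helps the localizer. Both patterns are hypothetical; the second is an English stand-in for a translation whose word order differs, and the calling code is identical for both.

```java
import java.text.MessageFormat;

final class MessageOrderSketch {
    // Hypothetical patterns, as they might appear in two different resource bundles.
    static final String ORIGINAL = "Delete files before {0} from {1}";
    static final String REORDERED = "From {1}, delete files before {0}";

    // The code never changes; only the pattern decides where each argument lands.
    static String render(String pattern, String date, String folder) {
        return MessageFormat.format(pattern, date, folder);
    }

    public static void main(String[] args) {
        System.out.println(render(ORIGINAL, "June 13", "Reports"));
        System.out.println(render(REORDERED, "June 13", "Reports"));
    }
}
```

The arguments are passed as plain strings here to keep the output independent of the default locale; with a Date or a number, the format for the current locale would be applied.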

Message formats can also be used to customize the precise format of dates, times, numbers or currencies. If you just specify the position of the argument, then a default for the current locale will be chosen. However, you (for English) or the localizer (for other languages) can also more precisely control the format if desired. This is done by adding additional keywords or patterns after the argument number, such as in the following examples:

Argument: new Date(97, 5, 13, 1, 0);

Pattern: "Delete files before {0}"
Result: "Delete files before 6/13/97 1:00 AM"

Pattern: "Delete files before {0,date,long}"
Result: "Delete files before June 13, 1997"
Pattern: "Delete files before {0,date,yyyy.MMM.dd}"
Result: "Delete files before 1997.Jun.13"
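
These patterns can be tried out with a sketch like the one below. It pins the locale to Locale.US so that the results do not depend on the machine, builds the date with GregorianCalendar rather than the Date constructor, and uses a MessageFormat constructor that takes a locale, which is an assumption about your JDK (it is not in JDK 1.1).

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class DatePatternSketch {
    // Formats June 13, 1997, 1:00 AM with the given message pattern, using US conventions.
    static String render(String pattern) {
        Date when = new GregorianCalendar(1997, Calendar.JUNE, 13, 1, 0).getTime();
        MessageFormat mf = new MessageFormat(pattern, Locale.US);
        return mf.format(new Object[] {when});
    }

    public static void main(String[] args) {
        System.out.println(render("Delete files before {0}"));
        System.out.println(render("Delete files before {0,date,long}"));
        System.out.println(render("Delete files before {0,date,yyyy.MMM.dd}"));
    }
}
```

The exact text produced for the plain {0} form varies between JDK releases, so treat the first result in the table above as indicative rather than exact.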

Handle Numbers, Currencies, Dates and Times
Number and date formats can also be used separately, with similar control over their formatting. (Number formats handle general numbers and currencies; date formats cover both dates and times.) To globalize your program, replace the implicit conversion of a number to a string with an explicit formatting call, and put the pattern for the format into a resource bundle, as in Listing 6.

If you want to get just a string from the resources and create your own number format, you can do it. However, you must do it in a special way. You should always get a number format using getInstance(), since a particular locale may have a specialized subclass of NumberFormat. However, this subclass may not allow use of a pattern string. You need to check the type of NumberFormat you get before setting the pattern (see Listing 7). (This should be simplified in a future release of Java.)
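
A minimal sketch of that check, assuming the pattern string has already been fetched from a resource bundle (formatPage is a hypothetical helper name):

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

final class NumberPatternSketch {
    // Gets the locale's number format and applies the pattern only if the format supports one.
    static String formatPage(long pageNumber, String pattern, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        if (nf instanceof DecimalFormat) {
            ((DecimalFormat) nf).applyPattern(pattern);
        }
        return nf.format(pageNumber);
    }

    public static void main(String[] args) {
        System.out.println(formatPage(1234567, "#,##0", Locale.US));
        System.out.println(formatPage(42, "0000", Locale.US));
    }
}
```

If the locale hands back some other NumberFormat subclass, the sketch simply falls back to that format's default behavior instead of failing.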

You can also programmatically alter number formats, such as setting the maximum or minimum number of decimals, or whether a thousands separator is used. However, it is better practice to use a pattern string instead, since otherwise you don't allow your localizers the ability to customize the format.

Of course, if you are formatting in a tight loop, in either case you would be well advised to move the creation of the format out of the loop! You can also make your formats static, to avoid repeated creations.

Instead of using methods on Integer, Float, etc. to do conversion from Strings to numbers, dates, times, etc., use the appropriate formats again for parsing. A Format will parse what it can produce (and more), so you can use the same one for output and input (see Listing 8).
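
For instance, text with a thousands separator is rejected by Integer.parseInt() but is accepted by the same NumberFormat that produced it. The sketch below assumes US conventions and uses a hypothetical parseOrNull helper that returns null instead of propagating the ParseException.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class NumberParseSketch {
    // Parses user input with the locale's number format; returns null if it cannot be parsed.
    static Number parseOrNull(String text, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        try {
            return nf.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("1,234,567", Locale.US));
        System.out.println(parseOrNull("abc", Locale.US));
    }
}
```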

If you are doing your own display of date fields, such as for an alarm clock widget, then you may want to display each of the different component fields (year, month, date, ...) in a separate TextField. Then you will want to use a Calendar, which will convert the standard Date into its components according to local conventions.

Note: The order and choice of these fields may vary according to local conventions. For example, the year may come at the start of the date instead of the end; or the date format may even consist of very different information, such as year + day-in-year. Currently, there is no simple way to get the order of the fields in the format; that should be addressed in a future release. In the meantime, if you intend to use FieldPosition to determine the position of the fields within the text, be warned that there is a bug that makes that difficult: consult Taligent's web site for a workaround.

Calendar has special support for clock widgets. For any given field, it can tell you the result of incrementing or decrementing that field. It also supports a variant form of incrementing/decrementing, called rolling, which gives you the same effect as setting a field on your digital watch, where changing the minute field doesn't affect the hour: ...11:58, 11:59, 11:00, 11:01...
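
The difference between rolling and ordinary incrementing can be seen in a short sketch. The fixed GMT time zone and the sample date are assumptions made only to keep the result predictable.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class ClockRollSketch {
    // A calendar set to 11:59 on June 13, 1997, in GMT.
    static Calendar at1159() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"), Locale.US);
        cal.clear();
        cal.set(1997, Calendar.JUNE, 13, 11, 59, 0);
        return cal;
    }

    static String hourMinute(Calendar cal) {
        int minute = cal.get(Calendar.MINUTE);
        return cal.get(Calendar.HOUR_OF_DAY) + ":" + (minute < 10 ? "0" : "") + minute;
    }

    // Rolling the minute wraps it around without touching the hour.
    static String afterRoll() {
        Calendar cal = at1159();
        cal.roll(Calendar.MINUTE, true);
        return hourMinute(cal);
    }

    // Adding a minute carries over into the hour.
    static String afterAdd() {
        Calendar cal = at1159();
        cal.add(Calendar.MINUTE, 1);
        return hourMinute(cal);
    }

    public static void main(String[] args) {
        System.out.println("roll: " + afterRoll());
        System.out.println("add:  " + afterAdd());
    }
}
```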

Fix String Comparison
The standard comparison in String will just do a binary comparison. For display strings, this is almost always incorrect! Wherever the ordering or equality of strings is important to the user, such as when presenting an alphabetized list, then use a Collator instead (see Listing 9). Otherwise a German, for example, will find that you don't equate two strings that he thinks are equal!

Of course, if you are comparing strings in a tight loop, you would be well advised to move the creation of the collator out of the loop! You can also make your collator static, to avoid repeated creations.

If a string is going to be compared multiple times, then use a CollationKey instead (see Listing 10). This preprocesses the string to handle all of the international issues, and converts it into an internal form that can be compared with a simple binary comparison. This makes multiple comparisons much faster.
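
As an illustration, the sketch below sorts a list through collation keys. It relies on java.util.Arrays and on CollationKey being Comparable, both of which are assumptions about your JDK (neither is available in JDK 1.1), and sortForDisplay is a hypothetical helper name.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollationSketch {
    // Sorts strings in the order a user of the given locale would expect.
    static String[] sortForDisplay(String[] source, Locale locale) {
        Collator col = Collator.getInstance(locale);
        // Each string is preprocessed once; the sort then compares keys, not strings.
        CollationKey[] keys = new CollationKey[source.length];
        for (int i = 0; i < source.length; ++i) {
            keys[i] = col.getCollationKey(source[i]);
        }
        Arrays.sort(keys);
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; ++i) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] names = {"zebra", "Banana", "apple"};
        System.out.println(Arrays.toString(sortForDisplay(names, Locale.US)));
    }
}
```

A plain binary sort of the same input would put "Banana" first, because uppercase letters precede lowercase ones in Unicode order.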

There are also a number of advanced features in Collators, such as the ability to merge in additional rules at runtime or modify the rules. For example, you can make "b" sort after "c", if you really want to; or have "?" sort exactly as if it were spelled out as "question-mark". You can also use collators to do correct native-language searching as well as sorting, using a CollationElementIterator. However, this code is not straightforward, and I would recommend waiting until there are methods in Java to do it for you.

Use Character Properties
If your code assumes that all characters of a given type (such as letters or digits) are the ones in the ASCII range, then it will break with foreign languages. Rather than test for particular ranges of characters, you should use the Unicode character properties wherever possible (see Listing 11).

A number of methods are defined for the more common Unicode character properties. In addition, you have full access to all of the Unicode 2.0.14 character categories by using Character.getType() (see Listing 12). For more information, see the JavaSoft International Specification.
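
A few property tests on non-ASCII characters, as a quick sketch; the sample characters are my own choices, written as Unicode escapes so the source stays pure ASCII.

```java
final class CharPropsSketch {
    // Counts letters of any script, not just a-z and A-Z.
    static int countLetters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ++i) {
            if (Character.isLetter(s.charAt(i))) {
                ++count;
            }
        }
        return count;
    }

    // True for any opening bracket, including ones outside ASCII.
    static boolean isOpeningBracket(char ch) {
        return Character.getType(ch) == Character.START_PUNCTUATION;
    }

    public static void main(String[] args) {
        // "naive cafe 42" with a diaeresis on the i and an acute accent on the final e
        System.out.println(countLetters("na\u00efve caf\u00e9 42"));
        System.out.println(isOpeningBracket('\u300c'));  // LEFT CORNER BRACKET
        System.out.println(Character.digit('\u0663', 10)); // ARABIC-INDIC DIGIT THREE
    }
}
```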

Extend Word-Break Detection
Word breaks in natural language are not just defined by spaces. For example, when I search in this word processor for the word "checked" with the option "Whole Words" checked, I find the last instance of "checked", even though it is not bounded by spaces (there is a comma at the end). Even if you are using more sophisticated tests for ASCII text, such as checking for various punctuation, you must now deal with the wealth of possible characters in Unicode, and how they may behave differently in different countries. By using a BreakIterator, you can avoid dealing with these complexities (see Listing 13).

To find out whether a current index is at a word break, you can use the code in Listing 14 (this should be in a convenience routine in a future release).

You can use different break iterators to find word boundaries, line-wrap boundaries, sentence boundaries and character boundaries. The latter may seem mysterious: character just means Unicode character, right? However, what native users consider a single character may not be just a single Unicode character, and user expectations may differ from country to country.
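
To see a word iterator at work, the following sketch collects the words of a sentence and skips the spaces and punctuation that the iterator also returns as segments. The test for "starts with a letter or digit" is a simplifying assumption of mine, not part of the BreakIterator API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WordBreakSketch {
    // Returns the words in the text, using the locale's word-boundary rules.
    static List<String> words(String text, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next();
             end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            String piece = text.substring(start, end);
            // Spaces and punctuation come back as their own segments; keep only real words.
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Is it checked, or not?", Locale.US));
    }
}
```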

Note: In the Java code base - but currently private - is a DecompositionIterator. This actually walks through Unicode text and returns normalized characters. For example, it maps the compatibility characters (such as FULLWIDTH EXCLAMATION MARK) at the end of the Unicode range onto their respective standard characters. Once this is made public, then it can also be used in processing text.

Converting Non-Unicode Text
As long as you are writing a pure Java application using only Unicode characters, you don't have to worry about the thousands of possible character sets out in the world. However, if you are dealing with other data, then you will need to convert in and out of Unicode.

Unfortunately, the API for doing character code conversions is fairly limited at this time, although the hidden implementation is quite extensive. There are two places where this API surfaces. In each of them, you use a string to identify the non-Unicode character set that you are converting to or from. You can attach an encoding to a stream (OutputStreamWriter or InputStreamReader), or on String you can specify the encoding when constructing from an array of bytes, or when using the getBytes method to convert to bytes (see Listing 15).

Note: Remember that the length of any conversion is not necessarily the same as the length of the source. For example, when converting the SJIS encoding to Unicode, sometimes one byte will convert into a single Unicode character, and sometimes two bytes will.
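
The same effect is easy to demonstrate with two encodings that every current JDK is required to support. This sketch uses ISO 8859-1 and UTF-8 in place of SJIS, and the StandardCharsets constants in place of encoding-name strings; both substitutions are assumptions made so that the example runs everywhere.

```java
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {
    // Decodes ISO 8859-1 bytes into a (Unicode) String, then re-encodes it as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] source = {0x63, 0x61, 0x66, (byte) 0xE9};  // "cafe" with an acute accent on the e
        byte[] converted = latin1ToUtf8(source);
        System.out.println(source.length + " bytes in, " + converted.length + " bytes out");
    }
}
```

The accented letter occupies one byte in ISO 8859-1 and two in UTF-8, so buffers should never be sized on the assumption that a conversion preserves length.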

There is no programmatic way to get a list of the supported character sets, other than to delve into the Sun directory in the Java source. Table 1 is a list of the current supported sets on NT, gotten in just that fashion. Unfortunately, there is no guarantee that these will be present on every platform, nor is there documentation yet of what some of the more obscure names in this list actually refer to!

Table 1

Handling Multilingual Text
If you wish, you can go further, and support a multilingual application (or applet) which allows use of simultaneous multiple locales. First you should understand an important distinction between multilingual data and multilingual user interface:

  • multilingual data: users can enter data or set data formats according to multiple locales (e.g., formats of cells in a spreadsheet).
  • multilingual user interface: users can switch the locale of the display of your application (Menus, Buttons, etc.) at runtime.
You can support both in JDK 1.1, but most people don't find it worth the effort to support a runtime multilingual user interface.

Since Unicode is the character set for Java, the user can enter in multilingual data (with some restrictions, see Limitations of JDK 1.1 in JDJ, Vol. 2, Iss. 6). All of the formats, collators and other international classes allow you to pass an explicit Locale as a parameter. You thus can give the user a choice of locales to use for your display locale or for data. This allows you, for example, to easily have French dates in one column of a table and German dates in another (see Listing 16).
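
For example, the date columns mentioned above could be filled by a helper along these lines (longDate is a hypothetical name, and the sample date is arbitrary):

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class LocaleDateSketch {
    // Formats June 13, 1997 in the long date style of an explicitly chosen locale.
    static String longDate(Locale locale) {
        Date date = new GregorianCalendar(1997, Calendar.JUNE, 13).getTime();
        return DateFormat.getDateInstance(DateFormat.LONG, locale).format(date);
    }

    public static void main(String[] args) {
        System.out.println(longDate(Locale.FRANCE));
        System.out.println(longDate(Locale.GERMANY));
        System.out.println(longDate(Locale.US));
    }
}
```

Nothing here touches the default locale, so the two columns can sit side by side in the same running program.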

To find out the list of locales available for a particular type of object such as a NumberFormat, look for a static on that object (or its base class) called getAvailableLocales(). To then display the localized names of those locales, such as in a Menu or List, use getDisplayName() (see Listing 17).
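
A sketch of the same idea without any AWT components, so that it can run headless; availableNames is a hypothetical helper that returns the names you would add to the Menu or List.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class LocaleNamesSketch {
    // Names of every locale that has number formats, written in the given display language.
    static String[] availableNames(Locale displayLocale) {
        Locale[] locales = NumberFormat.getAvailableLocales();
        String[] names = new String[locales.length];
        for (int i = 0; i < locales.length; ++i) {
            names[i] = locales[i].getDisplayName(displayLocale);
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : availableNames(Locale.US)) {
            System.out.println(name);
        }
    }
}
```

Passing the user's own locale as displayLocale gives names in the user's language; calling getDisplayName() with no argument uses the default locale.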

Table 2 lists the locales that currently have localized international objects (numbers, dates, etc.) in Java 1.1. If you create locales from the arguments listed, you get the corresponding display names in the adjacent column. This list is supplied only for comparison; you should always use code to find the actual localized objects on your current system. Notice that if you don't supply a specific country (or variant), a default will be chosen.

Table 2

Although for most applications a runtime multilingual user interface is not worth the effort, if you do want to support it, you will restructure your application somewhat. Essentially, you either:

  • separate out the code that builds your UI. When the user picks a different UI locale from a menu, you reset the default locale and then just call your code to rebuild the whole UI.
  • provide code that walks through your UI. When the user picks a different UI locale from a menu, you reset the default locale and call this code to go through each menu and container to replace each individual element with the appropriate new resources.
With the Unicode support already in Java 1.1, the amount of work that you have to do to globalize your application is much smaller than on other platforms. You can start right now to localize your programs, which will get your application a long way towards world coverage: covering Europe, the Americas and (minimally) the Far East. As Java continues to evolve, you soon will be able to localize to all world markets, building on the same code base you have now.

For more information on these topics, see the Taligent Java Demos and the JavaSoft International Specification.

My thanks to Brian Beck, Ken Whistler, Laura Werner, Kathleen Wilson, Baldev Soor, Debbie Coutant, Tom McFarland, Lee Collins, Andy Clark, David Goldsmith and Dave Fritz for their review or assistance with this paper.

Pulling the JDK 1.1 international classes together on a very short schedule demanded a lot of hard work by people at Taligent, including Kathleen Wilson, Helena Shih, Chen-Lieh Huang and John Fitzpatrick. This was assisted by people at the IBM NLTC, most especially Baldev Soor, but also Siraj Berhan and Stan Winitsky. Without the support and excellent feedback from people at JavaSoft it also would not have been possible, especially from Brian Beck, but also from Asmus Freytag, David Bowen, Bill Shannon, Mark Reinhold, Guy Steele, Larry Cable, Graham Hamilton and Naoyuki Ishimura.

For more detailed information about each of the topics, you should definitely consult the Java 1.1 International documentation. http://www.javasoft.com:80/products/jdk/1.1/docs/guide/intl/index.html

To see the Java international classes in action, look at Taligent's Java Demos (JavaSoft has a copy of these on their site and in JDK 1.1, although sometimes it may be a somewhat older source). http://www.taligent.com/Products/javaintl/Demos/About.html

To see how to write robust Java classes, consult "Java Cookbook: Well-Mannered Objects." http://www.taligent.com/Technology/WhitePapers/PortingPaper/WellMannered.html

If you are a beginner at Java, but are acquainted with C++ or C, look at Java Cookbook: Porting C++ ...to Java. http://www.taligent.com/Technology/WhitePapers/PortingPaper/index.html.

We also supply C/C++ versions of these classes, in case you are interested in licensing them for other applications besides Java. We also provide on-line updates to this paper and a discussion forum. You can see this and other information at Taligent's home page. http://www.taligent.com

I also strongly recommend buying a copy of The Unicode Standard, Version 2.0 (and I don't even personally get any of the royalties!). For purchasing information and general information about the Unicode Consortium look at the Unicode Web site. http://unicode.org

About the Author
Dr. Mark Davis is the director of the Core Technologies department at Taligent, Inc, a wholly owned subsidiary of IBM. Mark co-founded the Unicode effort and is the president of the Unicode Consortium. He is a principal co-author and editor of the Unicode Standard, Version 1.0 and the new Version 2.0. He specializes in object-oriented programming and in the architecture and implementation of international and text software.


Listing 1: Making an Empty ResourceBundle.

	import java.util.*;

	public class MyResources extends ListResourceBundle {
		// boilerplate
		public static ResourceBundle rb = ResourceBundle.getBundle("MyResources");
		public Object[][] getContents() { return contents; }
		static final Object[][] contents = {
			// insert localized {key, value} pairs here
		};
	}

Listing 1A: Move Strings to Resource Bundles

Old:
	myCheckbox = new Checkbox("Clean ink cartridge before printing document");

New:
	myCheckbox = new Checkbox(MyResources.rb.getString("CleanCartridge"));

	In ResourceBundle:
	// insert localized {key, value} pairs here
	{"CleanCartridge", "Clean ink cartridge before printing document"},

Listing 2: Translating Resource Bundles.

Old:
	In ResourceBundle (MyResources):
	// insert localized {key, value} pairs here
	{"CleanCartridge", "Clean ink cartridge before printing document"},

New:
	In ResourceBundle (MyResources_fr):
	{"CleanCartridge", "Cleanez le cartridge de inque après que... "},

Listing 3: Moving Objects to Resource Bundles.

Old:
	myCheckbox = new Checkbox("Clean ink cartridge before printing document");

New:
	myCheckbox = (Checkbox) MyResources.rb.getObject("CleanCartridge");

	In ResourceBundle:
	// insert localized {key, value} pairs here
	{"CleanCartridge", new Checkbox("Clean ink cartridge before printing document")},

Listing 4: MessageFormat instead of Concatenation.

Old:
	myCheckbox = new Checkbox(MyResources.rb.getString("DeleteBefore") + someDate);

	In ResourceBundle:
	{"DeleteBefore", "Delete all files before "},

New:
	MessageFormat mf = new MessageFormat(MyResources.rb.getString("DeleteBefore"));
	myCheckbox = new Checkbox(mf.format(new Object[] {someDate}));

	In ResourceBundle:
	{"DeleteBefore", "Delete files before {0}"},

Note: The reason for using the array of objects for the parameter is to allow multiple arguments. There will probably be convenience methods in the future to make this a bit smoother.

Listing 5: One-Line MessageFormat.

	myCheckbox = new Checkbox(MessageFormat.format(
		MyResources.rb.getString("DeleteBefore"), new Object[] {someDate}));

Listing 6: Number Output.

New:
	NumberFormat nf = (NumberFormat)(MyResources.rb.getObject("PageNumberFormat"));

	In ResourceBundle:
	{"PageNumberFormat", new DecimalFormat("#,##0")},

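Listing 7: Number Output from String Resource.

New:
	NumberFormat nf = NumberFormat.getInstance();
	if (nf instanceof DecimalFormat) {
		// only a DecimalFormat accepts a pattern string
		((DecimalFormat) nf).applyPattern(MyResources.rb.getString("PageNumberFormat"));
	}

	In ResourceBundle:
	{"PageNumberFormat", "#,##0"},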

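Listing 8: Number Input.

Old:
	try {
		myNumber = Integer.parseInt(myTextField.getText());
	} catch (NumberFormatException e) {
		// handle the invalid input
	}

New:
	try {
		// parse() returns a Number, so convert it explicitly
		myNumber = nf.parse(myTextField2.getText()).intValue();
	} catch (ParseException e) {
		// handle the invalid input
	}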

Listing 9: String Comparison.

Old:
	if (string1.equals(string2)) {...
	if (string1.compareTo(string2) < 0) {...

New:
	Collator col = Collator.getInstance();
	if (col.equals(string1, string2)) {...
	if (col.compare(string1, string2) < 0) {...

Listing 10: Using CollationKey.

	// make up a list of sort keys
	CollationKey[] keys = new CollationKey[sourceStrings.length];
	for (int i = 0; i < sourceStrings.length; ++i) {
		keys[i] = col.getCollationKey(sourceStrings[i]);
	}
	// now sort the keys (comparing them with keys[i].compareTo(keys[j]))
	// and stuff them into an AWT List
	List list = new List();
	for (int i = 0; i < sourceStrings.length; ++i) {
		list.add(keys[i].getSourceString());
	}

Listing 11: Replacing Range Tests.

Old:
	for (i = 0; i < string.length(); ++i) {
		char ch = string.charAt(i);
		if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
			// we have a letter, do something with it.
		}
	}

New:
	for (i = 0; i < string.length(); ++i) {
		char ch = string.charAt(i);
		if (Character.isLetter(ch)) {
			// we have a letter (including non ASCII), do something with it.
		}
	}

Listing 12: Replacing Type Tests.

Old:
	for (i = 0; i < string.length(); ++i) {
		char ch = string.charAt(i);
		if (ch == '(' || ch == '{' || ch == '[') {
			// we have an open brace, do something with it.
		}
	}

New:
	for (i = 0; i < string.length(); ++i) {
		char ch = string.charAt(i);
		if (Character.getType(ch) == Character.START_PUNCTUATION) {
			// we have an open brace (including non ASCII), do something with it.
		}
	}

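Listing 13: Going Word-by-Word.

	BreakIterator boundary = BreakIterator.getWordInstance();
	boundary.setText(stringToExamine);
	int start = boundary.first();
	for (int end = boundary.next();
	     end != BreakIterator.DONE;
	     start = end, end = boundary.next()) {
		// the text from start to end is one word (or one run of spaces or punctuation)
		System.out.println(stringToExamine.substring(start, end));
	}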

Listing 14: Testing Word Breaks.

	if (currentIndex < 0 || currentIndex > stringToExamine.length())
	 return false;
	if (currentIndex == 0 || currentIndex == stringToExamine.length())
	 return true;
	int discard = boundary.following(currentIndex);
	if (boundary.previous() == currentIndex) 
	 return true;
	return false;

Listing 15: Using Foreign Character Sets.

	// convert from ISO 8859-2 into Macintosh Central European
	String string = new String(foreignBytes, "8859_2");
	byte[] otherBytes = string.getBytes("MacCentralEurope");

Listing 16: Multilingual Text Handling.

	NumberFormat nf = NumberFormat.getInstance(Locale.FRANCE);
	// or
	NumberFormat nf = NumberFormat.getInstance(new Locale("fr","",""));

Listing 17: Listing Locales.

	numberLocaleMenu = new Menu("&Locale");
	Locale[] locales = NumberFormat.getAvailableLocales();
	for (int i = 0; i < locales.length; ++i) {
		// show each locale under its localized name
		numberLocaleMenu.add(locales[i].getDisplayName());
	}


All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.