One of the outstanding things about Java is that it facilitates building internationalized programs, programs that work in various languages and regions. Java has used Unicode as its internal character set since version 1.0. In Java 1.1 another important feature was introduced - an internationalization API.

Although the Java Internationalization API generally provides well-designed, object-oriented ways to internationalize an application, it may be a misnomer to call it an API because it's not a single class or package. Rather it's a set of loosely related classes scattered throughout several different packages. The many features and the relationships between them can be overwhelming, especially when you're trying to do something relatively straightforward and standard.

This complexity can be tamed by using the Façade pattern. This pattern suggests that you create a simple interface for a complex set of functionality by creating one object that mediates access to another set of objects that are related in some complex way. This is a convenient way to work with the Java Internationalization API.

Figure 1: I18nFacade class overview

Internationalization and Localization
There are two aspects to producing globally enabled software. The first is internationalization (sometimes abbreviated i18n), which means producing software that doesn't have built-in assumptions about language and cultural conventions. This should first be considered in the design phase of a project.

The second is localization (l10n), which means adapting software to work in a specific language. This is usually done in parallel with software development for at least one language and later for additional languages.

Properly internationalized software can be localized easily, without reengineering, recoding, and recompiling, by someone who specializes in translation, not programming. Each localized version must still be tested, of course, but the scope of testing can be significantly more limited than a complete test of the entire application.

Locales
An internationalized Java application uses the concept of locale to select the language and set of cultural conventions to use at runtime. A locale typically consists of two parts: a language designation and a country designation. Java uses the two-letter codes from ISO-639 to designate language and the two-letter codes from ISO-3166 to designate country. Java optionally allows a third part, a variant, that allows support for dialects within a country, for example. Java notably uses a variant to provide support for the Euro.
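
As a small illustration of the language and country parts described above, here is a sketch that builds a few locales with the standard java.util.Locale class. The class name LocaleSketch and the helper of() are made up for this example, and Locale.Builder and Locale.forLanguageTag() assume Java 7 or later.

```java
import java.util.Locale;

// Illustrative sketch only: LocaleSketch and of() are hypothetical names.
final class LocaleSketch {

    // Builds a locale from an ISO-639 language code and an ISO-3166 country code,
    // for example ("es", "MX") for Mexican Spanish.
    static Locale of(String language, String country) {
        return new Locale.Builder()
                .setLanguage(language)
                .setRegion(country)
                .build();
    }

    public static void main(String[] args) {
        Locale usEnglish = Locale.US;                 // predefined constant
        Locale mexicanSpanish = of("es", "MX");       // language plus country
        Locale french = Locale.forLanguageTag("fr");  // language only

        System.out.println(usEnglish);       // en_US
        System.out.println(mexicanSpanish);  // es_MX
        System.out.println(french);          // fr
    }
}
```

A locale is only an identifier. It does nothing by itself until it is handed to a locale-sensitive class such as NumberFormat or Collator.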

The Façade Pattern
The classes that we'll be using for Java internationalization generally have three things in common:

1. They have an abstract class with one or more factory methods to obtain concrete classes. The concrete classes are usually cast up to the abstract class, and we're unaware of what the actual concrete class is.
2. They usually depend on a locale for their behavior, normally specified when the factory method is called to obtain the concrete class.
3. They provide one or more methods with various options that provide the functionality we need, such as formatting a date according to the current locale's conventions or comparing two strings.

As suggested by the Façade pattern, we create a single object, the I18nFaçade class (see Listing 1), and use it to organize and manage the various internationalization classes. (Listings 1 and 2 can be found below). We set the locale once when the Façade class is instantiated, then instantiate each of the various internationalization API classes and delegate to them as needed.

Various constructors are provided in the I18nFaçade class to allow users to specify the two options we don't allow them to change later, locale and ResourceBundle.

Users can supply a locale directly, or country and language codes from which we create a locale, or nothing, in which case we obtain the default locale from the system. Here's an example using language and country codes:

// Use US English, default ResourceBundle
I18nFacade intl = new I18nFacade("en","US");

The Façade allows the use of only a single ResourceBundle - a repository of localized strings (contained in a set of related files, as we'll see later). A default ResourceBundle is specified in the Façade, which presumably is the one used for general user interface text. It's good practice though to group messages into different ResourceBundles according to type, so we've also made it possible to specify the ResourceBundle when the Façade class is instantiated. Here's an example:

// Use Mexican Spanish and AppErrorMessages
// ResourceBundle
I18nFacade intlSpErrors = new I18nFacade(
"es", "MX", "AppErrorMessages");

Now we look at the ways we manage and delegate to the internationalization API classes. We briefly note that we often know which class we need to call based on the type of the parameter or parameters we're passed. We can create overloaded methods, especially in the case of formatting classes, which can help reduce complexity. The Façade pattern is not intended to limit access to the underlying component classes. We've provided get() methods that return the underlying classes so that anything that can't be done through our simple methods, such as setting custom formatting patterns, can be done by accessing the classes directly. Listing 2 provides code that exercises the functionality in the I18nFaçade.
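
To make the delegation idea concrete, here is a heavily reduced sketch of such a façade. It is not the article's Listing 1: the class name MiniI18nFacade, its methods, and the choice of a short date style are all assumptions made for illustration. It shows the locale being fixed at construction, overloaded format() methods choosing the delegate by parameter type, and get methods that expose the underlying objects.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical, heavily reduced facade; it is NOT the article's Listing 1.
final class MiniI18nFacade {
    private final Locale locale;         // set once, never changed
    private NumberFormat numberFormat;   // created on first use
    private DateFormat dateFormat;       // created on first use

    MiniI18nFacade(String language, String country) {
        this(new Locale.Builder().setLanguage(language).setRegion(country).build());
    }

    MiniI18nFacade(Locale locale) {
        this.locale = locale;
    }

    // Overloads: the parameter type decides which API class does the work.
    String format(double number) {
        return getNumberFormat().format(number);
    }

    String format(Date date) {
        return getDateFormat().format(date);
    }

    // Accessors expose the underlying objects for anything the simple
    // methods above do not cover, such as custom settings.
    NumberFormat getNumberFormat() {
        if (numberFormat == null) {
            numberFormat = NumberFormat.getInstance(locale);
        }
        return numberFormat;
    }

    DateFormat getDateFormat() {
        if (dateFormat == null) {
            dateFormat = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        }
        return dateFormat;
    }

    public static void main(String[] args) {
        MiniI18nFacade intl = new MiniI18nFacade("en", "US");
        System.out.println(intl.format(12345.678));          // 12,345.678
        intl.getNumberFormat().setMaximumFractionDigits(1);  // direct access to the delegate
        System.out.println(intl.format(12345.678));          // 12,345.7
    }
}
```

Creating the delegates lazily keeps construction cheap when a caller only needs one kind of formatting. The format classes are not thread-safe, so a façade like this one would need one instance per thread or external synchronization.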

Text Translation
Perhaps the most obvious issue for applications that are intended to work globally is that we need to be able to translate the user-visible text. In the past people often wrote applications with print statements containing hard-coded text. Translating an application like that means going through all the source code, identifying user interface text, changing it, recompiling, and hoping we didn't break something else in the process. Separating user text by putting it in resource files maintained separately from the source code is a good way to solve this problem. (We can do this for graphics and sound files too, by using a convention involving directory paths and filenames, and putting the path and filename in the resource file.) To add support for another language we add another resource file.

The Java API provides an abstract class, ResourceBundle, with two important subclasses, PropertyResourceBundle and ListResourceBundle, to support resource files. These classes allow us to use a tag as a key to look up an object in a set of files. Which specific file is used depends on the current locale.

The most convenient of these to use is the PropertyResourceBundle, which simply uses text files, .properties files, that don't need to be compiled. The other options require us to create classes that contain the key and value pairs and compile them.

Let's look at an example. Suppose we have a program "Hello.java". Instead of using a string such as "Hello!" directly in our print statement, we look up the localized string using a key, such as "HI":

// Get a localized String using a Tag
String hello = intl.format("HI");
System.out.println(hello);

In the Façade, we have specified a base filename for the ResourceBundle, here "HelloMessages". This means we must have a default properties file, HelloMessages.properties:

# HelloMessages.properties
#
# Default - English

HI = Hello!
BYE = Goodbye!
HI_NAME = Hello, {0}

This is the default version because the filename consists of a basename "HelloMessages" plus ".properties" and no locale information.

A French version might be called "HelloMessages_fr.properties", where "fr" specifies the language and might contain:

# HelloMessages_fr.properties
#
# French

HI = Bonjour!
BYE = Au revoir!
HI_NAME = Bonjour, {0}!
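
The following sketch shows how keyed lookup behaves when a key is missing from the more specific file. To stay self-contained it loads the text from in-memory strings instead of files, and it links the French bundle to the default one by hand. Normally ResourceBundle.getBundle() builds this chain itself. The class names are hypothetical, the French text is deliberately left incomplete, and PropertyResourceBundle(Reader) assumes Java 6 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Illustrative sketch only: class and method names are hypothetical.
final class BundleLookupSketch {

    // A PropertyResourceBundle whose parent is set by hand, standing in for
    // the chain that ResourceBundle.getBundle() normally builds from files.
    private static final class ChildBundle extends PropertyResourceBundle {
        ChildBundle(String properties, ResourceBundle parent) throws IOException {
            super(new StringReader(properties));
            setParent(parent);
        }
    }

    static ResourceBundle frenchOverDefault() {
        try {
            ResourceBundle defaults = new PropertyResourceBundle(new StringReader(
                    "HI = Hello!\nBYE = Goodbye!\nHI_NAME = Hello, {0}\n"));
            // Deliberately incomplete: BYE is missing and falls back to the default.
            return new ChildBundle("HI = Bonjour!\n", defaults);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = frenchOverDefault();
        System.out.println(bundle.getString("HI"));   // Bonjour!
        System.out.println(bundle.getString("BYE"));  // Goodbye!
    }
}
```

A key that is missing from every bundle in the chain causes getString() to throw MissingResourceException, so a translation that is only partly complete still works as long as the default file has every key.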

There may also be a file called "HelloMessages_fr_FR.properties", which would be specifically French French. Finally, there may also be a file "HelloMessages_fr_FR_PARIS.properties", which would be Parisian French.

When we search for the value according to the key we provided, the ResourceBundle will first try to find it according to the locale we specified when loading the ResourceBundle, using the basename, language code, country code, and variant, each separated by underscores and followed by ".properties". If this fails, it drops the variant and looks for the basename, language code, and country code. If this fails, it drops the country code and looks for the basename and language code. If this fails too, it will look for just the basename followed by ".properties". This provides a nice system of defaults.
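
The order of names can be listed with ResourceBundle.Control, which is available in Java 6 and later. The sketch below is only an illustration with hypothetical class and method names. It asks the default control which locales it would try for Parisian French and turns each one into a bundle name. In addition to the names shown, ResourceBundle.getBundle() also tries the system default locale before settling for the base file when nothing is found for the requested locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Illustrative sketch only: class and method names are hypothetical.
final class CandidateNamesSketch {

    // Returns the bundle names tried for one locale, most specific first.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    // French, France, with the made-up variant PARIS from the text above.
    @SuppressWarnings("deprecation") // this constructor is deprecated in newer JDKs
    static Locale parisianFrench() {
        return new Locale("fr", "FR", "PARIS");
    }

    public static void main(String[] args) {
        for (String name : candidateNames("HelloMessages", parisianFrench())) {
            System.out.println(name + ".properties");
        }
    }
}
```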

Message Formatting
Another issue to deal with is that programs often construct strings dynamically, such as by concatenation or substitution. Generally speaking, this should be avoided. Different languages put words and phrases together in ways that are hard to anticipate. For these parts to be correctly translated in isolation, each part, including its context, must be carefully and consistently documented by the programmer.

Sometimes it's unavoidable. We need to construct sentences with information that can only be known at runtime, such as when we inform users of their last login time or remaining balance. If other user-interface solutions won't do, the Java Internationalization API provides a class for message formatting, the MessageFormat class.

The MessageFormat class is one of five formatting classes, all subclasses of the abstract Format class, that we'll include in our Façade class. It shares many characteristics with the rest of its family, with one difference: NumberFormat and DateFormat are abstract classes with static factory methods, such as getInstance(), that return an appropriate concrete class, whereas MessageFormat is itself a concrete class that is created with a constructor or used through its static format() method.

The method we're interested in, format(), takes a pattern string, followed by an array of objects. It parses the pattern string, identifying placeholders within the pattern, which it replaces with other strings. The simplest nontrivial pattern is text with numbers in braces, such as:

"You have a message from {0} waiting."

where the first object in the array of objects would be a string to be inserted in place of {0}.

Additional options are available that specify that the object in the array is to be formatted as a date or time, for example:

"A message from {0} arrived at {1,date,short}."

When deciding whether to use these features, consider how it will affect the task of translation; keeping things simple reduces cost and errors when translating.
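
Here is a minimal sketch of placeholder substitution with java.text.MessageFormat. The class name, the helper method, and the sample name "Alice" are made up for the example. The second pattern uses a number placeholder rather than a date so that the result doesn't depend on the current time.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class MessageFormatSketch {

    // Substitutes the arguments into the pattern using the locale's conventions.
    static String format(Locale locale, String pattern, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // Plain substitution: {0} is replaced by the first argument.
        System.out.println(format(Locale.US,
                "You have a message from {0} waiting.", "Alice"));

        // A typed placeholder: the number is formatted for the locale.
        String pattern = "{0} has {1,number,integer} new messages.";
        System.out.println(format(Locale.US, pattern, "Alice", 1234));       // Alice has 1,234 new messages.
        System.out.println(format(Locale.GERMANY, pattern, "Alice", 1234));  // Alice has 1.234 new messages.
    }
}
```

One detail worth passing on to translators: a single quote is an escape character in these patterns, so a literal apostrophe, common in French text, must be written as two single quotes.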

Number Formatting
The formats for numbers, dates, times, and currency vary greatly from place to place. Numbers in U.S. English, for example, use the period as the decimal separator and a comma as the thousands separator, whereas some locales use the reverse, as shown in Table 1.

Table 1

(Note: The French number used punctuation, replaced here with an underscore, that could not be displayed correctly when I ran the code in a DOS window on Windows 98.)

These results were obtained by using code similar to the following for English:

I18nFacade intl = new I18nFacade("en","US");
System.out.println(intl.format(12345.678));
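
For comparison, this is roughly what a façade method like the one above would delegate to. The sketch calls java.text.NumberFormat directly. The class and method names are hypothetical. The French result is printed without a stated expectation because its grouping separator is a special space character that differs between Java versions.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class NumberFormatSketch {

    // Formats a number with the given locale's separators.
    static String format(Locale locale, double value) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 12345.678));       // 12,345.678
        System.out.println(format(Locale.GERMANY, 12345.678));  // 12.345,678
        System.out.println(format(Locale.FRANCE, 12345.678));   // separator depends on the JDK
    }
}
```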

Currency
Currency is a special case of formatting numbers, but there are two additional issues regarding currency. The first is more or less obvious: formatting currency is not the same as converting currency. Displaying yen as dollars is a serious error. If the application is intended to work only with one currency, even if it works with multiple languages, this is not a problem, except that a separate locale needs to be used for currency.

By default the currency locale is the same locale specified when the class was instantiated. The Façade class has setCurrencyLocale() methods to allow specifying a separate currency locale.

The second issue regarding currency is that we need to carefully consider the numeric type we use to represent it. When we're talking about most quantities, a little rounding here and there is not a problem, and float and double are fine. This isn't usually the case with currency; a penny here and a penny there often matter. This suggests that we might use an integer type, such as long, with a decimal offset. In some countries the currency for large transactions can require support for large denominations that exceed any of the native Java types. In that case it may be desirable to use a class like BigDecimal to represent numbers, perhaps wrapping in a Currency class.
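
The rounding concern is easy to demonstrate. The following sketch, with hypothetical class and method names, adds two amounts first as doubles and then as BigDecimal values, and formats a BigDecimal with the currency formatter for a chosen currency locale.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class CurrencySketch {

    // Adds two decimal amounts exactly; BigDecimal keeps every decimal digit.
    static BigDecimal add(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // Formats an amount using the conventions of the given currency locale.
    static String format(Locale currencyLocale, BigDecimal amount) {
        return NumberFormat.getCurrencyInstance(currencyLocale).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(0.10 + 0.20);          // 0.30000000000000004
        System.out.println(add("0.10", "0.20"));  // 0.30
        System.out.println(format(Locale.US, new BigDecimal("12345.68")));  // $12,345.68
    }
}
```

Constructing BigDecimal from a String rather than from a double matters here, because a double literal such as 0.10 has already lost exactness before BigDecimal sees it.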

For the purpose of simplifying our example, we assume that the numbers being formatted are known to represent the correct currency, that their magnitude will not exceed the capacity of the native types, and that rounding errors will be insignificant. Table 2 shows some of the currency formats.

Table 2

(Note: The French number used punctuation, replaced here with an underscore, that could not be displayed correctly when I ran the code in a DOS window on Windows 98.)

These results were obtained by using code similar to the following for English:

I18nFacade intl = new I18nFacade("en","US");
System.out.println(intl.currencyFormat(12345.678));

Date Formatting
Date and time formats vary greatly. For dates, the order and punctuation between day, month, and year differ. In addition, verbose formats include the names of the days and months (or abbreviations) that vary according to the language. Table 3 shows short and medium formats for dates for a few countries.

Table 3

(Note: The French name for August includes a character, displayed here as an underscore, that couldn't be displayed correctly when I ran the code in a DOS window on Windows 98.)

These results were obtained by using code similar to the following for English:

I18nFacade intl = new I18nFacade("en","US");
Calendar cal = Calendar.getInstance();
Date date = cal.getTime();
System.out.println(intl.dateFormat(date));
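
The short and medium styles come from java.text.DateFormat. The sketch below, with hypothetical class and method names, formats a fixed date so that the output is repeatable. Only the U.S. results are stated, since the patterns for other locales have changed between Java versions.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class DateFormatSketch {

    // Formats the date part only, in the given style and locale.
    static String format(Locale locale, int style, Date date) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        // December 8, 2001 at midnight in the default time zone.
        Date date = new GregorianCalendar(2001, Calendar.DECEMBER, 8).getTime();

        System.out.println(format(Locale.US, DateFormat.SHORT, date));       // 12/8/01
        System.out.println(format(Locale.US, DateFormat.MEDIUM, date));      // Dec 8, 2001
        System.out.println(format(Locale.FRANCE, DateFormat.MEDIUM, date));  // varies by JDK
    }
}
```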

Another issue that the standard Java Internationalization API doesn't resolve is that different calendars are used in some countries or for certain purposes. Most of us are familiar with the Gregorian calendar and this is the only one for which a class, GregorianCalendar, is provided in the Java API. (This is what the Calendar.getInstance() factory method in the example above returns.)

You shouldn't use the methods in java.util.Date to construct date Strings. These have been deprecated as of Java 1.1 because they only allow you to use a Gregorian calendar.

IBM has open-source Java classes in its ICU4J project that support other calendars, such as BuddhistCalendar, HebrewCalendar, and JapaneseCalendar. See http://oss.software.ibm.com/icu4j/doc/index.html for more information about these classes and the calendars they support.

Time Formatting
Table 4 shows the short and long time formats for the same countries as above:

Table 4

The code is similar to what we used to obtain dates in the preceding example, except we use the timeFormat() method:

I18nFacade intl = new I18nFacade("en","US");
Calendar cal = Calendar.getInstance();
Date date = cal.getTime();
System.out.println(intl.timeFormat(date));

Parsing Numbers, Dates, and Time
The same issues we considered for formatting apply to parsing user input. In addition, processing user input requires that the format we expect is clear to the user. Finally, we must validate and properly handle bad input. Specifically, the application must be able to catch and appropriately handle ParseException.

It's also the responsibility of the application to properly interpret the returned value. The parseDate() and parseTime() methods both return a Date, but in the first case the time portion is zero (midnight), and in the second the date portion is the epoch date, January 1, 1970. Here's a sample call to parse a date:

Date userDate = null;
try
{
    userDate = intl.parseDate("12/8/01");
}
catch (ParseException e)
{
    // Handle bad input
}
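
As a sketch of what happens underneath, the following code parses a short U.S. date with java.text.DateFormat directly. The class and method names are hypothetical, and returning null for bad input is just one possible way of handling it. Lenient parsing is switched off because by default an impossible date such as 13/45/01 is silently rolled over into a valid one.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class DateParseSketch {

    // Returns the parsed date, or null when the input does not match the
    // locale's short date format.
    static Date parseShortDate(Locale locale, String input) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        format.setLenient(false); // reject impossible dates such as 13/45/01
        try {
            return format.parse(input);
        } catch (ParseException e) {
            return null; // the caller decides how to report bad input
        }
    }

    public static void main(String[] args) {
        System.out.println(parseShortDate(Locale.US, "12/8/01"));       // a Date for December 8, 2001
        System.out.println(parseShortDate(Locale.US, "13/45/01"));      // null
        System.out.println(parseShortDate(Locale.US, "next Tuesday"));  // null
    }
}
```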
The parseNumber() method returns a Number object and it's up to the application to convert it to the proper primitive using the intValue(), longValue(), floatValue(), and doubleValue() methods, for example:

Number userNumber = null;
// Called parse ...
long userLong = userNumber.longValue();
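
One pitfall when validating numeric input is that NumberFormat.parse(String) stops at the first character it can't use, so "12abc" parses as 12 without any exception. The sketch below, with hypothetical class and method names, uses a ParsePosition to insist that the whole input is consumed.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class NumberParseSketch {

    // Parses the whole input or throws; trailing garbage is treated as an error.
    static Number parseStrict(Locale locale, String input) throws ParseException {
        NumberFormat format = NumberFormat.getInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number result = format.parse(input, position);
        if (result == null || position.getIndex() != input.length()) {
            throw new ParseException("Not a number: " + input, position.getIndex());
        }
        return result;
    }

    // Convenience wrapper that returns null instead of throwing.
    static Number tryParse(Locale locale, String input) {
        try {
            return parseStrict(locale, input);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Number userNumber = tryParse(Locale.US, "12,345.678");
        System.out.println(userNumber.doubleValue());              // 12345.678
        System.out.println(userNumber.longValue());                // 12345
        System.out.println(tryParse(Locale.GERMANY, "12.345,678")); // 12345.678
        System.out.println(tryParse(Locale.US, "12abc"));          // null
    }
}
```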

Sorting, Searching, and String Comparisons
In English, a once common algorithm for comparing strings - the essential operation in performing a sort - is pretty easy: convert the strings to uppercase (assuming we want a case-insensitive sort), then compare them character code by character code. This doesn't work as a general algorithm, however, because many languages use characters outside of A-Z and a-z, and some languages, such as Chinese, don't even have the concept of upper- and lowercase. Other complications include expansion, where a language sorts certain letters as though they were two, such as the German eszett (\u00DF), which is treated as two ss for sorting purposes, and contractions where two letters such as "ch" in Czech are treated as a single letter that sorts between h and i.

Besides optionally taking case into account, we also have the option of taking diacritics into account, depending on the situation. For sorting, we may want the letter "a" to sort before "á", but for searching we may wish to consider them equal.

For these reasons (among others too numerous to list here), we shouldn't use the String.equals() and String.compareTo() methods to compare two strings. The Java Internationalization API has a class Collator that we should use instead. This class, like the format classes, is an abstract class with a factory method that returns a concrete class, RuleBasedCollator, cast up to Collator. Collator has a method compare() that will compare two strings using the default rules for the locale. Collator also has a property, Strength, that allows us to select between locale-specific options such as case and accent sensitivity.
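
The effect of the strength setting can be seen in a short sketch. The class and method names below are hypothetical. PRIMARY strength ignores both case and accents, SECONDARY takes accents into account, and TERTIARY also takes case into account.

```java
import java.text.Collator;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class CollatorStrengthSketch {

    // Compares two strings with the locale's rules at the given strength.
    // The result is negative, zero, or positive, like String.compareTo().
    static int compare(Locale locale, int strength, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String plain = "a";
        String accented = "\u00e1"; // a with acute accent

        // Searching: treat case and accent variants as equal.
        System.out.println(compare(Locale.US, Collator.PRIMARY, "abc", "ABC") == 0);     // true
        System.out.println(compare(Locale.US, Collator.PRIMARY, plain, accented) == 0);  // true

        // Sorting: let accents and case break ties.
        System.out.println(compare(Locale.US, Collator.SECONDARY, plain, accented) == 0); // false
        System.out.println(compare(Locale.US, Collator.TERTIARY, "abc", "ABC") == 0);     // false
    }
}
```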

Our Façade has a method compare() that will obtain a collator for us (if it doesn't already have one) and return the results of comparing two strings:

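I18nFacade intl = new I18nFacade("en","US");
// Assuming compare() returns an int, as Collator.compare() does:
// negative, zero, or positive
if (intl.compare("Apples", "Oranges") < 0)
{
    // "Apples" sorts before "Oranges"
}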
The concrete class is a rule-based collator. Using it is expensive since it must parse the string according to many rules and expand or contract characters based on diacritics and other language rules. If more than a few strings are going to be compared more than a few times, such as when sorting a large set of data, it may be desirable to capture the result of this parsing in advance of sorting by calling the getCollationKey() method, perhaps as data is entered. The CollationKeys can be saved and later used to perform the comparisons instead of the strings themselves.
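
A sketch of sorting with collation keys might look like the following. The class and method names are hypothetical. Each string is converted to a CollationKey once, the keys are sorted with their own cheap comparison, and the original strings are read back from the keys.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class CollationKeySketch {

    // Sorts strings by locale rules, applying the rules once per string.
    static List<String> sort(Locale locale, List<String> input) {
        Collator collator = Collator.getInstance(locale);

        List<CollationKey> keys = new ArrayList<>();
        for (String s : input) {
            keys.add(collator.getCollationKey(s)); // the expensive step
        }

        Collections.sort(keys); // CollationKey is Comparable; compares the precomputed keys

        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> fruit = Arrays.asList("banana", "apple", "Cherry");
        System.out.println(sort(Locale.US, fruit)); // [apple, banana, Cherry]

        // For contrast, plain String ordering puts the uppercase C first.
        List<String> naive = new ArrayList<>(fruit);
        Collections.sort(naive);
        System.out.println(naive); // [Cherry, apple, banana]
    }
}
```

Keys are only comparable with other keys produced by the same Collator, so saved keys have to be regenerated if the locale or the collator's strength changes.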

Additional Considerations
The intention of the Façade was to make it easier to use the Java Internationalization API. Some features are not included because it wouldn't really make those features easier to use and would only make the Façade more complicated. Other features are not included because they require a different solution or approach.

Text Boundaries
The BreakIterator class provides methods for locating character, word, and sentence boundaries. While it's relatively easy to detect these boundaries in English, many languages present difficulties. Chinese and Thai, for example, have no spaces between words. Spanish can have punctuation at the beginning of a sentence. Thai has no punctuation whatsoever. It's important to be aware of these issues, but locating text boundaries isn't something most applications need to do.

While it's indispensable for doing such things as word wrapping, dictionary lookup, and indexing, using a Façade isn't going to make the job significantly easier - it's probably best to use the BreakIterator class directly in those few specific cases where it's required.
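
For those cases, direct use of BreakIterator looks roughly like this sketch, which extracts the words from a sentence. The class and method names are hypothetical. Segments that start with punctuation or white space are skipped, since the word iterator reports those as separate segments.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
final class WordBoundarySketch {

    // Returns the words in the text, using the locale's word-boundary rules.
    static List<String> words(Locale locale, String text) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);

        List<String> words = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            // Keep only segments that begin with a letter or digit.
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words(Locale.US, "Hello, world!")); // [Hello, world]
    }
}
```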

Display, Input, and Output
I've also omitted support in the Façade for graphical display, and input and output functionality (including character set conversions), because these are entirely unlike the other functionality in the Façade. There are other design patterns better suited for this type of problem.

Conclusion
Design patterns help software developers by allowing them to consider what the problems have in common. The insight gained from solving one can be carried over when solving the next one. The problem with using the Java Internationalization API - many classes, loosely related, with many features - describes the type of problem the Façade pattern is intended to solve. Using this pattern does indeed make the Java Internationalization API easier to use.

Reference
Grand, M. (1998). Patterns in Java: A Catalog of Reusable Design Patterns. Wiley. pp. 205-211.

Author Bio
David Gallardo is an independent software consultant specializing in internationalization, Java and database development. Previously he led database and internationalization development at a B2B e-commerce company.

Download Source Files (~ 11.6 KB ~Zip File Format)

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.