One of the outstanding things about Java is that it
facilitates building internationalized programs, programs that work
in various languages and regions. Java has used Unicode as its
internal character set since version 1.0, and Java 1.1 introduced
another important feature: an internationalization API.
Although the Java Internationalization
API generally provides well-designed, object-oriented ways
to internationalize an application, it may be a misnomer to call it
an API because it's not a single class or package. Rather it's a set
of loosely related classes scattered throughout several different
packages. The many features and the relationships between them can be
overwhelming, especially when you're trying to do something
relatively straightforward and standard.
This complexity can be tamed by using the Façade pattern.
This pattern suggests that you create a simple interface for a
complex set of functionality by creating one object that mediates
access to another set of objects that are related in some complex
way. This is a convenient way to organize our use of the Java Internationalization API.
Figure 1: I18nFacade class overview
Internationalization and Localization
There are two aspects to producing globally enabled software.
The first is internationalization (sometimes abbreviated i18n), which
means producing software that doesn't have built-in assumptions about
language and cultural conventions. This should first be considered in
the design phase of a project.
The second is localization (l10n), which means adapting
software to work in a specific language and region. This is usually done in
parallel with software development for at least one language and
later for additional languages.
Properly internationalized software can be localized easily,
without reengineering, recoding, and recompiling, by someone who
specializes in translation, not programming. Each localized version
must still be tested, of course, but the scope of testing can be
significantly more limited than a complete test of the entire
application.
Locales
An internationalized Java application uses the concept of
locale to select the language and set of cultural conventions to use
at runtime. A locale typically consists of two parts: a language
designation and a country designation. Java uses the two-letter codes
from ISO 639 to designate language and the two-letter codes from
ISO 3166 to designate country. Java also allows an optional third
part, a variant, which can be used, for example, to support a dialect
within a country. Notably, Java uses a variant to provide support for the euro.
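As a quick illustration, the sketch below builds two- and three-part locales with the standard java.util.Locale constructors (recent JDKs deprecate these constructors in favor of Locale.of(), but they still compile and run). LocaleDemo is a made-up class name, and the EURO variant reflects the older JDK convention for euro formatting.

```java
import java.util.Locale;

// Hypothetical demo class: builds locales from ISO codes.
class LocaleDemo {
    // Language (ISO 639) plus country (ISO 3166)
    static Locale usEnglish() {
        return new Locale("en", "US");
    }

    // Language, country, and variant; older JDKs used the EURO
    // variant to select euro currency formatting
    static Locale germanEuro() {
        return new Locale("de", "DE", "EURO");
    }

    public static void main(String[] args) {
        System.out.println(usEnglish());         // en_US
        System.out.println(germanEuro());        // de_DE_EURO
        System.out.println(Locale.getDefault()); // depends on the system
    }
}
```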
The Façade Pattern
The classes that we'll be using for Java internationalization
generally have three things in common:
1. They have an abstract class with one or more static factory
methods that return instances of concrete subclasses. These instances
are returned as the abstract type, so we usually don't know what the
actual concrete class is.
2. They usually depend on a locale for their behavior, normally
specified when the factory method is called to obtain the concrete
class.
3. They provide one or more methods with various options that
provide the functionality we need, such as formatting a date
according to the current locale's conventions or comparing two
strings.
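All three traits can be seen in a few lines. This sketch (FactoryDemo is a hypothetical name) asks three factory methods for French-locale objects and reports the concrete classes hiding behind the abstract types. With the JDK's built-in locale providers these are typically DecimalFormat, SimpleDateFormat, and RuleBasedCollator, but the API only promises the abstract type.

```java
import java.text.Collator;
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo: each API class hides its concrete subclass behind
// an abstract type returned by a locale-aware factory method.
class FactoryDemo {
    static String[] concreteClassNames(Locale locale) {
        NumberFormat numbers = NumberFormat.getInstance(locale);
        DateFormat dates = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        Collator collator = Collator.getInstance(locale);
        return new String[] {
            numbers.getClass().getSimpleName(),
            dates.getClass().getSimpleName(),
            collator.getClass().getSimpleName()
        };
    }

    public static void main(String[] args) {
        // With the built-in providers this typically prints
        // DecimalFormat, SimpleDateFormat, RuleBasedCollator
        for (String name : concreteClassNames(Locale.FRANCE)) {
            System.out.println(name);
        }
    }
}
```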
As suggested by the Façade pattern, we create a single
object, the I18nFacade class (see Listing 1), and use it to organize
and manage the various internationalization classes. (Listings 1 and
2 can be found below). We set the locale once when the
Façade class is instantiated, then instantiate each of the various
internationalization API classes and delegate to them as needed.
Various constructors are provided in the I18nFacade class to
let users specify the two options that can't be changed later: the
locale and the ResourceBundle.
Users can supply a locale directly, or country and language
codes from which we create a locale, or nothing, in which case we
obtain the default locale from the system. Here's an example using
language and country codes:
// Use US English, default ResourceBundle
I18nFacade intl = new I18nFacade("en","US");
The Façade allows the use of only a single ResourceBundle: a
repository of localized strings (contained in a set of related files,
as we'll see later). A default ResourceBundle is specified in the
Façade, presumably the one used for general user interface
text. It's good practice, though, to group messages into different
ResourceBundles according to type, so we've also made it possible to
specify the ResourceBundle when the Façade class is instantiated.
Here's an example:
// Use Mexican Spanish and AppErrorMessages
// ResourceBundle
I18nFacade intlSpErrors = new I18nFacade(
"es", "MX", "AppErrorMessages");
Now let's look at how we manage and delegate to the
internationalization API classes. We often know which class to call
from the type of the parameter or parameters we're passed, so we can
create overloaded methods, especially for the formatting classes,
which helps reduce complexity.
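To make the delegation concrete, here is a minimal sketch of a façade in the same spirit. It is not the author's Listing 1: the class name MiniI18nFacade and its methods are hypothetical, and it covers only numbers and dates. The locale is fixed in the constructor, each delegate is created the first time it is needed, and overloaded format() methods let the argument type choose the behavior.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical minimal facade: locale fixed at construction,
// delegates created lazily on first use.
class MiniI18nFacade {
    private final Locale locale;
    private NumberFormat numberFormat; // created on first use
    private DateFormat dateFormat;     // created on first use

    // No arguments: fall back to the system's default locale
    MiniI18nFacade() {
        this(Locale.getDefault());
    }

    // ISO 639 language and ISO 3166 country codes
    MiniI18nFacade(String language, String country) {
        this(new Locale(language, country));
    }

    MiniI18nFacade(Locale locale) {
        this.locale = locale;
    }

    Locale getLocale() {
        return locale;
    }

    // Overloads: the argument type selects the formatting behavior
    String format(double number) {
        return getNumberFormat().format(number);
    }

    String format(long number) {
        return getNumberFormat().format(number);
    }

    String dateFormat(Date date) {
        return getDateFormat().format(date);
    }

    // Accessors expose the underlying objects for anything the facade omits
    NumberFormat getNumberFormat() {
        if (numberFormat == null) {
            numberFormat = NumberFormat.getInstance(locale);
        }
        return numberFormat;
    }

    DateFormat getDateFormat() {
        if (dateFormat == null) {
            dateFormat = DateFormat.getDateInstance(DateFormat.MEDIUM, locale);
        }
        return dateFormat;
    }
}
```

Creating the delegates lazily means a façade used only for dates never builds a NumberFormat. Note that NumberFormat and DateFormat instances are not synchronized, so an object like this shouldn't be shared across threads without external synchronization.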
The Façade pattern is not intended to limit access to the
underlying component classes. We've provided get() methods that
return the underlying classes so that anything that can't be done
through our simple methods, such as setting custom formatting
patterns, can be done by accessing the classes directly. Listing 2
provides code that exercises the functionality in the I18nFacade class.
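As an example of the kind of customization that calls for direct access, the sketch below takes a NumberFormat from the factory (as a façade accessor would hand it over) and applies a custom pattern. CustomPatternDemo is a hypothetical name. The cast to DecimalFormat is the usual case rather than a guarantee, so the sketch checks before casting.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo: customizing a formatter obtained from the factory.
class CustomPatternDemo {
    static String twoDecimals(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        if (nf instanceof DecimalFormat) {
            // Pattern characters are locale-neutral: ',' marks grouping and
            // '.' the decimal point; output uses the locale's own symbols
            ((DecimalFormat) nf).applyPattern("#,##0.00");
        } else {
            // Fallback for a provider that returns some other subclass
            nf.setMinimumFractionDigits(2);
            nf.setMaximumFractionDigits(2);
        }
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(twoDecimals(1234.5, Locale.US));      // 1,234.50
        System.out.println(twoDecimals(1234.5, Locale.GERMANY)); // 1.234,50
    }
}
```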
Text Translation
Perhaps the most obvious issue for applications that are
intended to work globally is that we need to be able to translate the
user-visible text. In the past people often wrote applications with
print statements containing hard-coded text. Translating an
application like that means going through all the source code,
identifying user interface text, changing it, recompiling, and hoping
we didn't break something else in the process. Separating user text
by putting it in resource files maintained separately from the source
code is a good way to solve this problem. (We can do this for
graphics and sound files too, by using a convention involving
directory paths and filenames, and putting the path and filename in
the resource file.) To add support for another language we add
another resource file.
The Java API provides an abstract class, ResourceBundle, with
two important subclasses, PropertyResourceBundle
and ListResourceBundle, to support resource files. These
classes allow us to use a tag as a key to look up an object in a
set of files. Which specific file it uses depends on the current
locale.
The most convenient of these to use is the
PropertyResourceBundle, which simply uses text files, .properties
files, that don't need to be compiled. The other option,
ListResourceBundle, requires us to create classes that contain the
key and value pairs and compile them.
Let's look at an example. Suppose we have a program
"Hello.java". Instead of using a string such as "Hello!" directly in
our print statement, we look up the localized string using a key,
such as "HI":
// Get a localized String using a Tag
String hello = intl.format("HI");
System.out.println(hello);
In the Façade, we have specified "HelloMessages" as the base name
of the default ResourceBundle. This means we must have a default
properties file, HelloMessages.properties:
# HelloMessages.properties
#
# Default - English
HI = Hello!
BYE = Goodbye!
HI_NAME = Hello, {0}!
This is the default version because the filename consists of
a basename "HelloMessages" plus ".properties" and no locale
information.
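To see that a .properties file really is just parsed text, the sketch below feeds the same key = value lines to PropertyResourceBundle from a string instead of a file. PropertiesBundleDemo is a hypothetical name; in a real application ResourceBundle.getBundle("HelloMessages", locale) would locate the files on the class path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo: the string stands in for HelloMessages.properties
class PropertiesBundleDemo {
    static final String HELLO_MESSAGES =
        "# Default - English\n"
        + "HI = Hello!\n"
        + "BYE = Goodbye!\n";

    static ResourceBundle load(String propertiesText) {
        try {
            // PropertyResourceBundle parses key = value lines; nothing is compiled
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader won't fail in practice, but the constructor declares it
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(HELLO_MESSAGES);
        System.out.println(bundle.getString("HI")); // Hello!
    }
}
```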
A French version would be called
"HelloMessages_fr.properties", where "fr" specifies the language, and
might contain:
# HelloMessages_fr.properties
#
# French
HI = Bonjour!
BYE = Au revoir!
HI_NAME = Bonjour, {0}!
There may also be a file called "HelloMessages_fr_FR.properties",
specifically for French as used in France. Finally, there may be a
file "HelloMessages_fr_FR_PARIS.properties", using the variant PARIS,
which would be Parisian French.
When we look up a value by key, the ResourceBundle first tries the
file that matches the locale we specified when loading it: the base
name, language code, country code, and variant, separated by
underscores and followed by ".properties". If that file doesn't exist,
it drops the variant and tries the base name, language code, and
country code; if that fails, it drops the country code and tries the
base name and language code. If nothing more specific than the base
file turns up, it repeats the search with the system's default
locale, and finally falls back to the base name alone followed by
".properties". The files it finds are chained together, so a key
missing from a more specific file is looked up in the less specific
ones. This provides a nice system of defaults.
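Since Java 6 the JDK exposes this candidate list through ResourceBundle.Control, so we can print the search order rather than take it on faith. This is a sketch; BundleSearchOrder is a hypothetical name, and the list covers only the requested locale (the default-locale retry happens separately inside getBundle()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical demo: lists the .properties files getBundle() would try.
class BundleSearchOrder {
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        // Candidates run from most to least specific, ending with Locale.ROOT
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            files.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return files;
    }

    public static void main(String[] args) {
        Locale parisian = new Locale("fr", "FR", "PARIS");
        for (String file : candidateFiles("HelloMessages", parisian)) {
            System.out.println(file);
        }
    }
}
```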
Message Formatting
Another issue to deal with is that programs often construct
strings dynamically, such as by concatenation or substitution.
Generally speaking, this should be avoided. Different languages put
words and phrases together in ways that are hard to anticipate. For
these parts to be correctly translated in isolation, each part,
including its context, must be carefully and consistently documented
by the programmer.
Sometimes it's unavoidable. We need to construct sentences
with information that can only be known at runtime, such as when we
inform users of their last login time or remaining balance. If other
user-interface solutions won't do, the Java Internationalization API
provides a class for message formatting, the MessageFormat class.
The MessageFormat class is one of five formatting classes,
all subclasses of the abstract Format class, that we'll include in
our Façade class. It shares many characteristics with the rest of its
family, with one exception: MessageFormat is a concrete class. Rather
than calling a factory method, we construct it from a pattern and,
optionally, a locale, or we call its static format() method.
The method we're interested in, format(), takes a pattern
string, followed by an array of objects. It parses the pattern
string, identifying placeholders within the pattern, which it
replaces with other strings. The simplest nontrivial pattern is text
with numbers in braces, such as:
"You have a message from {0} waiting."
where the first object in the array of objects would be a string to
be inserted in place of {0}.
Additional options are available that specify that the object
in the array is to be formatted as a date or time, for example:
"A message from {0} arrived at {1,date,short}."
When deciding whether to use these features, consider how it
will affect the task of translation; keeping things simple reduces
cost and errors when translating.
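The sketch below shows both forms in use (MessageFormatDemo is a hypothetical name). Constructing a MessageFormat with an explicit locale matters as soon as a placeholder carries a number or date; the static MessageFormat.format() uses the default locale. It also shows a common translation pitfall: apostrophes are quote characters in patterns, so a literal apostrophe must be doubled.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo of MessageFormat patterns.
class MessageFormatDemo {
    // Substitutes the sender's name for {0}
    static String waiting(String sender, Locale locale) {
        MessageFormat mf = new MessageFormat("You have a message from {0} waiting.", locale);
        return mf.format(new Object[] { sender });
    }

    // {0,number,integer} formats the argument with the locale's digit grouping
    static String count(int messages, Locale locale) {
        MessageFormat mf = new MessageFormat("You have {0,number,integer} messages.", locale);
        return mf.format(new Object[] { messages });
    }

    // Apostrophes quote text in patterns; '' yields a literal apostrophe
    static String possessive(String name) {
        return MessageFormat.format("{0}''s inbox", name);
    }

    public static void main(String[] args) {
        System.out.println(waiting("Maria", Locale.US));
        System.out.println(count(12345, Locale.GERMANY)); // You have 12.345 messages.
        System.out.println(possessive("Maria"));          // Maria's inbox
    }
}
```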
Number Formatting
Formats for numbers, dates, times, and currency vary greatly
from place to place. Numbers in U.S. English, for example, use the
period as the decimal separator and a comma as the thousands
separator, whereas some locales use the reverse, as shown in Table 1.
(Note: The French number contains a separator character that could
not be displayed correctly in a DOS window on Windows 98 when I ran
the code; it is shown here as an underscore.)
These results were obtained by using code similar to the
following for English:
I18nFacade intl = new I18nFacade("en","US");
System.out.println(intl.format(12345.678));
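The separators behind these differences can be inspected directly with DecimalFormatSymbols. In this sketch (SeparatorsDemo is a hypothetical name) the French grouping separator turns out to be a space character rather than punctuation; which space (U+00A0 or U+202F) depends on the JDK version, and that kind of character is what a DOS console can fail to display.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo: the symbols a locale uses for formatting numbers.
class SeparatorsDemo {
    static char grouping(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    static char decimal(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.FRANCE };
        for (Locale locale : locales) {
            // Print code points so invisible characters show up
            System.out.printf("%s grouping=U+%04X decimal=U+%04X%n",
                locale, (int) grouping(locale), (int) decimal(locale));
        }
    }
}
```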
Currency
Currency is a special case of formatting numbers, but there
are two additional issues regarding currency. The first is more or
less obvious: formatting currency is not the same as converting
currency. Displaying an amount in yen with a dollar sign, without
converting it, is a serious error. If the application is intended to
work with only one currency, even if it supports multiple languages,
this isn't a problem, except that formatting the currency may require
a locale separate from the one used for everything else.
By default the currency locale is the same locale specified
when the class was instantiated. The Façade class has
setCurrencyLocale() methods to allow specifying a separate currency
locale.
The second issue regarding currency is that we need to
carefully consider the numeric type we use to represent it. For most
quantities, a little rounding here and there is not a problem, and
float and double are fine. This usually isn't the case with currency;
a penny here and a penny there often matter. This suggests using an
integer type, such as long, that counts the smallest currency unit
(cents, for example) with an implied decimal offset. In some
countries, large transactions can involve amounts that exceed the
range of any of the native Java types. In that case it may be
desirable to use a class like BigDecimal to represent amounts,
perhaps wrapped in an application-defined Currency class.
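The trade-offs between these representations fit in a few lines. This sketch (MoneyRepresentation is a hypothetical name) shows a double sum that is off by a tiny amount, the same sum done exactly in cents held in a long, and BigDecimal arithmetic with explicit rounding to cents. HALF_EVEN rounding is one common choice for money, not the only one.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo: three ways to hold a currency amount.
class MoneyRepresentation {
    // Binary floating point can't represent 0.10 or 0.20 exactly
    static boolean doubleSumIsExact() {
        return 0.10 + 0.20 == 0.30;
    }

    // Amounts held as a count of cents: integer arithmetic is exact
    static long addCents(long a, long b) {
        return Math.addExact(a, b); // throws ArithmeticException on overflow
    }

    // BigDecimal keeps exact decimal digits with no fixed size limit
    static BigDecimal add(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // Rounds to cents, sending exact halves to the even neighbor
    static BigDecimal toCents(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(0.10 + 0.20);                      // 0.30000000000000004
        System.out.println(addCents(10, 20));                 // 30
        System.out.println(add("0.10", "0.20"));              // 0.30
        System.out.println(toCents(new BigDecimal("2.675"))); // 2.68
    }
}
```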
For the purpose of simplifying our example, we assume that
the numbers being formatted are known to represent the correct
currency, that their magnitude will not exceed the capacity of the
native types, and that rounding errors will be insignificant. Table 2
shows some of the currency formats.
(Note: The French number contains a separator character that could
not be displayed correctly in a DOS window on Windows 98 when I ran
the code; it is shown here as an underscore.)
These results were obtained by using code similar to the
following for English:
I18nFacade intl = new I18nFacade("en","US");
System.out.println(intl.currencyFormat(12345.678));
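Underneath a method like currencyFormat() is NumberFormat.getCurrencyInstance(). The sketch below (CurrencyFormatDemo is a hypothetical name, and passing the currency locale as a parameter stands in for the Façade's setCurrencyLocale()) makes two points: the amount is only formatted, never converted, and the currency locale also decides how many decimal places are shown.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo: currency formatting with its own locale.
class CurrencyFormatDemo {
    // Formats an amount that is already in the currency of currencyLocale;
    // no conversion between currencies takes place
    static String format(double amount, Locale currencyLocale) {
        return NumberFormat.getCurrencyInstance(currencyLocale).format(amount);
    }

    // Number of decimal places the locale's currency normally uses
    static int fractionDigits(Locale currencyLocale) {
        return NumberFormat.getCurrencyInstance(currencyLocale).getMaximumFractionDigits();
    }

    public static void main(String[] args) {
        System.out.println(format(12345.678, Locale.US)); // $12,345.68
        System.out.println(format(12345, Locale.JAPAN));  // yen, no decimal places
    }
}
```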
Date Formatting
Date and time formats vary greatly. For dates, the order and
punctuation between day, month, and year differ. In addition, verbose
formats include the names of the days and months (or abbreviations)
that vary according to the language. Table 3 shows short and medium
formats for dates for a few countries.
(Note: The French name for August includes a character,
displayed here as an underscore, that couldn't be displayed correctly
when I ran the code in a DOS window on Windows 98.)
These results were obtained by using code similar to the
following for English:
I18nFacade intl = new I18nFacade("en","US");
Calendar cal = Calendar.getInstance();
Date date = cal.getTime();
System.out.println(intl.dateFormat(date));
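A date method like dateFormat() presumably delegates to DateFormat.getDateInstance(). This sketch (DateFormatDemo is a hypothetical name) formats a fixed date rather than the current time, so the output is repeatable, and prints short and medium styles for several locales. The exact patterns come from the JDK's locale data and can differ slightly between JDK versions.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo: the same date in several locales and styles.
class DateFormatDemo {
    // August 12, 2001 at midnight in the default time zone
    static Date sampleDate() {
        return new GregorianCalendar(2001, Calendar.AUGUST, 12).getTime();
    }

    static String formatDate(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.UK, Locale.GERMANY, Locale.FRANCE };
        for (Locale locale : locales) {
            System.out.println(locale + ": "
                + formatDate(sampleDate(), DateFormat.SHORT, locale) + " | "
                + formatDate(sampleDate(), DateFormat.MEDIUM, locale));
        }
    }
}
```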
Another issue that the standard Java Internationalization API
doesn't resolve is that different calendars are used in some
countries or for certain purposes. Most of us are familiar with the
Gregorian calendar and this is the only one for which a class,
GregorianCalendar, is provided in the Java API. (This is what the
Calendar.getInstance() factory method in the example above returns.)
You shouldn't use the deprecated methods in java.util.Date, such as
toLocaleString(), to construct date strings. They have been
deprecated since Java 1.1, in favor of DateFormat, because they only
allow you to use a Gregorian calendar.
IBM has open-source Java classes in its ICU4J project that
support other calendars, such as BuddhistCalendar, HebrewCalendar,
and JapaneseCalendar. See
http://oss.software.ibm.com/icu4j/doc/index.html
for more information about these classes
and the calendars they support.
Time Formatting
Table 4 shows the short and long time formats for the same
countries as above:
The code is similar to what we used to obtain dates in the
preceding example, except we use the timeFormat() method:
I18nFacade intl = new I18nFacade("en","US");
Calendar cal = Calendar.getInstance();
Date date = cal.getTime();
System.out.println(intl.timeFormat(date));
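Time formatting follows the same pattern with DateFormat.getTimeInstance(). In this sketch (TimeFormatDemo is a hypothetical name) a fixed afternoon time comes out in 24-hour form for Germany and with an AM/PM marker for the U.S.; recent JDKs put a narrow no-break space before "PM", so compare such output with care.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo: short time formats for several locales.
class TimeFormatDemo {
    // 3:30 in the afternoon on a fixed day, in the default time zone
    static Date afternoon() {
        return new GregorianCalendar(2001, Calendar.AUGUST, 12, 15, 30).getTime();
    }

    static String shortTime(Date date, Locale locale) {
        return DateFormat.getTimeInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.FRANCE };
        for (Locale locale : locales) {
            System.out.println(locale + ": " + shortTime(afternoon(), locale));
        }
    }
}
```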
Parsing Numbers, Dates, and Time
The same issues we considered for formatting apply to parsing
user input. In addition, processing user input requires that the
format we expect is clear to the user. Finally, we must validate and
properly handle bad input. Specifically, the application must be able
to catch and appropriately handle ParseException.
It's also the responsibility of the application to properly
interpret the returned value. The parseDate() and parseTime() methods
both return a Date, but in the first case the time portion is
midnight, and in the second the date portion is January 1, 1970.
Here's a sample call to parse a date:
Date userDate;
try
{
userDate = intl.parseDate("12/8/01");
}
catch (ParseException e)
{
// Handle bad input
}
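A parseDate() method presumably wraps DateFormat.parse(). The sketch below (DateParseDemo is a hypothetical name, and returning null for bad input is just this sketch's convention) turns off lenient parsing so that impossible values such as month 13 raise ParseException instead of silently rolling over into the next year.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo: strict parsing of a short-style date.
class DateParseDemo {
    // Returns null when the text isn't a valid short date for the locale
    static Date parseShortDate(String text, Locale locale) {
        DateFormat df = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        df.setLenient(false); // reject 13/45/01 instead of rolling it over
        try {
            return df.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseShortDate("12/8/01", Locale.US));  // December 8, 2001
        System.out.println(parseShortDate("13/45/01", Locale.US)); // null
    }
}
```

Note that "12/8/01" means December 8 in the U.S. but would be read as 12 August in many other locales, which is one more reason to make the expected format clear to the user.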
The parseNumber() method returns a Number object and it's up
to the application to convert it to the proper primitive using the
intValue(), longValue(), floatValue(), and doubleValue() methods, for
example:
Number userNumber = intl.parseNumber("12,345");
// parseNumber() can throw ParseException; handle it as shown above
long userLong = userNumber.longValue();
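One subtlety when validating numbers: NumberFormat.parse() stops at the first character it can't use, so "12abc" parses as 12 without any exception. This sketch (NumberParseDemo is a hypothetical name) uses a ParsePosition to insist that the whole string was consumed.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical demo: locale-aware number parsing that rejects trailing junk.
class NumberParseDemo {
    // Returns null unless the entire string is a number in the locale's format
    static Number parseWhole(String text, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number n = nf.parse(text, pos);
        // pos.getIndex() is where parsing stopped
        if (n == null || pos.getIndex() != text.length()) {
            return null;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(parseWhole("12,345", Locale.US));          // 12345
        System.out.println(parseWhole("12.345,678", Locale.GERMANY)); // 12345.678
        System.out.println(parseWhole("12abc", Locale.US));           // null
    }
}
```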
Sorting, Searching, and String Comparisons
In English, a once common algorithm for comparing strings (the
essential operation in performing a sort) is pretty easy:
convert the strings to uppercase (assuming we want a case-insensitive
sort), then compare them character code by character code. This
doesn't work as a general algorithm, however, because many languages
use characters outside of A-Z and a-z, and some languages, such as
Chinese, don't even have the concept of upper- and lowercase. Other
complications include expansion, where a language sorts certain
letters as though they were two, such as the German eszett (ß, \u00DF),
which is treated as "ss" for sorting purposes, and contractions
where two letters such as "ch" in Czech are treated as a single
letter that sorts between h and i.
Besides optionally taking case into account, we also have the
option of taking diacritics into account, depending on the situation.
For sorting, we may want the letter "a" to sort before "á", but for
searching we may wish to consider them equal.
For these reasons (among others too numerous to list here),
we shouldn't use the String.equals() and String.compareTo() methods
to compare strings for sorting or searching.
The Java Internationalization API has a class, Collator,
that we should use instead.
This class, like the format classes, is an abstract class with a
factory method that returns a concrete class, RuleBasedCollator, cast
up to Collator. Collator has a method compare() that will compare two
strings using the default rules for the locale. Collator also has a
property, Strength, that allows us to select between locale-specific
options such as case and accent sensitivity.
Our Façade has a method compare() that will obtain a collator
for us (if it doesn't already have one) and return the results of
comparing two strings:
I18nFacade intl = new I18nFacade("en","US");
if (intl.compare("Apples", "Oranges") < 0)
{
// "Apples" sorts before "Oranges" in this locale
}
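The difference between raw string comparison and a Collator, and the effect of strength, can be seen directly. In this sketch (CollatorDemo is a hypothetical name) String.compareTo() puts "Banana" before "apple" because uppercase ASCII letters have lower codes, while the collator sorts alphabetically; at PRIMARY strength "a" and "á" compare as equal, which suits searching.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo: locale-sensitive comparison and sorting.
class CollatorDemo {
    // Compares two strings with a collator of the given strength
    static int compare(String a, String b, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    // Sorts a copy of the words; Collator implements Comparator
    static String[] sorted(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // Code-unit order puts uppercase ASCII letters before lowercase ones
        System.out.println("apple".compareTo("Banana") > 0); // true
        System.out.println(compare("apple", "Banana", Locale.US, Collator.TERTIARY) < 0); // true
        // PRIMARY ignores accents and case; SECONDARY distinguishes accents
        System.out.println(compare("a", "\u00e1", Locale.US, Collator.PRIMARY)); // 0
        System.out.println(Arrays.toString(sorted(Locale.US, "cherry", "Banana", "apple")));
    }
}
```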
The concrete class is a rule-based collator. Using it is
expensive since it must parse the string according to many rules and
expand or contract characters based on diacritics and other language
rules. If more than a few strings are going to be compared more than
a few times, such as when sorting a large set of data, it may be
desirable to capture the result of this parsing in advance of sorting
by calling the getCollationKey() method, perhaps as data is entered.
The CollationKeys can be saved and later used to perform the
comparisons instead of the strings themselves.
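Sorting with collation keys looks like this. The sketch (CollationKeyDemo is a hypothetical name) applies the collator's rules once per string to build a CollationKey, sorts the keys, which is cheaper than reapplying the rules on every comparison, and then recovers the original strings with getSourceString().

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo: sorting with precomputed collation keys.
class CollationKeyDemo {
    static String[] sortWithKeys(Locale locale, String... words) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            // The expensive rule processing happens once per string here
            keys[i] = collator.getCollationKey(words[i]);
        }
        // CollationKey is Comparable; comparing keys skips the rules
        Arrays.sort(keys);
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sorted = sortWithKeys(Locale.US, "cherry", "Banana", "apple");
        System.out.println(Arrays.toString(sorted)); // [apple, Banana, cherry]
    }
}
```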
Additional Considerations
The intention of the Façade was to make it easier to use the
Java Internationalization API. Some features are not included because
it wouldn't really make those features easier to use and would only
make the Façade more complicated. Other features are not included
because they require a different solution or approach.
Text Boundaries
The BreakIterator class provides methods for locating
character, word, and sentence boundaries. While it's relatively easy
to detect these boundaries in English, many languages present
difficulties. Chinese and Thai, for example, have no spaces between
words. Spanish can have punctuation at the beginning of a sentence.
Thai traditionally uses little punctuation, marking the ends of
phrases and sentences with spaces. It's important to be aware of
these issues, but most applications don't need to locate text
boundaries themselves.
When they do, as for word wrapping, dictionary lookup, and
indexing, BreakIterator is indispensable, but using a Façade isn't
going to make the job significantly easier; it's probably best to use
the BreakIterator class directly in those few specific cases where
it's required.
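Using BreakIterator directly is short enough that a wrapper adds little. This sketch (WordBreakDemo is a hypothetical name) walks the word boundaries of a string. Boundaries also fall around spaces and punctuation, so it keeps only the pieces that start with a letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo: extracting words with a locale-aware BreakIterator.
class WordBreakDemo {
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Skip the spaces and punctuation between words
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! How are you?", Locale.US));
        // [Hello, world, How, are, you]
    }
}
```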
Display, Input, and Output
I've also omitted support in the Façade for graphical
display, and input and output functionality (including character set
conversions), because these are entirely unlike the other
functionality in the Façade. There are other design patterns better
suited for this type of problem.
Conclusion
Design patterns help software developers by allowing them to
recognize what different problems have in common. The insight gained
from solving one can be carried over when solving the next one. The
problem with using the Java Internationalization API (many classes,
loosely related, with many features) describes the type of problem
the Façade pattern is intended to solve. Using this pattern does
indeed make the Java Internationalization API easier to use.
Author Bio
David Gallardo is an independent software consultant specializing in
internationalization, Java and database development. Previously he
led database and internationalization development at a B2B e-commerce company.
dgallardo@mediaone.net