
Building Telephone/Voice Portal With Java Phonelets, by Kent V. Klinner III & Dale B. Walker

Last month we introduced the phonelet architecture and developer API for creating lightweight telephone/voice services in Java. This article expands the automated answering machine with caller ID, basic voice processing capabilities, and, just for fun, a simple appliance control function using an inexpensive X10 power line module. We'll also introduce voice recognition and speech synthesis and provide recommendations for further exploration.

Caller ID Makes Simple Phonelets More Useful
Caller ID (CID), recently made available by most U.S. telephone companies, identifies the telephone number and sometimes the name associated with a particular telephone line. The information is transmitted between the first and second ring using frequency-shift keyed (FSK) modem tones. Most modems can be set to report this information when it is present on the telephone line.

Two formats are used for CID:

  • Single Data Message Format (SDMF), which provides date, time, and calling number
  • Multiple Data Message Format (MDMF), the more common format, which additionally provides the name associated with that number
Both formats, when provided by a modem, take the form of an ASCII hex string, where the first byte of the string indicates the message type.

An MDMF string might look like this:
This would decode to:
20: 32 bytes of data
01 08 3033323430393032: date and time (length 8) 03/24 09:02
07 08 4A4F484E20444F45: name (length 8) JOHN DOE
02 0A 38303035353531323132: number (length 10) 800 555 1212

The CallerId class takes such a string and decodes it into useful information.
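The decoding described above can be sketched in a few lines. This is a minimal illustration and not the article's CallerId class: the class name MdmfSketch and its fields are hypothetical, the sample string is assembled from the decoded fields listed above, the leading 80 is the standard MDMF message-type byte, and the trailing checksum byte is omitted and not verified.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the article's CallerId class; handles MDMF only. */
final class MdmfSketch {
    String dateTime;     // MMDDhhmm, parameter type 0x01
    String phoneNumber;  // parameter type 0x02
    String name;         // parameter type 0x07

    /** Parses an ASCII hex string such as a voice modem might report. */
    static MdmfSketch parse(String hex) {
        byte[] b = new byte[hex.length() / 2];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        if (b.length < 2 || (b[0] & 0xFF) != 0x80) {
            throw new IllegalArgumentException("not an MDMF message");
        }
        int end = 2 + (b[1] & 0xFF);   // the length byte counts the data bytes only
        if (end > b.length) {
            throw new IllegalArgumentException("truncated message");
        }
        MdmfSketch cid = new MdmfSketch();
        int pos = 2;
        while (pos + 2 <= end) {
            int type = b[pos] & 0xFF;
            int len = b[pos + 1] & 0xFF;
            if (pos + 2 + len > end) {
                throw new IllegalArgumentException("truncated parameter");
            }
            String value = new String(b, pos + 2, len, StandardCharsets.US_ASCII);
            switch (type) {
                case 0x01: cid.dateTime = value; break;
                case 0x02: cid.phoneNumber = value; break;
                case 0x07: cid.name = value; break;
                default: break;        // skip parameters this sketch does not know
            }
            pos += 2 + len;
        }
        return cid;
    }

    public static void main(String[] args) {
        String sample = "8020"
            + "01083033323430393032"        // date and time
            + "07084A4F484E20444F45"        // name
            + "020A38303035353531323132";   // number
        MdmfSketch cid = parse(sample);
        // prints: 03240902 JOHN DOE 8005551212
        System.out.println(cid.dateTime + " " + cid.name + " " + cid.phoneNumber);
    }
}
```

A production decoder would also verify the checksum and handle SDMF messages, which carry the date, time, and number without parameter headers.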

Caller ID can transform a useless phonelet like HelloCaller into a useful application. The following code demonstrates the service method of a phonelet that never answers the phone, but always announces the phone number of the calling party.

public void service (PhoneCall call) throws IOException {
    // Read the caller ID packet, if the line delivered one.
    CallerId cid = call.getCallerID();
    String callerPhoneNumber = null;
    if (cid != null) callerPhoneNumber = cid.getPhoneNumber();
    // Announce the caller to the PC user without answering the call.
    Talker.talk("Incoming call from " + formatForReadback(callerPhoneNumber));
}
AnnounceCaller simply reads the phone number of the calling party to the PC user. The caller ID packet is available only on telephone lines that subscribe to the caller ID service of their local telephone company. If the packet is unavailable, the phonelet simply announces the caller as "unknown." The details of the announcement are implemented in the method formatForReadback(). The Talker.talk method assumes the presence of a text-to-speech package like IBM's ViaVoice. (See section "How to Handle Speech" for more detail.)
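The article does not show formatForReadback(), so the following is only one plausible shape for it. The sketch assumes that a speech synthesizer reads a number more clearly when the digits are separated, and that a missing number should be announced as "unknown." The class name ReadbackSketch and the grouping rule for 10-digit numbers are assumptions.

```java
/** Hypothetical helper; the article does not show its own formatForReadback(). */
final class ReadbackSketch {
    static String formatForReadback(String phoneNumber) {
        if (phoneNumber == null) return "unknown";
        StringBuilder digits = new StringBuilder();
        for (char c : phoneNumber.toCharArray()) {
            if (Character.isDigit(c)) digits.append(c);   // drop punctuation and spaces
        }
        if (digits.length() == 0) return "unknown";
        StringBuilder spoken = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            if (i > 0) {
                // For 10-digit numbers, pause after the area code and the exchange.
                boolean groupBreak = digits.length() == 10 && (i == 3 || i == 6);
                spoken.append(groupBreak ? ", " : " ");
            }
            spoken.append(digits.charAt(i));
        }
        return spoken.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatForReadback("8005551212")); // 8 0 0, 5 5 5, 1 2 1 2
        System.out.println(formatForReadback(null));         // unknown
    }
}
```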

One of the most frustrating aspects of developing audio applications is wrestling with the wide variety of audio file formats, encoding schemes, sample rates, and resolutions. The limited bandwidth and serial speeds of voice modems reduce the choices for phonelet programmers, but there is still room for confusion. Table 1 illustrates the range of choices for some Lucent and Rockwell/Conexant chip sets.

Table 1: The range of choices for certain Lucent and Rockwell/Conexant chipsets.

The phonelets framework uses javax.sound.sampled, and therefore supports the wav file format and the encoding schemes defined for AudioFormat in Java v1.3 (PCM, u-law, and a-law).

Four methods defined in the VoiceModem interface help developers choose an appropriate audio format for playing or recording sound.

VoiceModem.getAudioFormats() returns an enumeration of audio formats supported by the modem. Developers who extend GenericVoiceModem to develop a device handler for a specific modem should specify at least one common audio format. Modem documentation can be difficult to obtain, but one of the most accessible sources of modem information is the set of INF files distributed with Microsoft Windows. Each modem manufacturer provides an INF file as part of the Windows installation package to specify the modem's operating capabilities and command sets. A description of modem INF files is beyond the scope of this article, but if you're familiar with AT command sets you won't have much trouble culling useful information for your model.

  • VoiceModem.getAudioFormat() returns the currently active audio format.
  • VoiceModem.setAudioFormat() sets the current audio format for the modem.
  • VoiceModem.getPreferredAudioFormat() returns the preferred audio format for the modem.
The definition of preferred is up to the device handler developer. For example, some modems based on Rockwell/Conexant chips support a PCM 8-bit encoding at 11,025 Hz, but won't report DTMF events or silence intervals at that rate. At a lower sample rate of 8 kHz the same modem will detect and report DTMF events as well as intervals of silence. Clearly, the higher rate would not be preferred for any application that needed to detect touchtones while playing a message for the user.
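One way a device handler might settle on a preferred format is to take the highest sample rate at which the modem still reports DTMF and silence events. The sketch below assumes the handler author has recorded that capability per format; the PreferredFormatSketch and Candidate classes are hypothetical and are not part of the phonelets API.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper for a device handler; not part of the phonelets API. */
final class PreferredFormatSketch {
    /** Pairs a format with whether the modem reports DTMF and silence events at it. */
    static final class Candidate {
        final AudioFormat format;
        final boolean reportsEvents;
        Candidate(AudioFormat format, boolean reportsEvents) {
            this.format = format;
            this.reportsEvents = reportsEvents;
        }
    }

    /** Returns the event-reporting format with the highest sample rate, or null if none. */
    static AudioFormat choosePreferred(List<Candidate> candidates) {
        AudioFormat best = null;
        for (Candidate c : candidates) {
            if (!c.reportsEvents) continue;
            if (best == null || c.format.getSampleRate() > best.getSampleRate()) {
                best = c.format;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<Candidate>();
        // 8-bit unsigned mono PCM at the two rates mentioned in the text.
        candidates.add(new Candidate(new AudioFormat(11025f, 8, 1, false, false), false));
        candidates.add(new Candidate(new AudioFormat(8000f, 8, 1, false, false), true));
        System.out.println(choosePreferred(candidates).getSampleRate()); // 8000.0
    }
}
```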

How to Play Audio
In the HelloCaller example we demonstrated how to play a wav file to a caller. We'll now discuss the Phone.play() method in more detail.

GenericVoiceModem doesn't provide on-the-fly audio format conversion. Developers must make sure that the audio format of the file, stream, or data buffer they intend to transmit with the Phone.play() method matches the currently active format of the phone. If you try to play an 8-bit/8kHz/u-law wav file through a phone set to 16-bit/7.2kHz/PCM, the result will be unintelligible. The following code illustrates the correct way to play a wav file. If the audio format of the wav file isn't supported by the phone's underlying modem, a ModemException is thrown.

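try {
    // Assumed reconstruction: read the format from the wav file's header.
    // wavFile is a hypothetical java.io.File naming the file to be played.
    AudioFormat audioFormat =
        AudioSystem.getAudioFileFormat(wavFile).getFormat();
    phone.setAudioFormat(audioFormat);
} catch (PhoneException e) {
    // thrown if audioFormat is not supported
}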
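Because the framework performs no conversion, a phonelet can also compare formats before it plays anything. The sketch below uses only javax.sound.sampled: AudioFormat.matches() is a real method, while the FormatCheckSketch class and its playable() helper are hypothetical. In practice the file's format could be read with AudioSystem.getAudioFileFormat(File).getFormat().

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical pre-flight check; not part of the phonelets API. */
final class FormatCheckSketch {
    /** True when the file's audio format can be sent to the phone without conversion. */
    static boolean playable(AudioFormat fileFormat, AudioFormat activeFormat) {
        return fileFormat.matches(activeFormat);
    }

    public static void main(String[] args) {
        // 8-bit, 8 kHz, mono u-law, as stored in the wav file from the example in the text.
        AudioFormat ulaw8k =
            new AudioFormat(AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        // 16-bit, 7.2 kHz, mono signed little-endian PCM, as set on the phone.
        AudioFormat pcm7200 = new AudioFormat(7200f, 16, 1, true, false);

        System.out.println(playable(ulaw8k, pcm7200)); // false: playback would be unintelligible
        System.out.println(playable(ulaw8k, ulaw8k));  // true
    }
}
```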
The Phone.play() method allows callers to interrupt a message with a simple touchtone. This interrupt feature is especially useful when your phonelet is prompting the user for touchtone input and you want an instructional message to stop playing immediately upon detection of a specific touchtone key. In the following example we play a sequence of wav files, aborting the play if a certain key is pressed.
public void service (PhoneCall call) throws IOException {
    Phone phone = call.answer(this);
    if (phone == null) return;
    // Play each wav file in turn, stopping if the interrupt key is pressed.
}


How to Record Audio
The service routine in Listing 1 demonstrates how to record a message from an incoming caller. In this example all instructional messages and prompts are played from wav files. (All listings can be found below.)

A practical message recorder should tolerate slow callers and survive premature hang-ups. Before a message is recorded, the instructions should be repeated a limited number of times. If the caller doesn't respond with an appropriate touchtone or spoken response within a set time, the phonelet should hang up. Once recording begins, the phonelet should allow for callers who hang up or are disconnected. It may also let the caller end the recording with a touchtone.

Phone.record() allows developers to limit recordings to a maximum time and terminate them when the device detects a defined period of silence or with a touchtone. The API documents the specific methods.

Not all modems will support silence detection. As mentioned previously, some Rockwell/Conexant chips won't report silence intervals or detect touchtones when the sampling rate is 11,025 Hz.

One of the design goals of the phonelets framework was to insulate the developer from the complexities of audio stream handling. The wide variety of hardware has made it almost impossible to hide all the details. Developers need to understand the basic capabilities of their hardware and, above all, be very paranoid.

The Phone.play() method accepts the touchtone string "one" as an interrupt argument. If the caller presses the "1" key while the instructions are playing, the phonelet interrupts the message and starts recording.

The Phone.record() method accepts a maxTime argument and a maxSilence argument, which stop the recording after a specified number of seconds or after a specified interval of silence has been detected (the caller may have hung up).

Finally, the phonelet plays a goodbye message to indicate that it is about to terminate the call. Of course, a really useful phonelet might do something with the message file before exiting, like e-mail it to the owner or copy it to a specific folder.

How to Place an Outgoing Call
Placing an outgoing voice call can be quite complicated for a modem. Data modems and fax machines answer with specific tones to signal their functions, but humans and answering machines aren't so predictable. Determining when a call has been answered has been a challenge for modem and telephony board manufacturers. The most common scheme among modem manufacturers is to set a timer that expires when the ring signal goes away. This method is crude and isn't always reliable.

Manufacturers have also had some difficulty determining precisely when a call has been terminated by the distant party. Some modems and telephony boards use silence detectors but, again, these methods may not work reliably, especially if the line is noisy. Other schemes have been developed but require a much more detailed discussion of telephony systems and audio signal processing.

The phonelet framework allows developers to place outgoing calls with the Phone.call() method. It also allows developers to redirect audio to a specific OutputStream. We challenge readers to develop an audio stream processor that can accurately and reliably detect a spoken (or recorded) greeting. We also challenge you to develop a reliable adaptive silence detector.
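As a starting point for the silence-detector challenge, the sketch below tracks a running estimate of the background noise level and reports silence once several consecutive frames stay close to it. It is an illustration under stated assumptions, not a reliable detector: the class name, the 16-bit signed sample format, the 20 ms frame size, the MIN_FLOOR constant, and the 0.1 adaptation rate are all arbitrary choices, and a call that opens with speech will mislead the estimate until a quieter frame arrives.

```java
/** Hypothetical sketch of an adaptive silence detector; not part of the phonelets framework. */
final class AdaptiveSilenceSketch {
    private static final double MIN_FLOOR = 50.0; // assumed lower bound for 16-bit samples

    private final double margin;      // a frame is silent if its RMS <= noise floor * margin
    private final int framesNeeded;   // consecutive silent frames that count as silence
    private double noiseFloor = -1;   // running background estimate; negative until first frame
    private int silentRun = 0;

    AdaptiveSilenceSketch(double margin, int framesNeeded) {
        this.margin = margin;
        this.framesNeeded = framesNeeded;
    }

    /** Feeds one frame of signed 16-bit samples; returns true once silence has persisted. */
    boolean offer(short[] frame) {
        double sum = 0;
        for (short s : frame) sum += (double) s * s;
        double rms = Math.sqrt(sum / frame.length);

        if (noiseFloor < 0 || rms < noiseFloor) {
            noiseFloor = rms;                        // follow a quieter background at once
        }
        boolean silent = rms <= Math.max(noiseFloor, MIN_FLOOR) * margin;
        if (silent) {
            noiseFloor += 0.1 * (rms - noiseFloor);  // drift upward only on silent frames
            silentRun++;
        } else {
            silentRun = 0;
        }
        return silentRun >= framesNeeded;
    }

    /** Builds a 20 ms frame (160 samples at 8 kHz) of constant magnitude, for the demo. */
    static short[] frame(int amplitude) {
        short[] f = new short[160];
        for (int i = 0; i < f.length; i++) {
            f[i] = (short) (i % 2 == 0 ? amplitude : -amplitude);
        }
        return f;
    }

    public static void main(String[] args) {
        AdaptiveSilenceSketch detector = new AdaptiveSilenceSketch(2.0, 3);
        int[] levels = {100, 5000, 5000, 100, 100, 100};  // line noise, speech, then quiet
        for (int level : levels) {
            System.out.println(level + " -> " + detector.offer(frame(level)));
        }
        // Only the final frame prints true: three quiet frames in a row.
    }
}
```

Raising the floor only on frames already judged silent keeps sustained speech from being absorbed into the noise estimate, at the cost of adapting slowly when the line noise genuinely increases.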

Control Your World
A personal phone/voice portal can provide access to home and office appliances. Remote control can be easy with an inexpensive X10 serial line power control module. If you have a DSL modem in your home or office, you've probably experienced the kind of failure that requires you to cycle the power on the modem to reestablish your Internet connection. This kind of failure can be very inconvenient when you're away from your office and your Web server is stranded behind a stalled DSL connection.

With an inexpensive X10 serial port transmitter and a wireless power line control module, you can command a restart remotely with a telephone touchtone. You can purchase X10 appliance control modules at most electronics stores, but make sure you don't connect a sensitive electronic device like a DSL modem, a VCR, or a PC to a lamp dimmer module. Use the appropriate appliance module.

The phonelet framework includes the X10Transmitter class to simplify the transmission of control commands through an available serial port. A detailed explanation of X10 is beyond the scope of this article, but Listing 2 shows how simple it is to turn appliances on and off with touchtones.
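The mapping from touchtones to appliance commands can be very small. The sketch below assumes a two-key convention invented for illustration (unit digit, then 1 for on or 0 for off) and stops at a text command, because the X10Transmitter API is not shown here. X10 itself addresses devices by a house code (A through P) and a unit number (1 through 16); the class and method names are hypothetical.

```java
/** Hypothetical touchtone-to-X10 mapping; the key convention is an assumption. */
final class X10KeySketch {
    /**
     * Translates two touchtone keys into a command such as "A3 ON".
     * The first key selects unit 1-9; the second is 1 for on, 0 for off.
     * Returns null when the keys do not form a valid command.
     */
    static String commandFor(char houseCode, String keys) {
        if (houseCode < 'A' || houseCode > 'P') return null;
        if (keys == null || keys.length() != 2) return null;
        char unit = keys.charAt(0);
        char action = keys.charAt(1);
        if (unit < '1' || unit > '9') return null;
        if (action != '0' && action != '1') return null;
        return "" + houseCode + unit + (action == '1' ? " ON" : " OFF");
    }

    public static void main(String[] args) {
        // Power-cycle a stalled DSL modem plugged into appliance module A3.
        System.out.println(commandFor('A', "30")); // A3 OFF
        System.out.println(commandFor('A', "31")); // A3 ON
        System.out.println(commandFor('A', "3#")); // null
    }
}
```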

How to Handle Speech
Voice recognition technologies have advanced rapidly in the last decade, although speaker-independent voice recognition with large vocabularies remains the industry's holy grail. All of the commercial voice portals employ speech recognition. The limited vocabulary of the navigation commands simplifies the speech recognition problem and results in a much more accurate, although not yet perfect, dialogue between caller and service. At least one service employs new hardware that accelerates the voice recognition algorithms with chips that cost as little as $2. Nevertheless, speech-driven navigation can be cumbersome and vulnerable to background and line noise.

Voice recognition is beyond the scope of this article, but the phonelets framework can be extended to support a third-party voice engine.

Text-to-speech systems have also improved dramatically in recent years. The robotic monotone of earlier engines is giving way to a much more natural elocution. Lernout and Hauspie, in particular, have produced a text-to-speech system that closely approximates the natural rhythms of the human voice. Nuance also provides a sophisticated and Java-compatible framework for text-to-speech. Speech synthesis is likely to be a much more useful addition to a lightweight voice portal than speech recognition because it will let callers access information from home, office, and Web databases that would otherwise be available only via PC, wireless PDA, or Web-enabled cell phone.

The Java Speech API attempts to standardize the interface between applications and third-party speech products, but severe limitations in the current API will force service developers to use vendor-specific interfaces. Specifically, JSAPI provides no mechanism for the developer to redirect the output of the synthesizer from the speakers to a file or to an output stream. JSAPI is a prime example of emerging standards that are still incomplete and often poorly implemented by third-party vendors.

The Talker class included in the "Resources" section demonstrates a quick-and-dirty workaround for JSAPI limitations. Specifically, it demonstrates how to wrap the speech synthesis function of a JSAPI-compliant engine for easier access. The talker was developed and tested for IBM ViaVoice Millennium Edition and will require IBM's ViaVoice SDK Java Technology Edition, available to developers on the Web at www-4.ibm.com/software/speech/dev/sdk_java.html.

Warning: The Talker workaround is provided for informational purposes only. Consider it experimental and only a temporary patch for the current limitations of the JSAPI.

Talker attempts to bypass the limitations of the JSAPI by implementing an audio loopback function that will capture synthetic speech through the audio input port as the synthesizer is playing the audio (we warned you that this was for experimenters only). You'll need a full duplex audio card capable of playing and recording audio simultaneously. You'll also need an audio patch cable appropriate for your sound card. Some sound cards will require an attenuating cable between the audio output port and the microphone or audio input port. The safest way to loop the audio output back into the audio input may be to route the audio through an amplifier or mixer with an appropriate audio line out or microphone out port. You'll also have to adjust volume levels manually.

This workaround is viable only because of the ViaVoice speech synthesizer's ability to queue text-to-speech requests, operate asynchronously, and accurately report the start and conclusion of audio output. Talker is able to wrap the synthesizer and treat it as a shared resource.

VXML is emerging as the standard for scripting phone/voice service applications. Based on XML and supported by a consortium of telephony vendors including IBM, Nuance, and Lucent Technologies, VXML looks promising as a cross-platform scripting system.

If your voice service can be implemented as a relatively straightforward dialogue between caller and server, and if it requires relatively simple error handling and little or no access to native platform capabilities or local resources, then VXML may be the most cost-effective development approach. If your service requires sophisticated or special speech processing capabilities, you might consider a hybrid approach using VXML and a third-party voice component framework like Nuance's SpeechObjects and Voice Channel technologies.

VXML is a young standard, and portable implementations may be expensive and complicated. You can get started with VXML by visiting TellMe Networks. Nuance also provides a lot of helpful information for developers.

Relevant Packages and Standards for Java Developers
The Java Telephony API and the Java Speech API promise to simplify the development of telephone/voice systems and applications, eventually.

JTAPI must be implemented by telephony device vendors and will offer systems developers a comprehensive software platform for telephone devices, call centers, and PBX/CBX systems. Many of its functions go beyond what is necessary for voice/phone application developers.

With JSAPI, application developers can look forward to a unified interface to commercial speech synthesis and recognition engines, but significant oversights in the API render it practically useless for service developers. Notably, JSAPI assumes that all speech synthesis will be delivered through the default audio output device of the user's PC and that all audio input will be delivered through the user's microphone. Audio system controls for redirecting output to a stream or file are lacking, as are controls for specifying an audio input stream. Service developers must look to third-party Java interfaces for comprehensive packages.

Developers will find a lot of useful VXML information on the Web from IBM, Nuance, and TellMe Networks. Visit the www.vxml.org Web site for a complete specification.

Building Automation Java API (JSR-000060)
Environmental control of office buildings is emerging as a ripe opportunity for cross-platform software. Commercial real estate management companies will automate their control and management functions with standards that work across a broad field of properties.

Open Services Gateway Specification
The home of the future has been a favorite subject of World's Fair prognosticators for decades. The OSGi consortium (www.osgi.org) may actually get it right. When broadband opens the big double-wide doorway into homes and offices, the possibility for services will emerge and the demand for applications will follow. Java developers will be influential in specifying how these services are developed and delivered. If you want to turn on the lights and set the thermostat from your cell phone, watch this specification for the nuts and bolts.


Resources
  • AnnouncePhonelet.java: Demonstrates how to process caller ID packets. Announces an incoming call and the identity of the caller if the phone number (as delivered by caller ID) is contained in a contacts database.
  • MessagePhonelet.java: Source code demonstrating how to record and play voice messages, how to generate synthetic speech from text, and how to use touchtones to control program flow.
  • Talker.java: Source code for a wrapper to simplify use of a JSAPI speech synthesis engine. Has been tested with IBM's ViaVoice and Speech for Java SDK.
  • CallerID.java: Source code for a caller ID object that parses unformatted caller ID packets.
  • PhoneServerLite: A multithreaded phonelet host with loadable device handlers and phonelets. Includes javadoc documentation for phonelet framework.
  • SpeedSerialWin32.dll: Win 95/98/NT native library for high-speed serial access. JavaSoft's javax.comm package provides adequate support for serial data transfers at low speeds, but fails at the relatively high speeds necessary to record and play digitized voice with an external voice-enabled modem. The javax.comm package is unnecessarily complicated by an attempt to wrap the parallel and serial ports into a single package. SpeedSerialWin32 simplifies the programmer's view of the serial port and provides reliable high-speed serial transfers.
  • CommonVoiceModemCommands.txt
Recommended Reading
  • Lindley, C.A. Digital Audio with Java. Prentice Hall.
  • McClellan, J.H., et al. DSP First: A Multimedia Approach. Prentice Hall.
  • Pierce, J.R., and Noll, A.M. Signals. Scientific American Library.
  • Rorabaugh, C.B. DSP Primer. McGraw-Hill.

Author Bios
Kent V. Klinner III, chief technology officer at TransPhonic, Inc., develops platforms and components for portable and wireless devices. An electrical engineer who still likes to get close to the hardware, he's been developing Java since 1995.

Dale B. Walker is principal engineer at TransPhonic, developing applications for mobile and wireless devices. Dale is an electrical engineer with 20 years of broad experience.

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.