
Building A Telephone/Voice Portal With Java, by Kent V. Klinner III and Dale B. Walker

Telephone access to the Web is the latest craze sweeping the dot-com landscape. Voice portals with names like BeVocal, Quack.com, Tellme, and AudioPoint are promising all callers easy access to news, traffic reports, stock quotes, and driving directions. Some of these services may flash and burn as quickly as a California brushfire, but they represent the leading edge of a much larger trend that began several decades ago and has accelerated with advances in audio- and speech-processing technologies: the convergence of voice and data networks.

With 2 billion telephones and over 400 million cell phones in use today, the sheer scale of this convergence may lead to the most profound changes in our communication networks since the advent of the Web browser.

Emerging standards for voice-over-IP and wireless access protocols present systems designers with many choices for remote access technologies, but because the plain old telephone remains the predominant communication device, voice gateways can open data services to a larger population of users than any other available technology.

Java developers are in a strong position to drive voice and data network convergence with cross-platform software for servers, consumer devices, Internet appliances, and mobile handsets. With server-side components for telephony and voice processing, Web content, online databases, and e-mail will be just a phone call away. Client-side applets for mobile devices, handsets, and Internet appliances will deliver applications directly to users at the mobile edges of the Net.

Bridging the circuit-switched world of the telephone and the packet-switched world of the Internet can be a daunting task. Controlling telephone devices, detecting touch tones, and processing caller ID packets and digitized voice often require programming in C or an assembly language for arcane platform-specific hardware. With the information and resources in this article, Java developers can add basic telephone and voice services to their applications quickly and easily.

The market is still testing the value of voice-recognition access to driving directions and cinema schedules. The real value in voice/data convergence technologies may emerge with the development of more personal services, like remote control of your office network, telephone access to your home security system, or mobile access to e-mail, faxes, and voice messages through your existing office or home telephone number.

This article describes a fast and easy way to develop lightweight telephony applications with Java. In the process we hope to demonstrate that Java is the most powerful and robust language for developing and deploying sophisticated services for voice and data networks. With the phonelet framework for telephony services you'll be able to get started immediately.

We've defined a phonelet as the most basic element of a voice portal service application. Servlet developers will immediately recognize the similarity between phonelets and servlets. The phonelets described here shouldn't be confused with Java applets that run within cell phones and other portable devices running a Java VM.

Nuts and Volts
What Is a Voice Portal?

A voice portal is really nothing more than a call center with connections to Web services, some voice recognition services for navigation, a speech synthesis engine for converting text output to speech for the caller, and, usually, some kind of programming or scripting capability. A voice portal application provides a service that can be accessed from the convenience of a telephone or a cell phone. VoiceXML (VXML) is emerging as a popular standard for scripting dialogs. This article focuses primarily on the use of Java to develop voice applications, but we'll discuss the advantages of VXML in Part 2 of this series.

A voice portal application must talk to its user. All user input and output is through narrow-bandwidth audio channels. A voice application must operate efficiently and reliably with nothing more than audio input and output in a narrow range between 30 and 4,000Hz.

A voice portal application developer is essentially a signal processing engineer. Developers must write applications that can filter audio input streams for user commands and convert all output to an audio stream that can be understood by the caller. Fortunately, the developer's problem is made a bit simpler by telephone touch-tone signals, speech-processing components, and the digital signal-processing functions of telephony boards and voice modems. An understanding of audio signal processing will help, but isn't necessary.

How to Program Voice Portal Applications
There are two ways to program voice portal applications: (1) use a standard programming language like C or Java, or (2) use a standard scripting language like VXML or one supplied by the call center manufacturer or voice portal service. The usual trade-offs apply. Using the former offers the developer the most power, but may impose the burden of complexity or machine dependence. The latter approach is often easier, but may not allow the flexibility and sophistication available to C and Java programmers.

VXML is emerging as an excellent choice for scripting cross-platform phone/voice services. IBM has announced that their forthcoming VoiceServer will support VXML, and Tellme Networks, one of the high-profile voice portals, offers developers access to their VXML programming toolkit. Installing your own VXML server can be complicated and expensive.

This article describes a lightweight phone/voice service framework that you can use to develop personal voice services on your own home or office network. We'll demonstrate how to add telephone access to servlets and applications with an inexpensive voice modem and a simple Java framework for processing caller ID, touch tones, and digitized voice. The example code, found below, will demonstrate how to process caller ID packets to announce incoming calls and how to detect touch tones, play and record audio, and generate speech using a third-party speech synthesis application like IBM's ViaVoice. With the phonelets framework included in the Resources section at the end of this article, Java developers can create cross-platform applications combining phone, voice, and Web services.

Phones Are from Venus, PCs Are from Mars
To grasp the challenges and limitations of a voice portal service, you should understand something about telephone technology. Telephone line signals span a broad range of voltages from 12 to 110v. Computer interfaces operate in a relatively narrow range of voltages under 5v.

The connection between your telephone and the local branch exchange is called the local loop and is simply two twisted copper wires. The audio from both parties and the signals to dial another party, signal a busy connection, or initiate ringing must be conveyed on those two wires in a narrow audio bandwidth channel between 30 and 4,000Hz.

Telephone company switching systems have advanced as rapidly and as dramatically as computers. Actually, telephone switching requirements drove much of the computer revolution. By contrast, our telephone handsets and the local loop have changed little. The challenge for telephony engineers has been to extend new services to local loop customers while maintaining compatibility with a design specified almost a century ago. The simplest solution has been to overload the local loop connection with audio control tones. Touch tones and caller identification packets are familiar examples of audio control signals conveyed through the local loop.

The telephone interface circuits of modems and telephony cards have been bridging these two worlds for decades. Even in an age of digital PBXs and broadband services, a good analog modem with voice and fax capabilities can be a powerful tool for application developers.

Modems Are Signal Processors
While cell phones, PDAs, and Web browsers have captured our hearts and our wallets, the common modem has continued its slow march of progress in relative obscurity. We take it for granted. It's an essential part of every PC and laptop because any PC or laptop without dial-up access to the Internet would be seriously limited. Today's modems bear little resemblance to their raucous, slow, and limited forebears of the 1970s and '80s. Today's modems have almost as much processing power as a PC of 15 years ago. The best modems are marvelous little packages of signal processing and telephone line control. Understanding them is essential to understanding the capabilities and limitations of your system.

In the early 1990s modem manufacturers began adding voice processing and tone-detection capabilities to the basic tone-generation function necessary for touch-tone dialing. Today inexpensive voice modems offer audio encoding and decoding at a variety of rates and resolutions and in a variety of formats. Many modems will deliver out-of-band events like keypad tone detection, silence detection, busy-signal detection, and fax synchronization tone. Some modems offer caller ID reporting, automatic volume control, and full duplex operation.

Unfortunately, modems are often poorly documented and their behaviors don't adhere to anything more stringent than de facto standards. The phonelet framework simplifies modem programming with a GenericVoiceModem class that you can extend to adjust to the specific characteristics of your modem.

While modems simplify many of the routine tasks of call initiation and termination, they don't insulate the software developer from all the vagaries of an analog communication channel. Java developers who have never written telephony software or programmed a modem may be surprised at how difficult it is to write an application that can handle the demands of streaming audio as well as the complexities of telephone and modem control and still remain responsive and robust to unpredictable events like DTMF (Dual-Tone Multi-Frequency) tones, caller hang-up, or device failure. The phonelet development framework hides much of the complexity of modem control and signaling behind a simplified interface so that Java developers can concentrate on the application logic.

Voice Portal Building Blocks
A basic understanding of the components of a voice portal system will help you understand the capabilities and limitations of voice and telephone access to digital networks. Figure 1 illustrates the basic architecture of a voice portal system. The major components are (1) telephone device handlers, (2) a phone call dispatcher, (3) a set of voice portal applications, and (4) a set of application services.

Figure 1:  Voice portal building blocks

A virtual phone abstraction can simplify the application's interaction with the caller and hide some of the details of telephone line control and signaling. Application services might include libraries for audio signal processing, speech processing, and application scripting support for VoiceXML. VoiceXML is beyond the scope of this article, but you can find lots of information and support on the Web sites of Tellme Networks, Nuance, and IBM.

Emerging standards in telephony and speech processing promise to simplify the development of robust scalable applications, but many of these standards are in flux and not fully implemented. You may need to work around limitations in components and APIs.

Call Dispatcher - PhoneServerLite
PhoneServerLite (see Resources section) is a simple phonelet host that implements the most important features of a voice portal server.

  • It acts as the call dispatcher, passing incoming phone calls to the phonelet applications for service (the first phonelet to answer the call gets the service).
  • It supports multiple telephone lines and multiple phonelet applications.
  • It's threaded for efficient multiuser applications.
  • It's resilient to phonelet service failures (bugs in developer code).
  • It's resilient to device failure and disconnection.
  • It's scalable from laptops to multiprocessor servers.
  • It's easy to configure.
  • It's easy to extend with custom device handlers and phonelets.

PhoneServerLite configures itself at start-up, and expects the following file folders to contain configuration information, phonelets, and device handlers:

  • Phonelets: Contains the class files for each phonelet application. The file phonelets.txt identifies the phonelets to be loaded at start-up of PhoneServerLite. The sample file documents the format.
  • Devices: Contains the class files for your custom voice modem device handlers. The file devices.txt specifies the devices (i.e., modems) and the parameters. The sample file documents the format.
  • Files: PhoneServerLite creates a folder for each phonelet specified in the phonelets.txt configuration. A phonelet may store permanent files in its file folder. A phonelet can get a file reference to its permanent file folder with the PhoneletConfig.getFileFolder() method.
  • Temp: Temporary files go here. Each instantiation of a phonelet will have its own folder within the "temp" folder. A phonelet may store temporary files in its temp file folder. A phonelet can get a file reference to its temporary file folder with the PhoneletConfig.getTempFolder() method.
  • Resources: Contains resources available to all phonelets, including prerecorded sounds.

The PhoneServerLite start-up procedure is simple.

  1. It loads the voice modem device handlers specified in the devices/devices.txt configuration file.
  2. It loads and initializes each phonelet specified in the phonelets/phonelets.txt file.
  3. It listens for incoming phone calls on each line owned by a device handler and dispatches a service notification to each phonelet when a line rings.
PhoneServerLite dispatches phone calls to phonelets in the order in which the phonelets are loaded (the order in which they appear in the phonelets.txt file). The first phonelet to answer a call handles it. When the phonelet has finished, PhoneServerLite resets the device (hanging up the line in the event that the phonelet fails to terminate the call) and returns it to the waiting condition. Phone calls are dispatched after the second ring because PhoneServerLite gives the device handler an opportunity to detect the caller ID packet, which arrives as a sequence of signals between the first and second ring.
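The dispatch rule described above can be sketched in a few lines. This is a minimal illustration only, not PhoneServerLite's actual code: SketchPhonelet and SketchDispatcher are hypothetical names, and a phonelet is reduced to a single method that returns true when it answers the call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a phonelet: returns true if it answered the call.
interface SketchPhonelet {
    boolean service(String callerNumber, int ringCount) throws Exception;
}

// Offers each call to the phonelets in load order; the first to answer handles it.
class SketchDispatcher {
    private final List<SketchPhonelet> phonelets = new ArrayList<>();

    void load(SketchPhonelet phonelet) {
        phonelets.add(phonelet);
    }

    /** Returns the index of the phonelet that answered, or -1 if none did. */
    int dispatch(String callerNumber, int ringCount) {
        if (ringCount < 2) return -1; // wait for the caller ID packet after ring 1
        for (int i = 0; i < phonelets.size(); i++) {
            try {
                if (phonelets.get(i).service(callerNumber, ringCount)) return i;
            } catch (Exception e) {
                // A failing phonelet must not take the host down; try the next one.
            }
        }
        return -1;
    }
}
```

A real host would also reset the device once the answering phonelet returns; that step is omitted here to keep the sketch focused on ordering and fault isolation.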

Telephone Device Handlers: GenericVoiceModem
We've chosen to present the phonelets framework with device handlers for voice modems instead of more sophisticated telephony boards because: (1) modems are inexpensive and readily available; (2) they meet the minimum hardware requirements for telephone line control, audio capture, audio play, and tone detection; and (3) they're relatively easy to program through a serial port interface. Telephony cards from manufacturers like Dialogic, Periphonics, and Lucent are far more sophisticated, often including onboard signal processors, but they're more expensive, not as readily available, and, most important of all, not always programmable in Java. Voice modems will keep our discussion at a stick-and-rudder level. Everything you learn from this article applies to more sophisticated hardware, but along the way you'll experience the thrill of flying low with minimal equipment.

Applications communicate with modems through serial interfaces, either physical or virtual. PhoneServerLite includes a SerialPort class and a native driver for Windows 95/98/NT that can handle the relatively high serial data transfer rates required for digitized audio. In our experience, JavaSoft's javax.comm serial package for Java has failed with buffer overrun and underrun errors at speeds far below those necessary for audio applications. SerialPort and SpeedSerialWin32.dll simplify the developer's interface to serial ports and provide a stable driver for sustained high-speed serial transfers.
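A rough calculation shows why serial throughput matters. The sketch below assumes uncompressed 8kHz, 8-bit mono audio and standard 8N1 serial framing, in which every byte costs 10 bits on the wire (one start bit, eight data bits, one stop bit). Both are assumptions for illustration, not measurements of any particular modem, and SerialRateEstimate is a hypothetical helper.

```java
// Rough estimate of the serial line rate needed to stream uncompressed voice audio.
// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
class SerialRateEstimate {
    static final int BITS_PER_BYTE_ON_WIRE = 10;

    /** Minimum line rate in bits per second for the given audio format. */
    static int requiredBitsPerSecond(int samplesPerSecond, int bytesPerSample) {
        return samplesPerSecond * bytesPerSample * BITS_PER_BYTE_ON_WIRE;
    }

    /** Smallest standard port speed that meets the requirement, or -1 if none does. */
    static int smallestStandardRate(int required) {
        int[] standardRates = {9600, 19200, 38400, 57600, 115200, 230400};
        for (int rate : standardRates) {
            if (rate >= required) return rate;
        }
        return -1;
    }

    public static void main(String[] args) {
        int required = requiredBitsPerSecond(8000, 1);
        System.out.println("required bps: " + required);                       // 80000
        System.out.println("port speed:   " + smallestStandardRate(required)); // 115200
    }
}
```

Under these assumptions the audio alone needs 80,000 bits per second, so a 57,600 bps port is too slow and the port must run at 115,200 bps, sustained for the length of the call.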

Modems are notoriously difficult to program, and their command sets are often poorly documented and inconsistent. PhoneServerLite includes a GenericVoiceModem class that developers can extend to support the voice modem of their choice. GenericVoiceModem implements the VoiceModem interface and makes no attempt to support data and fax command sets. Its sole function is to provide a simple and reliable interface for common voice modems. Check our Web site for future updates that will add support for fax and data capabilities.

PhoneServerLite loads all VoiceModem device handlers specified in the devices.txt configuration file. GenericVoiceModem implements support for voice modems based on the popular Rockwell/Conexant chip set (e.g., Best Data Smart One, Comtrol RocketModem, and some 3COMHz modems).

Voice Portal Servlets: Phonelets
A phonelet is a lot like a servlet: it's the bare essence of an application. Phonelets embody only the essential logic and data of an application and they can live only in the nurturing environment of a host. The phonelet's simplicity frees the developer from the complexities of telephone device management, call dispatching, scheduling, and error handling. Phonelet developers can focus on the essential components of an application and forget about the details.

A phonelet, like a servlet, has a simple life cycle: (1) init, (2) service, and (3) destroy. It depends on its host to feed it with service requests and provide support functions. Phonelets can provide textual descriptions of themselves with the getPhoneletInfo() method.

  • init(PhoneletConfig config): The phonelet host calls the init() method only once before invoking the service() method.
  • service(PhoneCall call): The phonelet host (e.g., PhoneServerLite) delivers incoming phone calls to the phonelet's service() method. When the phone rings, the phonelet can check the ring count, the caller ID, and the incoming line to determine whether it will answer the call.
  • destroy(): The phonelet host calls the destroy() method only once, and only after it has called the phonelet's init() method. The phonelet host won't invoke the phonelet's service() method after the destroy() method is invoked.
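The ordering rules in this life cycle are easy to enforce with a small state machine in the host. The following is a hedged sketch of that idea, not the framework's implementation: LifeCycleComponent and LifeCycleHost are hypothetical names, and the service request is simplified to a string.

```java
// Hypothetical minimal life-cycle contract, modeled on the three phases in the text.
interface LifeCycleComponent {
    void init();
    void service(String request);
    void destroy();
}

// Host that enforces the ordering rules: init once, then service, then destroy once.
class LifeCycleHost {
    private enum State { NEW, READY, DESTROYED }

    private final LifeCycleComponent component;
    private State state = State.NEW;

    LifeCycleHost(LifeCycleComponent component) {
        this.component = component;
    }

    void start() {
        if (state != State.NEW) throw new IllegalStateException("init() already called");
        component.init();
        state = State.READY;
    }

    void deliver(String request) {
        if (state != State.READY) throw new IllegalStateException("not in service");
        component.service(request);
    }

    void stop() {
        if (state != State.READY) throw new IllegalStateException("destroy() needs a prior init()");
        component.destroy();
        state = State.DESTROYED;
    }
}
```

Keeping the checks in the host means each component can assume it is initialized whenever service() runs, which is the guarantee a phonelet author relies on.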
The javadoc files for the phonelet API are included in the Resources section. Developers should be familiar with just six principal components:
  1. PhoneletConfig: Passed as the argument to the init() method. Contains accessors for initialization parameters, the file base, the temp file base, and the PhoneletContext object.
  2. PhoneletContext: Contains accessors for system properties and the call dispatcher. All PhoneServerLite phonelets share the same context.
  3. CallDispatcher: Manages phonelets, devices, and phone calls.
  4. Phone: The virtual phone.
  5. PhoneCall: Encapsulates the virtual phone and the caller ID information.
  6. CallerID: Encapsulates the caller ID packet info.

Getting Started Quickly with HelloCaller Phonelet
HelloCaller is a very simple example of a phonelet. Think of it as the audio equivalent of "hello world."

public class HelloCaller extends GenericPhonelet {
    public void service(PhoneCall call) throws IOException {
        Phone phone = call.answer(this);
        if (phone != null) phone.play(new File("hello.wav")); // greeting file name is illustrative
    }
}
The phone.play() method plays a WAV sound file to the caller through the voice modem's audio transmit capabilities. The sample rate and resolution of the audio WAV file must match the voice capabilities of your modem. If you try to play a 22kHz 16-bit WAV file through a modem that only supports 8kHz 8-bit audio, your caller will be very displeased with the result. We'll discuss audio streaming in more detail in the next part of this series, but you can check the javadoc documentation for API details. Writing a simple phonelet that speaks to a caller is very easy with the phonelet framework.
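One way to catch a format mismatch before a caller hears it is to inspect the audio format with the standard javax.sound.sampled classes. The sketch below is an illustration only; ModemFormatCheck is a hypothetical helper, and the 8kHz 8-bit mono target is an assumed modem capability that you would replace with your own modem's.

```java
import javax.sound.sampled.AudioFormat;

// Checks whether an audio format matches what a voice modem is assumed to accept.
class ModemFormatCheck {
    /** True if the format is mono at the given sample rate and sample size. */
    static boolean matches(AudioFormat format, float sampleRate, int bitsPerSample) {
        return format.getChannels() == 1
                && format.getSampleSizeInBits() == bitsPerSample
                && Math.abs(format.getSampleRate() - sampleRate) < 0.5f;
    }

    public static void main(String[] args) {
        AudioFormat highRate = new AudioFormat(22050f, 16, 1, true, false);
        AudioFormat phoneQuality = new AudioFormat(8000f, 8, 1, false, false);
        System.out.println(matches(highRate, 8000f, 8));     // false
        System.out.println(matches(phoneQuality, 8000f, 8)); // true
    }
}
```

For a file on disk, AudioSystem.getAudioFileFormat(file).getFormat() returns the AudioFormat to pass to matches().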

How to Detect Touch Tones
When you dial a telephone number or punch keys to navigate a voice mail system, your telephone generates a tone for each key that comprises two distinct audio frequencies. These DTMF frequencies, carefully chosen as unlikely components of human vocalizations, are listed in Table 1. Note that there are actually 16 combinations of frequencies, four of which aren't commonly used because they're not available from a telephone touchpad.

            1209 Hz   1336 Hz   1477 Hz   1633 Hz
697 Hz         1         2         3         A
770 Hz         4         5         6         B
852 Hz         7         8         9         C
941 Hz         *         0         #         D
Table 1:  DTMF frequency table
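The frequency pairs translate directly into code. The sketch below uses the standard DTMF row and column frequencies (697, 770, 852, and 941Hz; 1209, 1336, 1477, and 1633Hz) to look up the pair for a key and to synthesize a test tone as unsigned 8-bit samples. DtmfTones is a hypothetical helper for experimentation and is not part of the phonelet framework. The keys A through D are the four combinations that a telephone touchpad doesn't offer.

```java
// Standard DTMF keypad: each key pairs one row (low) and one column (high) frequency.
class DtmfTones {
    static final int[] ROW_HZ = {697, 770, 852, 941};
    static final int[] COL_HZ = {1209, 1336, 1477, 1633};
    static final String[] KEYS = {"123A", "456B", "789C", "*0#D"};

    /** Returns {lowHz, highHz} for a key, or null if the key is not a DTMF key. */
    static int[] frequenciesFor(char key) {
        for (int row = 0; row < KEYS.length; row++) {
            int col = KEYS[row].indexOf(key);
            if (col >= 0) return new int[] {ROW_HZ[row], COL_HZ[col]};
        }
        return null;
    }

    /** Synthesizes the tone as unsigned 8-bit samples (128 = silence). */
    static byte[] synthesize(char key, int sampleRate, double seconds) {
        int[] f = frequenciesFor(key);
        if (f == null) throw new IllegalArgumentException("not a DTMF key: " + key);
        byte[] samples = new byte[(int) Math.round(sampleRate * seconds)];
        for (int i = 0; i < samples.length; i++) {
            double t = (double) i / sampleRate;
            // Mix the two sine waves at equal amplitude; the sum stays within [-1, 1].
            double mix = 0.5 * Math.sin(2 * Math.PI * f[0] * t)
                       + 0.5 * Math.sin(2 * Math.PI * f[1] * t);
            samples[i] = (byte) (128 + Math.round(mix * 127));
        }
        return samples;
    }
}
```

Calling DtmfTones.synthesize('5', 8000, 0.1) yields 800 samples mixing 770Hz and 1336Hz, which is handy for testing a tone detector without a live phone line.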

These signals propagate unhindered through the telephone network and are decoded on the receiving end by tone detectors. The DTMF system is a relatively new development in a telephone system that has undergone only about three user interface upgrades in a century.

In 1941 AT&T introduced touch-tone dialing for central office operators in Baltimore, Maryland. The speed advantage of touch-tone over rotary dialing offset the significant cost of the electronics. The first affordable touch-tone telephones were introduced in 1962. Touch-tone service would not be widely available in the U.S. until the 1970s.

DTMF tone generation and detection fostered the development of a wide array of touch-tone services and equipment, including voice mail, automated attendants, and telephone banking. It's now a part of our lives and a foundation component of any voice portal system. Touch tones may be the single most useful signaling component of a voice portal system. When was the last time you called a business and a human answered immediately?

Modems listen for caller DTMF. When a modem detects a DTMF tone, it sends a 2-byte data packet to the computer. The first byte of the packet is the shield code (0x10 for most modems), which lets the computer know that the next byte contains a code that specifies a signal event on the line. For touch-tone events the second byte is the ASCII character of the key pressed. Other event codes are also transmitted in this manner. The exact set of codes supported varies by modem chip set. A sample list for Lucent modem chips is given in Table 2.

Table 2:  Sample list for Lucent modem chips

Shield codes are passed to the computer in the incoming data stream. If the modem is also in voice-receive mode, the shielded data packet is inserted into the audio stream and must be detected and separated. Otherwise, not only will the information be lost, but the audio will sound really weird. The GenericVoiceModem class provides this filtering.
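The separation step can be sketched as a small byte-at-a-time filter. This is an illustration of the technique, not the framework's own filter: ShieldCodeFilter is a hypothetical class, and it assumes the common convention that a doubled shield byte stands for a literal 0x10 in the audio data. Check your modem's documentation before relying on that.

```java
import java.io.ByteArrayOutputStream;

// Separates shielded event codes from audio bytes in a voice modem's receive stream.
class ShieldCodeFilter {
    static final int SHIELD = 0x10; // the shield code named in the text

    private final ByteArrayOutputStream audio = new ByteArrayOutputStream();
    private final StringBuilder events = new StringBuilder();
    private boolean shieldPending = false; // true when the previous byte was a shield code

    /** Feeds one received byte (0..255) through the filter. */
    void accept(int b) {
        if (shieldPending) {
            shieldPending = false;
            if (b == SHIELD) {
                audio.write(SHIELD);      // assumed convention: doubled shield = literal 0x10
            } else {
                events.append((char) b);  // event code, e.g. '5' for the 5 key
            }
        } else if (b == SHIELD) {
            shieldPending = true;         // hold until the next byte arrives
        } else {
            audio.write(b);
        }
    }

    byte[] audioBytes() { return audio.toByteArray(); }
    String eventCodes() { return events.toString(); }
}
```

Because the filter remembers a pending shield byte between calls, a packet that is split across two serial reads is still decoded correctly.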

The difficulty in handling touch tones in a voice portal system is that they may come at any time. A caller isn't a computer that can be directed to deliver a touch tone within a specific time window. A caller is a human being whose clumsy fingers might not press the telephone keypad with the frequency and precision that we demand from machines. The phonelet framework gathers touch tones into a buffer and dispatches an event to the phonelet, which is responsible for monitoring and clearing this buffer.

Every phonelet implements the PhoneUser interface. Touch tones and other shielded codes are delivered as asynchronous events through the phoneEvent() method. Your phonelet is responsible for processing touch-tone event characters, delivered one character at a time. The default PhoneHandler buffers touch tones and provides an accessor method, getTouchtones(), that allows you to retrieve the contents of the touch-tone buffer as a string. There is also a method for clearing the touch-tone buffer, clearTouchtones(). You may, however, manage your own touch-tone buffer, as illustrated in the code snippet below.

StringBuffer touchtoneBuffer = new StringBuffer();
public void phoneEvent(PhoneEvent event) {
    if (event.getType() == PhoneEvent.TOUCHTONE) {
        // Append the touch-tone character carried by this event to touchtoneBuffer.
    }
}
As a convenience to developers, the play() and record() methods can always be interrupted by a single touch-tone character without additional processing by the phonelet. Interrupting an outgoing message or an audio recording with a specific sequence of touch tones, however, is a more complicated function and must be implemented by the phonelet developer.
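Detecting a specific sequence is mostly a matter of keeping the most recent keys and comparing them to the target. The sketch below shows one way to do that. TouchtoneSequenceDetector is a hypothetical helper, and how your phonelet stops playback or recording once a match is reported depends on framework calls not covered here.

```java
// Collects touch-tone characters and reports when a target sequence has been keyed.
class TouchtoneSequenceDetector {
    private final String target;
    private final StringBuffer buffer = new StringBuffer();

    TouchtoneSequenceDetector(String target) {
        if (target.isEmpty()) throw new IllegalArgumentException("empty sequence");
        this.target = target;
    }

    /** Call once per touch-tone event; returns true when the buffer ends with the target. */
    synchronized boolean onTouchtone(char key) {
        buffer.append(key);
        // Keep only as many trailing characters as the target needs.
        if (buffer.length() > target.length()) {
            buffer.delete(0, buffer.length() - target.length());
        }
        boolean matched = buffer.toString().equals(target);
        if (matched) buffer.setLength(0); // clear so the next sequence starts fresh
        return matched;
    }
}
```

A phonelet could call onTouchtone() from its phoneEvent() method for each touch-tone character. The method is synchronized because events arrive asynchronously, possibly on a different thread from the one playing or recording audio.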

Putting It Together
Listing 1 demonstrates a simple answering service that plays a greeting, detects touch tones, and records a message after a beep.

In the Next Part...
The next article in this series will describe audio streaming in more detail, demonstrate how to incorporate caller identification services, demonstrate remote control of home appliances with touch tones, and discuss some of the challenges of speech recognition and synthesis.


Resources

  • MessagePhonelet.java: Source code demonstrating how to record and play voice messages, how to generate synthetic speech from text, and how to use touch tones to control program flow.
  • AnnouncePhonelet.java: Demonstrates how to process caller ID packets. Announces an incoming call and the identity of the caller if the phone number (as delivered by caller ID) is contained in a contacts database.
  • Talker.java: Source code for a wrapper to simplify use of a JSAPI speech synthesis engine. Has been tested with IBM's Via Voice and Speech for Java SDK.
  • CallerID.java: Source code for a caller ID object that parses unformatted caller ID packets.
  • PhoneServerLite: A multithreaded phonelet host with loadable device handlers and phonelets. Includes javadoc documentation for phonelet framework.
  • SpeedSerialWin32.dll: Win 95/98/NT native library for high-speed serial access. JavaSoft's javax.comm package provides adequate support for serial data transfers at low speeds, but fails at the relatively high speeds necessary to record and play digitized voice with an external voice-enabled modem. The javax.comm package is unnecessarily complicated by an attempt to wrap the parallel and serial ports into a single package. SpeedSerialWin32 simplifies the programmer's view of the serial port and provides reliable high-speed serial transfers.
  • CommonVoiceModemCommands.txt

Recommended Reading

  1. Lindley, C. Digital Audio with Java. Prentice Hall.
  2. McClellan, J.H., et al. DSP First: A Multimedia Approach. Prentice Hall.
  3. Pierce, J.R., and Noll, A.M. Signals: The Science of Telecommunications. Scientific American Library Series.
  4. Rorabaugh, C.B. DSP Primer. McGraw-Hill.
Web Links
  1. A brief history of TouchTone: www.research.att.com/history/64touch.html
  2. VoiceXML forum: www.vxml.org/
  3. Nuance is rich with downloads and information for developers: www.nuance.com/
  4. IBM speech technologies: www-4.ibm.com/software/speech/
  5. Caller ID FAQ: www.ainslie.org.uk/callerid.ht
  6. JTAPI: www.javasoft.com/products/jtapi/
  7. JSAPI: www.javasoft.com/products/java-media/speech/
Author Bios
Kent V. Klinner III, chief technical officer at TransPhonic, Inc., develops platforms and components for portable and wireless devices. An electrical engineer who still likes to get close to the hardware, he's been developing in Java since 1995.

Dale B. Walker is principal engineer at TransPhonic, where she develops applications for mobile and wireless devices. Dale is an electrical engineer with 20 years of broad experience.


Listing 1

// maxTime and maxSilence are assumed to be fields defined elsewhere in the phonelet.
public void service(PhoneCall call) throws IOException {
        Phone phone = call.answer(this);
        if (phone == null) return;
        String one = "1";
        int loopCounter = 0;
        double beepLength = 1.2; // seconds
        int beepFreq = 1000; // cycles/second
        // Offer the prompt up to three times before giving up.
        while (loopCounter < 3) {
                // Play the prompt; a "1" touch tone interrupts it.
                phone.play(new File("pressOneToLeaveAMessage.wav"), one);
                if (getTouchtone().equals(one)) {
                        phone.play(new File("leaveYourMessageAfterTheBeep.wav"));
                        File messageFile = record(maxTime, maxSilence, beepFreq, beepLength, null);
                        break; // message recorded; stop prompting
                }
                pause(5); // wait 5 seconds
                loopCounter++;
        }
        phone.play(new File("goodbye.wav"));
}


All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.