Java programmers have been anxiously awaiting the release of the Java Media Framework 2.0 for more than a year. Not only does JMF 2.0 finally let you capture audio and video content, but it claims to solve the most irritating limitations of the JMF 1.x release. Does JMF 2.0 live up to its hype? This article explores the new features and reveals whether this release was worth the wait.
Although the JMF 1.x API was a dramatic improvement over Sun's previous multimedia efforts, it remains a work in progress. For instance, you can't record (or capture) multimedia content. Furthermore, it's a closed system: it is impossible to modify or examine multimedia content once a player begins streaming. Finally, the RTP APIs are strangely designed and poorly integrated with JMF.
To circumvent these limitations, JMF 2.0 introduces three types of objects: processors, plug-ins and DataSinks. These objects unleash exciting new features while maintaining backwards-compatibility with your existing JMF 1.x programs.
The most important improvements in JMF 2.0 stem from processors, a special type of Player that lets you perform digital signal processing (DSP) operations on multimedia content. Content flows into a processor, which runs an algorithm on the data and streams the result to a destination object. Although writing DSP routines may appear intimidating, you don't need to be an electrical engineer to use them. In fact, any competent Java programmer can create simple DSP routines.
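To see just how approachable a simple DSP routine can be, here is a minimal, self-contained sketch (not part of the JMF API) that applies a gain factor to a buffer of 16-bit PCM samples, clamping the result to avoid overflow. A real effect plug-in would wrap logic like this inside JMF's plug-in interface:

```java
public class GainEffect {
    // Multiply each 16-bit PCM sample by gain, clamping to the legal range.
    public static short[] applyGain(short[] samples, float gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int scaled = Math.round(samples[i] * gain);
            if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;
            if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
            out[i] = (short) scaled;
        }
        return out;
    }

    public static void main(String[] args) {
        // Doubling the gain clamps the loudest sample at 32767.
        short[] louder = applyGain(new short[] {1000, -2000, 30000}, 2.0f);
        System.out.println(louder[0] + " " + louder[1] + " " + louder[2]);
    }
}
```

The clamping step is what keeps the effect from "wrapping" loud samples into noise — exactly the kind of detail a gain-control plug-in must handle.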
What differentiates a processor from a JMF 1.x player is its plug-ins. Earlier JMF players had DSP-like capabilities, but there was no API to access them. By contrast, all DSP operations in processors are performed by well-documented plug-in objects. Sun divides processors' plug-ins into five categories: demultiplexer, effect, CODEC (compression/decompression), multiplexer and renderer (see Figure 1).
Demultiplexer plug-ins receive a single stream of multimedia content and produce one or more tracks (or streams) of data. These types of plug-ins are typically used to separate distinct audiovisual elements in file formats such as QuickTime. For instance, a QuickTime demultiplexer would separate video, audio and text data into three independently modifiable tracks (see Figure 2).
Once a track has been demultiplexed, the processor invokes the preprocessing effect plug-ins (preprocessing effects occur before streams are decompressed). Effect plug-ins tweak content but don't fundamentally change the stream. For instance, common audio effects are gain control and noise removal.
After the preprocessing stage, the processor checks to see if the track contains (or should contain) compressed content. If compression/decompression is required, a CODEC plug-in is executed. Although JMF 2.0 provides CODECs for common formats such as MPEG-audio and Cinepak video, you can insert your own CODEC for proprietary or unsupported formats.
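Inserting your own CODEC is done through a track's TrackControl. The sketch below assumes a processor that has already reached the Configured state and a hypothetical MyCodec class implementing javax.media.Codec for a proprietary format; it requires a JMF installation to compile:

```java
import javax.media.Codec;
import javax.media.Processor;
import javax.media.control.TrackControl;

class CodecInsertion {
    // Insert a custom CODEC into the first track's processing chain.
    // 'processor' must already be in the Configured state, and 'myCodec'
    // is any object implementing javax.media.Codec.
    static void insertCodec(Processor processor, Codec myCodec) throws Exception {
        TrackControl[] tracks = processor.getTrackControls();
        tracks[0].setCodecChain(new Codec[] { myCodec });
    }
}
```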
Postprocessing effect plug-ins run when the CODEC plug-in finishes. Normally, both pre- and postprocessing plug-in effects operate on uncompressed content. Consequently, if you're decompressing (or playing) content, you should implement a postprocessing effect. Similarly, when compressing (or recording), effect processing should be done in the preprocessing phase (see Figure 3).
When the postprocessing plug-in completes, the processor can send the resultant tracks to either a renderer plug-in or a multiplexer plug-in. Renderers are terminating objects that transport a track to its final destination device. For example, an audio renderer would stream uncompressed pulse code modulation (PCM) content to its associated audio hardware. Likewise, a video renderer paints bitmaps onto a video device or window.
Multiplexers are the inverse of demultiplexers: they combine two or more tracks into a single track (see Figure 4). Multiplexers are popular in recording scenarios since they let you combine multiple tracks into a single format (e.g., QuickTime or MPEG).
Once the plug-ins have finished, you can stream the output to a DataSource created by the processor. The DataSource can be used to transfer the output of the processor to another player or for recording purposes.
New States of Mind
Conventional JMF players go through five state changes (unrealized, realizing, realized, prefetching and prefetched) before they can commence playback (see my February JDJ article [Vol. 5, issue 2] for more information on states). Processors undergo two additional state transitions before they enter the realizing state: configuring and configured (see Figure 5).
When a processor enters the configuring state, it attempts to demultiplex its input stream and determine the type of content in each track. A processor becomes configured when the input stream has been demultiplexed and essential track information has been retrieved. It alerts you about the new state by reporting a ConfigureCompleteEvent to your application.
Once the processor is configured, you can obtain TrackControl objects via the getTrackControls() method. These objects let you determine which plug-ins are active and the phase in which each plug-in should execute.
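A minimal sketch of this sequence follows. The media URL is hypothetical, and listening for ConfigureCompleteEvent via a ControllerListener is the idiomatic approach — this sketch simply polls the processor's state for brevity. It requires a JMF installation to compile:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;
import javax.media.control.TrackControl;

class ConfigureDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical QuickTime file; any supported media source works.
        Processor p = Manager.createProcessor(new MediaLocator("file:demo.mov"));
        p.configure();
        // Wait until the processor reaches the Configured state.
        while (p.getState() < Processor.Configured) {
            Thread.sleep(50);
        }
        // Each demultiplexed track now exposes a TrackControl.
        TrackControl[] tracks = p.getTrackControls();
        for (int i = 0; i < tracks.length; i++) {
            System.out.println("Track " + i + ": " + tracks[i].getFormat());
        }
    }
}
```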
Most programmers don't care about DSP processing. Rather, they only need to play or record. Consequently, JMF lets you bypass these details via the processor's start() method. If you start() the processor when it's unrealized, it implicitly goes through the configuring, configured, realizing, realized, prefetching and prefetched states before finally commencing playback or capturing data.
The Kitchen Sink
Once a processor completes its task, you use a DataSink to save content to a specific file format or retransmit the media across a network. Like renderers, DataSinks are terminal objects that stream content to an output device. For example, if you had an MPEG processor that captured and compressed MPEG audio streams, you could connect its DataSource to an MPEG DataSink. The MPEG DataSink would in turn save the content to a properly formatted MPEG Layer 3 (or .mp3) file (see Figure 6).
DataSinks aren't restricted to saving data to files. In fact, they can communicate with a variety of output devices. For instance, you could create a broadcast DataSink whose output device is the Internet. It takes the output of a JMF DataSource and transmits it to a well-known multicast IP address (see Figure 7). All computers that are listening to the multicast IP address can receive and decode the stream produced by the broadcast DataSink.
Since a DataSink must be connected to an input DataSource, constructing one by hand can be tricky. Fortunately, the Manager simplifies this process with the createDataSink() method. First, obtain a DataSource (typically from a processor) and pass it to createDataSink(). The Manager then constructs a DataSink and attaches it to your DataSource.
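The following sketch shows this handoff. It assumes a realized processor whose output DataSource carries MPEG Layer 3 audio; the output file name is hypothetical, and a JMF installation is required:

```java
import javax.media.DataSink;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Processor;
import javax.media.protocol.DataSource;

class SinkDemo {
    // Connect a realized processor's output to a file-based DataSink.
    static DataSink attachSink(Processor processor) throws Exception {
        DataSource output = processor.getDataOutput();
        DataSink sink = Manager.createDataSink(output,
                new MediaLocator("file:capture.mp3"));
        sink.open();  // connect to the destination before starting
        return sink;
    }
}
```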
Capture the Flag
If you want to capture content, you'll need to construct at least one processor and a DataSink. First, create and configure the processor (adding CODECs or effects). Then attach the processor's output DataSource to the DataSink via createDataSink().
Before recording commences, you must start() the DataSink. This ensures that the DataSink is ready to process the content when it arrives from the processor. The capture process commences when you call the processor's start() method.
When you're finished recording, you must flush (or write) all remaining buffers to the file created by the DataSink. To ensure no data is lost, you first close() the processor to cease recording. Then you close() the DataSink to ensure that all content is written to the file.
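The whole recording sequence can be sketched as follows. The ten-second duration is arbitrary, and the ordering comments reflect the rules described above; a JMF installation is required:

```java
import javax.media.DataSink;
import javax.media.Processor;

class CaptureDemo {
    // The order matters: the DataSink must be running before the processor
    // starts, and the processor must be closed before the DataSink so that
    // all remaining buffers are flushed to the file.
    static void record(Processor processor, DataSink sink) throws Exception {
        sink.start();        // ready the sink to receive content
        processor.start();   // capture begins
        Thread.sleep(10000); // record for ten seconds (arbitrary)
        processor.stop();
        processor.close();   // cease recording
        sink.close();        // flush remaining buffers to the file
    }
}
```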
As we discovered last month, the JMF 1.x RTP architecture is inconsistent and often confusing. Fortunately, Sun has dramatically improved the JMF 2.0 RTP API by leveraging processor plug-ins such as CODECs, demultiplexers and multiplexers.
As we also discovered last month, the RTPSessionManager shields you from the RTP programming complexities. The JMF 2.0 version of RTPSessionManager is equally simple: you create a MediaLocator and construct a player from the MediaLocator.
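That two-step creation looks like this in code. The multicast address and port are hypothetical placeholders for a real RTP session, and a JMF installation is required:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

class RtpPlayerDemo {
    public static void main(String[] args) throws Exception {
        // RTP locator syntax: rtp://address:port/content-type
        MediaLocator loc = new MediaLocator("rtp://224.144.251.104:49150/audio");
        Player player = Manager.createPlayer(loc);
        player.start(); // realizes and prefetches, then renders arriving packets
    }
}
```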
Although RTP player creation is similar between JMF 1.x and 2.0, the objects composing a player are dramatically different. For instance, a JMF 1.x RTP player consists of an RTPSessionManager (i.e., an RTPSocket and MediaProxy objects) and an RTP MediaHandler. Alas, neither is a pure JMF object.
The RTPSocket is a strange DataSource that not only transmits and receives RTP content but also contains another DataSource used for communicating Real-Time Control Protocol (or RTCP) information. Since JMF 1.x had no API for transmitting outbound streams, Sun created an RTP-specific outbound DataSource, RTPIODataSource, and a nonstandard interface, PushDestStream, to push (or stream) content to the RTPIODataSource.
RTPSessionManager also contains a MediaProxy that converts the output of the RTPSocket DataSource into a format that the RTP MediaHandler can understand. It consists of an RTP Protocol Handler, a depacketizer and a depacketized DataHandler (see Figure 8).
Although the MediaProxy supports the most popular RTP formats, Sun realized that developers would need to support additional formats (or RTP payloads). Unfortunately, JMF 1.x doesn't define a mechanism to customize MediaProxys. Thus Sun created the depacketizer interface, a non-JMF interface that lets you decode custom RTP payloads inside an RTP-centric MediaHandler.
The JMF 2.0 RTPSessionManager uses a dramatically different approach to create an RTP player. Rather than relying on nonstandard interfaces like the depacketizer and PushDestStream, the 2.0 RTPSessionManager replaces them with standardized objects such as RTPStreams, processors, CODECs and demultiplexers (see Figure 9).
An RTPStream represents the flow of RTP data; the RTPSessionManager uses this interface to communicate with other parties in the RTP session. There are two types of RTPStreams: ReceiveStream and SendStream. ReceiveStream objects are created when content is received from a remote party. SendStream objects are constructed when content must be sent to a remote party.
If you're receiving (or playing) RTP content, then the RTPSessionManager will create a ReceiveStream for each stream in the RTP session. The output of the RTPSessionManager is streamed into a standard JMF DataSource, which then funnels its output into a processor where the content can be decompressed and effect plug-ins executed. The processor's output is then sent to a DataSink, renderer or MediaHandler (see Figure 9).
To broadcast an RTP stream, you create a DataSource to retrieve the content. This information is streamed into a processor for compression and then sent to a DataSource connected to the RTPSessionManager. The RTPSessionManager uses the SendStream interface to transmit this data to other parties in the RTP session (see Figure 10).
Although the RTPSessionManager still uses RTPSocket to send and receive RTP streams, Sun transitioned from the temporary RTPIODataSource and PushDestStream to RTPPushDataSource and PushSourceStream. Since the latter is also used when capturing content, the new RTP architecture is finally consistent with JMF principles.
Alas, RTPSocket retains the concept of an RTCP DataSource contained within the RTP DataSource. From a purely object-oriented standpoint, a cleaner solution would be a single DataSource that could simultaneously handle both RTP and RTCP streams. Since this design has survived the transition from JMF 1.x to 2.0, we must assume that Sun believes this DataSource within a DataSource is the proper design.
Sun has also replaced the RTP-specific depacketizer with processors and CODECs. If you need to implement support for a proprietary RTP payload, you'd insert a plug-in (or CODEC) for that custom RTP payload into the processor. Another benefit to using a processor is the ability to insert a multiplexer or demultiplexer to split or combine RTP streams (see Figure 11).
Although these architectural changes create a more robust environment for RTP development, they aren't backwards-compatible with the JMF 1.x RTP APIs. If you heeded my advice last month and concentrated on creating high-level RTP applications, you should have very few code changes. By contrast, if you're using low-level interfaces such as the depacketizer, you've got a lot of code to rewrite.
RTSP at Last?
Until JMF 2.0, the only way to play RTSP content was to use RealNetworks' RTSP-based player. Although it isn't well publicized, JMF 2.0 contains early alpha-level code for accessing RTSP servers. Sun is emphatic that RTSP support is for evaluation only and shouldn't be used in production code. If you need to deploy an RTSP solution, you should write your own RTSP MediaHandler or use RealNetworks' product.
Although JMF 1.x provides important multimedia features, it can't capture content, is difficult to customize and has a strange RTP architecture. JMF 2.0 solves these problems with processors, plug-ins, DataSinks and a revamped RTP architecture. Processors and plug-ins let you perform digital signal processing operations on multimedia content, while DataSinks can be used to store or display content. By combining processors, plug-ins and DataSinks, you can capture and compress multimedia content.
Sun's new RTP architecture also leverages processors, plug-ins and capture interfaces to create a flexible solution that can handle common RTP payloads. And it can also be easily extended to accommodate custom formats.
Unfortunately, there are some glaring weaknesses in JMF 2.0. The RTP API isn't backwards-compatible and the RTSP support is immature. Despite these flaws, JMF 2.0 is a quantum leap forward for Java multimedia programmers.
Linden deCarmo is a senior software engineer at NetSpeak Corporation, where he develops advanced telephony software for IP networks. Linden is also the author of Core Java Media Framework, published by Prentice Hall.