RTP and RTSP: Protocols that address the transportation of multimedia content
The Internet is strewn with multimedia minefields. Lost or out-of-sequence packets and transmission delays can create havoc in your applications. Fortunately, you can overcome these problems by using protocols optimized for multimedia transportation. This article explains why these protocols are necessary, and examines how the JMF implements them and how you can use them to spice up your programs.
TCP/IP (Transmission Control Protocol/Internet Protocol) dominates Internet traffic because of its reliability. That reliability comes from acknowledgment and retransmission machinery that recovers lost packets and ensures they arrive in the order they were sent.
Unfortunately, while TCP/IP's reliability is beneficial for textual data, it can have a devastating impact on multimedia streaming: the retransmission machinery steals critical processing power from multimedia devices, and waiting for a retransmitted packet can force disturbing pauses in playback. For most multimedia applications, this level of reliability is simply unnecessary.
By contrast, UDP (User Datagram Protocol) is a lightweight but unreliable protocol that's ideal for multimedia transport. Because it doesn't guarantee that packets will arrive at their destination, or that they will arrive in sequence, UDP consumes less processing power than TCP/IP. Although unreliability sounds scary, few UDP packets are ever lost. Should a packet be lost, multimedia CODECs (compressors/decompressors), the software algorithms that process multimedia data, gracefully compensate to ensure smooth playback.
Although you can use UDP to transmit audiovisual content, it isn't optimized for multimedia use. Consequently, RTP (Real-time Transport Protocol) is built on top of UDP and adds time stamps, synchronization, packet-loss detection and other multimedia features (see Figure 1). Since RTP is a binary protocol, you must examine packet headers to determine multimedia attributes such as the audio or video CODEC or the sampling rate.
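To make the packet-header discussion concrete, here's a minimal sketch of how the fixed portion of an RTP header is laid out: a version field, a payload type that identifies the CODEC, a sequence number for loss detection and a timestamp for synchronization. The field offsets follow the RTP specification, but the class and field names are our own illustrations, not part of any JMF API.

```java
// Sketch: decoding the fixed RTP header fields from a raw packet buffer.
// Class and field names are illustrative, not part of JMF.
public class RtpHeader {
    public final int version;        // always 2 for current RTP
    public final int payloadType;    // identifies the CODEC (see Table 2)
    public final int sequenceNumber; // detects lost or reordered packets
    public final long timestamp;     // media sampling clock

    public RtpHeader(byte[] p) {
        version        = (p[0] >> 6) & 0x03;
        payloadType    =  p[1] & 0x7F;
        sequenceNumber = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        timestamp      = ((long) (p[4] & 0xFF) << 24) | ((p[5] & 0xFF) << 16)
                       | ((p[6] & 0xFF) << 8)         |  (p[7] & 0xFF);
    }
}
```

Because every field sits at a fixed bit offset, a receiver can classify a packet without any out-of-band file header, which is exactly why RTP needs no file format.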
RTP may be deployed as a stand-alone protocol or as part of a higher-level protocol. For example, the RTSP (Real Time Streaming Protocol) uses RTP to transmit audio and video packets. However, you control RTSP with text-based commands such as PLAY, STOP and PAUSE (see Table 1). When the RTSP subsystem receives one of these commands, it updates its internal state and, if necessary, changes the type of RTP packets it is transmitting.
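Those text-based commands travel in simple, HTTP-like request messages. The sketch below assembles one such request; the header layout follows the RTSP specification, but the helper class, URL and session identifier are illustrative assumptions, not part of any RTSP SDK.

```java
// Sketch: building a text-based RTSP request such as PLAY or PAUSE.
// The method and header names come from the RTSP specification;
// the helper class itself is purely illustrative.
public class RtspRequest {
    public static String build(String method, String url, int cseq, String session) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(url).append(" RTSP/1.0\r\n");
        sb.append("CSeq: ").append(cseq).append("\r\n");
        if (session != null) {
            sb.append("Session: ").append(session).append("\r\n");
        }
        sb.append("\r\n"); // blank line terminates the request
        return sb.toString();
    }
}
```

For example, build("PLAY", "rtsp://example.com/clip", 2, "12345") yields a PLAY request the server answers before switching the RTP packets it transmits.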
It's All in the Packaging
Last month we examined how you can play popular file formats such as QuickTime or .WAV with JMF. Unfortunately, such file-based players struggle with Internet content because they require that the entire file be downloaded before playback commences. Not only does this ruin interactivity, but the gargantuan file sizes make the approach impractical. Streaming media content, by contrast, is specifically optimized to enable immediate playback and isn't bound by file-size restrictions.
Since there's no file format, an RTP stream must be identified by its individual packet headers. For instance, RTP packets contain media type and frame size headers. The media type header indicates the audio or video CODEC being transmitted (see Table 2). RTP supports numerous audio CODECs, but most audio packets contain PCM (Pulse Code Modulation) or a PCM variant such as mu-law or A-law. Similarly, you can stream a variety of video CODECs with RTP, but most JMF applications use H.261.
The frame size header describes the logical subdivisions of the data; each packet contains a whole number of frames. Larger frame sizes take longer to capture and transmit, and therefore may cause audio gaps while your application waits for data. By contrast, extremely small frame sizes can overwhelm your application with tiny packets and can be difficult for audio CODECs to decode. Consequently, if you're sending RTP packets, you should choose a frame size that balances interactivity and processing power.
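You can quantify that trade-off with simple arithmetic. The helper below is our own sketch, not a JMF API; it computes how many milliseconds of audio one frame represents, which is the minimum delay before that frame can be transmitted.

```java
// Sketch: the frame-size trade-off as arithmetic. How long must the
// sender wait to capture one frame of audio before packetizing it?
public class FrameLatency {
    /** Milliseconds of audio represented by one frame. */
    public static double captureMillis(int frameBytes, int sampleRate, int bytesPerSample) {
        double samplesPerFrame = (double) frameBytes / bytesPerSample;
        return 1000.0 * samplesPerFrame / sampleRate;
    }
}
```

Assuming 8 kHz mu-law audio (one byte per sample), a 160-byte frame represents 20 ms of audio, while a 1,600-byte frame forces a 200 ms wait, which is long enough to produce audible gaps.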
Take Control of the Situation
RTP streams are grouped into entities called sessions. A session contains two or more devices exchanging multimedia content. Since RTP is built on an unreliable protocol, there's no method to determine whether a session's RTP packets are being transported successfully. Thus RTP is usually combined with RTCP (RTP Control Protocol), which lets you monitor network traffic and track the session's participants.
The Real Thing
Now that we've discovered how multimedia content is transported over IP, it's time to examine the JMF streaming APIs. Since the RTSP programming model is similar to the JMF devices we discussed last month, we'll examine it first.
RealNetworks uses RTSP as the core protocol for its RealPlayer and is a strong advocate for RTSP in various Internet task forces. In an effort to make RTSP and RealMedia content (e.g., .ra, .ram) more pervasive, it has provided a JMF-based RTSP runtime and SDK.
The first thing you'll notice about RealNetworks' RTSP support is how smoothly it fits into the JMF architecture. In fact, the applet we created last month can play RealMedia content without modification. Although it's possible to use vanilla JMF APIs with RTSP, the RealNetworks Player provides exciting enhancements that you'll want to exploit in your applications.
To access this new functionality, you ask the Manager to create a Player for RealMedia content. You can then detect the presence of extra features by checking whether the Player the Manager returned is an instance of com.real.media.RMPlayer. The following example illustrates how to detect an enhanced RealNetworks Player.
If it is a com.real.media.RMPlayer, you have access to pause functionality and automated media position information.
// create a Player to preview the selected file
player = Manager.createPlayer(mrl);

// check to see if this is RealMedia content....
if (player instanceof com.real.media.RMPlayer) {
    // if it is, we can take advantage of the enhanced Player
    rmPlayer = (com.real.media.RMPlayer) player;
}
JMF Players and Controllers do not surface a pause() method because they were designed to play content on a local machine. When you call a Player's stop() method, it flushes (or removes) buffers from its associated device and enters the Stopped state. When playback is restarted, these buffers can be refilled rapidly and streamed to the device.
By contrast, it can take several seconds to prefetch a Player that is streaming remote content. Thus you can't flush these buffers when a Player is stopped and still have a responsive device. For these scenarios the RMPlayer surfaces the pausePlay() method.
pausePlay() pauses playback, but it doesn't flush buffers from the RMPlayer's audio/visual device (see the code below). This ensures that playback can resume instantly when the user hits the play button. An additional benefit of this approach is that playback can restart exactly where it was originally paused. Resumption after a stop() is less exact because the Player can detect only the approximate location where it stopped.
// pause rather than stop -- no data loss
rmPlayer.pausePlay();
Like other Controller methods, pausePlay() is asynchronous. When the RMPlayer finishes pausing, it notifies your listener with an RMPauseEvent (see code below). Never assume the pause has completed until this event is received.
else if (event instanceof RMPauseEvent) {
    // insert code to handle RMPauseEvent here
}
Besides pause capabilities, the RMPlayer provides automated stream position information via the RMOnPosChangedEvent. The RMPlayer sends an RMOnPosChangedEvent every 100 milliseconds, and you can use the event's getPositionInNanos() method to retrieve the current media time (see code below). This approach is superior to polling, since you don't waste processing cycles trying to guess the current media time.
else if (event instanceof RMOnPosChangedEvent) {
    // insert code to handle RMOnPosChangedEvent here
}
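Since getPositionInNanos() reports raw nanoseconds, you'll typically convert the value before displaying it. The formatting helper below is our own sketch, not part of the RealNetworks SDK; only the nanosecond input convention comes from the API.

```java
// Sketch: converting a nanosecond media position (as reported by
// getPositionInNanos()) into a displayable minutes:seconds string.
// This helper is illustrative, not part of the RealNetworks SDK.
public class MediaClock {
    public static String format(long nanos) {
        long totalSeconds = nanos / 1000000000L;
        long minutes = totalSeconds / 60;
        long seconds = totalSeconds % 60;
        return String.format("%d:%02d", minutes, seconds);
    }
}
```

A position of 90 billion nanoseconds, for instance, formats as "1:30", which is what you'd feed a progress label in your listener.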
Unlike RealNetworks' RTSP solution, Sun's RTP architecture is a hybrid. Part of the solution is integrated with JMF, while the other portion resides outside it. The core element of this hybrid approach is the RTPSessionManager, which supports traditional JMF objects such as DataSources and MediaHandlers but adds RTP-specific features such as performance metrics and media type detection.
The RTPSessionManager is a supervisory object that manages all RTP sessions you wish to control with JMF. It enables playback of RTP content via the self-contained DataSource and MediaHandler (see Figure 2).
The DataSource provided by the RTPSessionManager is called RTPSocket. Like a conventional DataSource, RTPSocket streams packetized RTP content to client MediaHandlers. However, RTPSocket has one unique attribute: it can also output data. For example, you can use it to transmit RTCP data or session statistics (see Figure 3).
When you request that the JMF Manager construct a Player to handle RTP content, it searches for a MediaHandler that is compatible with the RTPSocket. Normally, the only such MediaHandler is the RTPSessionManager. This occurs because the RTPSessionManager is a special type of MediaHandler called a MediaProxy.
A MediaProxy manipulates content and forwards the modified content to a subsequent MediaHandler. Thus a MediaProxy is usually a pure software entity, not associated with a particular hardware device. The portion of the RTPSessionManager that implements the MediaProxy interface translates (or depacketizes) RTP packets into flat data buffers that other Players or MediaHandlers can process. The JMF Manager then searches for a MediaHandler that can accept the output of the depacketizer. If a compatible MediaHandler is found, the stream can be played.
MediaLocators to the Rescue
The process of constructing an RTP-enabled Player is dramatically more complex than creating a simple audio or video player. Fortunately, the Manager hides this complexity by letting you construct an RTPSocket with a MediaLocator (see code below). The MediaLocator must be in the following format:
rtp://IPaddress:port/mediatype
where the IPaddress:port combination is the address of the RTP session, and mediatype represents the type of content in the session (i.e., audio or video).
// build a MediaLocator and verify it describes an rtp: address
mrl = new MediaLocator(rtpServer);
if (!"rtp".equals(mrl.getProtocol()))
    System.err.println("Can't build RTP MRL for " + rtpServer);
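If you're curious what the Manager extracts from that locator string, the following stand-alone sketch parses the rtp://IPaddress:port/mediatype format by hand. The parser class is purely illustrative; in a real application, JMF's MediaLocator does this work for you.

```java
// Sketch: hand-parsing the rtp://IPaddress:port/mediatype locator
// format. Illustrative only -- JMF's MediaLocator handles this itself.
public class RtpLocator {
    public final String host;
    public final int port;
    public final String mediaType;

    public RtpLocator(String mrl) {
        if (!mrl.startsWith("rtp://")) {
            throw new IllegalArgumentException("not an rtp: locator: " + mrl);
        }
        String rest = mrl.substring("rtp://".length());
        int colon = rest.indexOf(':');
        int slash = rest.indexOf('/', colon);
        host = rest.substring(0, colon);
        port = Integer.parseInt(rest.substring(colon + 1, slash));
        mediaType = rest.substring(slash + 1); // "audio" or "video"
    }
}
```

Parsing "rtp://224.2.0.1:5004/audio", for example, yields the session address 224.2.0.1:5004 and the media type "audio".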
After the Manager gives you a reference to the RTPSocket object, you should pass the reference to the Manager's createPlayer() method, as follows:
// Create a Player with the rtp DataSource....
rtpPlayer = Manager.createPlayer(rtpsource);
Because RTP sessions are sensitive to network traffic, it's crucial that you monitor session status. RTPSocket provides access to this information by exposing an RTPControl object. If you call RTPSocket's getControl() method with the appropriate string, it will return an RTPControl object as follows:
// we'll use this name to query the rtp player for an
// rtp control.
private static final String rtpControlName = "javax.media.rtp.RTPControl";
// get the RTP control from the DataSource
rtpcontrol = (RTPControl) rtpsource.getControl(rtpControlName);
if (rtpcontrol == null)
System.out.println("No RTPControl interface.");
RTPControl is a cornucopia of information. There are methods to retrieve the number of packets lost or received out of sequence (see code below). Furthermore, you can query global information such as the total number of bytes processed or the number of invalid packets detected.
// get the statistics info from the rtp control object
RTPReceptionStats stats = rtpcontrol.getReceptionStats();
String packetslost = "\tPackets lost: " + stats.getPDUlost() + "\n";
String outoforder = "\tPackets out of order: " + stats.getPDUMisOrd() + "\n";
String packetsreceived = "\tPackets received: " + stats.getPDUProcessed() + "\n";
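These raw counters become more meaningful as a percentage. The helper below is our own arithmetic sketch built on the counts that getPDUlost() and getPDUProcessed() return; it is not part of the JMF API.

```java
// Sketch: deriving a loss percentage from the RTPReceptionStats
// counters (getPDUlost() and getPDUProcessed()). The helper class
// is illustrative, not part of JMF.
public class LossRatio {
    /** Percentage of expected packets that were lost. */
    public static double percentLost(int lost, int processed) {
        int expected = lost + processed;
        return expected == 0 ? 0.0 : 100.0 * lost / expected;
    }
}
```

An application might log a warning, or ask the sender to switch to a lower-bandwidth CODEC, once this figure climbs above a few percent.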
Peeking Under the Hood
If you want minute control over how your RTP session is created, you can't rely on the Manager to create the RTPSessionManager for you. Rather, you must perform all the steps yourself. This includes allocating a new RTPSessionManager, attaching yourself as a listener to the RTPSessionManager and initializing the session parameters. Then you must start the session, create a DataSource for it and construct a Player for the DataSource. Unless you're a control freak, or absolutely must control low-level session attributes such as RTP description lists, it's easier to let the JMF Manager handle these details for you.
Until Next Time
RTP and RTSP are protocols that specifically address the transportation of multimedia content over IP. RealNetworks' RTSP Player gives you access to RealMedia content and requires no modifications to your JMF programs, and it adds the custom pause and position-change notifications that streaming applications need.
Unlike RealNetworks' pure JMF solution, Sun's RTP architecture is a JMF hybrid. It's possible to write RTP applications using only JMF. However, if you want detailed control over the RTP session, you'll need to manipulate the RTPSessionManager outside the realm of JMF.
Next month we'll unveil the truth about JMF 2.0 - the most anticipated Java multimedia product ever released.
Linden deCarmo is a senior software engineer at NetSpeak Corporation, where he develops advanced telephony software for IP networks. Linden is also the author of Core Java Media Framework, published by Prentice Hall.