MIDI & Audio Sequencing with Java
The Java Sound API, first introduced in J2SE 1.3, includes the package javax.sound.midi, which contains everything you need to be able to send and receive messages to and from any MIDI device visible to your operating system.
The Java Sound Programmer Guide and the Java Sound
Demo, both available for download from Sun, are excellent
references that illustrate all the "nuts and bolts" of sending
and receiving messages. This article provides a brief overview
of working with the MIDI and sampled audio primitives of the
Java Sound API, and then explores using those primitives to
construct a basic multi-track MIDI/audio sequencer in Java.
Programming with MIDI
Basically, every message you send to or receive from a
MIDI device is one or more bytes. The first byte is referred to
as the "message" or "status" byte. This is essentially the "command"
(e.g., note on, note off, change patch, set volume, etc.).
The specific command you are sending or receiving will determine
what the next byte or bytes are, if any. Having an online
MIDI specification reference available will be indispensable.
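To make the status byte concrete, here is a minimal sketch that splits one into its command and channel and builds a "note on" for middle C. The class StatusByteDemo and its helper methods are invented for this illustration; only ShortMessage and its constants come from javax.sound.midi.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.ShortMessage;

// Hypothetical helper class for illustration; not part of the Java Sound API.
class StatusByteDemo {
    // For channel messages, the upper four bits of the status byte are the command...
    static int command(int status) {
        return status & 0xF0;
    }

    // ...and the lower four bits are the zero-based channel (0-15).
    static int channel(int status) {
        return status & 0x0F;
    }

    // Builds "note on, key 60 (middle C), velocity 93" for a zero-based channel.
    static ShortMessage middleCOn(int channel) {
        try {
            ShortMessage msg = new ShortMessage();
            msg.setMessage(ShortMessage.NOTE_ON, channel, 60, 93);
            return msg;
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ShortMessage msg = middleCOn(0);
        System.out.printf("status=0x%02X command=0x%02X channel=%d data1=%d data2=%d%n",
                msg.getStatus(), command(msg.getStatus()), channel(msg.getStatus()),
                msg.getData1(), msg.getData2());
    }
}
```

A note-on carries two data bytes (key and velocity), whereas a command such as a patch change carries only one, which is why the status byte has to be read first.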
The first step is to obtain a reference to the specific
MidiDevice you want to talk to. The javax.sound.midi.MidiSystem
class is your gateway to every MIDI device that Java Sound
was able to detect in your operating system.
You might display a list of available devices to users and let
them choose which device they want to use (see Listing 1).
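Enumerating the devices might look roughly like the following sketch, which assumes Java 7 or later. DeviceLister and describeDevices are names made up for this example; the MidiSystem and MidiDevice calls are the standard ones.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.midi.MidiDevice;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.MidiUnavailableException;

// Hypothetical helper class for illustration.
class DeviceLister {
    // Returns one line of text per MIDI device that Java Sound reports.
    static List<String> describeDevices() {
        List<String> lines = new ArrayList<>();
        MidiDevice.Info[] infos = MidiSystem.getMidiDeviceInfo();
        for (int i = 0; i < infos.length; i++) {
            try {
                MidiDevice device = MidiSystem.getMidiDevice(infos[i]);
                // A maximum of -1 means "unlimited"; 0 means the device has none.
                lines.add(i + ": " + infos[i].getName()
                        + " (" + infos[i].getDescription() + ")"
                        + " receivers=" + device.getMaxReceivers()
                        + " transmitters=" + device.getMaxTransmitters());
            } catch (MidiUnavailableException e) {
                lines.add(i + ": " + infos[i].getName() + " (unavailable)");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeDevices()) {
            System.out.println(line);
        }
    }
}
```

The list usually includes Java's own software synthesizer and sequencer alongside any hardware ports, so expect more entries than you have physical devices.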
Once you have a reference to your MidiDevice, you're
ready to begin sending and receiving messages. First,
understand that there are Receivers and Transmitters. As
you might suspect, Receivers receive MIDI messages and
Transmitters transmit, or are the source of, MIDI messages.
Step one is finding out which of the MidiDevices reported by
the MidiSystem can receive messages. The easiest way is to iterate through each one, try
to play "middle C," and see what happens. You may find that
some of the MidiDevices are configured as receive only, others
are transmit only, and others might receive and transmit.
In my setup, my MIDI interface reports two different
MidiDevices for the same keyboard - one that is transmit
only, the other that's receive only. So when I want to send a
MIDI message, I send it to the MidiDevice that is receive only,
and when I want to record MIDI messages that I trigger by
playing the keyboard, I do it by listening to the MidiDevice
that's configured to transmit only (see Listing 2).
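The "try middle C on every device" experiment could be sketched as follows. MiddleCProbe, note and playMiddleC are hypothetical names, and the velocity (93) and half-second hold time are arbitrary choices.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiDevice;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.MidiUnavailableException;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

// Hypothetical probe class for illustration.
class MiddleCProbe {
    static ShortMessage note(int command, int channel, int key, int velocity) {
        try {
            ShortMessage msg = new ShortMessage();
            msg.setMessage(command, channel, key, velocity);
            return msg;
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Sends note-on for middle C (key 60) on zero-based channel 0, waits, then sends note-off.
    static void playMiddleC(Receiver receiver, long holdMillis) {
        receiver.send(note(ShortMessage.NOTE_ON, 0, 60, 93), -1);
        try {
            Thread.sleep(holdMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        receiver.send(note(ShortMessage.NOTE_OFF, 0, 60, 0), -1);
    }

    public static void main(String[] args) {
        for (MidiDevice.Info info : MidiSystem.getMidiDeviceInfo()) {
            try {
                MidiDevice device = MidiSystem.getMidiDevice(info);
                if (device.getMaxReceivers() == 0) {
                    continue; // transmit only: nothing to send to
                }
                device.open();
                try {
                    System.out.println("Trying " + info.getName());
                    playMiddleC(device.getReceiver(), 500);
                } finally {
                    device.close();
                }
            } catch (MidiUnavailableException e) {
                System.out.println(info.getName() + " unavailable: " + e.getMessage());
            }
        }
    }
}
```

Whichever device name is printed just before you hear the note is the one to use for sending.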
Step two is to figure out which of the MidiDevices are
transmitters. In this case, you'll need to create an implementation
of the javax.sound.midi.Receiver interface (see Listing 3).
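A bare-bones Receiver implementation might look something like this sketch. LoggingReceiver and its lines() accessor are invented names; the two overridden methods are the ones the Receiver interface requires.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

// Hypothetical Receiver that prints and remembers every message it is given.
class LoggingReceiver implements Receiver {
    private final List<String> log = new ArrayList<>();

    @Override
    public synchronized void send(MidiMessage message, long timeStamp) {
        String line;
        if (message instanceof ShortMessage) {
            ShortMessage sm = (ShortMessage) message;
            line = "t=" + timeStamp
                    + " cmd=0x" + Integer.toHexString(sm.getCommand())
                    + " ch=" + sm.getChannel()
                    + " d1=" + sm.getData1()
                    + " d2=" + sm.getData2();
        } else {
            // System exclusive and meta messages have no channel or fixed data bytes.
            line = "t=" + timeStamp
                    + " status=0x" + Integer.toHexString(message.getStatus())
                    + " length=" + message.getLength();
        }
        log.add(line);
        System.out.println(line);
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }

    // Returns a copy of everything logged so far.
    synchronized List<String> lines() {
        return new ArrayList<>(log);
    }
}
```

To listen to a device, open it and hand the receiver to one of its transmitters, for example device.getTransmitter().setReceiver(new LoggingReceiver()).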
Next, figure out which of the available MidiDevices are
acting as your keyboard's transmitter by trying them out one
at a time. Obtain each MidiDevice's "transmitter," assign your
receiver to it, play a few notes on your keyboard, and look for
output in the console that indicates you've found the right
"transmitter" device (see Listing 4).
Once you've figured out which MidiDevice is your receiver
and which is your transmitter, and you know the basics of
sending and receiving MidiMessages, you now have everything
you need to create your own full-featured 16-track
MIDI sequencer! Well, not quite...
Creating Your Own MIDI Sequencer
As I soon found out, there's a lot more to creating a
sequencer than just knowing how to send and receive MIDI
messages.
You may have noticed by now that the javax.sound.midi
package already includes a Sequencer. Unfortunately,
this "built-in" sequencer is limited in its capabilities and is
not extendable since it's an interface. But the biggest reason I
was unable to use it is that it seems to be "hard wired" to use
the internal Java synthesizer (Sun Bug ID 4783745). It also
appears to have very bad timing problems (Sun Bug ID
4773012).
With the API it's easy to create your own sequencer. First
we have to be able to record a single track and play it back with
exactly the same timing it was originally played in. Moreover,
the performing artist (that's you) will want a four-bar
metronome count-off before recording starts. Then, to keep
perfect time, the metronome needs to continue until
the user clicks stop. Of course, the metronome sounds
should not be part of the performance when it's played back.
You can have the computer emit a "system beep" for your
metronome, but I prefer to listen to the hi-hat of a drumkit
on the keyboard. A lot of sequencers use MIDI channel 10
(i.e., track 10) as a drumkit, but you can choose any one you
like. A tempo of 120 beats per minute (bpm) means 2
beats/second or 1 tick of your metronome every 500ms.
As you may have guessed, you'll need one thread playing
the ticks of your metronome (on Channel 10) while you
record any MidiMessages you receive (via your Receiver
implementation) on Channel 1. (Note that I am referring to
the channels/tracks from the musician's perspective. The
channel references in the API are zero-based.) Listing 5
shows what your metronome thread might look like.
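One possible shape for such a metronome thread is sketched below. The class and method names are hypothetical, and key 42 (closed hi-hat in the General MIDI percussion map) is an assumption about how your keyboard's drumkit is laid out.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

// Hypothetical metronome for illustration.
class Metronome implements Runnable {
    static final int DRUM_CHANNEL = 9;   // the musician's "channel 10", zero-based
    static final int CLOSED_HI_HAT = 42; // General MIDI percussion key (assumed)

    private final Receiver receiver;
    private final long intervalMillis;
    private volatile boolean running = true;

    Metronome(Receiver receiver, int bpm) {
        this.receiver = receiver;
        this.intervalMillis = millisPerBeat(bpm);
    }

    // 120 bpm -> 500 ms between ticks.
    static long millisPerBeat(int bpm) {
        return 60_000L / bpm;
    }

    // Sends one hi-hat hit: note on immediately followed by note off.
    void tick() {
        try {
            receiver.send(new ShortMessage(ShortMessage.NOTE_ON, DRUM_CHANNEL, CLOSED_HI_HAT, 100), -1);
            receiver.send(new ShortMessage(ShortMessage.NOTE_OFF, DRUM_CHANNEL, CLOSED_HI_HAT, 0), -1);
        } catch (InvalidMidiDataException e) {
            throw new IllegalStateException(e);
        }
    }

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        long next = System.currentTimeMillis();
        while (running) {
            tick();
            // Sleep until the next absolute beat time so small delays do not accumulate.
            next += intervalMillis;
            long wait = next - System.currentTimeMillis();
            if (wait > 0) {
                try {
                    Thread.sleep(wait);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

Start it with new Thread(new Metronome(receiver, 120)).start() and call stop() when the user clicks stop.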
Playing the metronome is simple enough, but you need to
figure out a way to play it through just one time, then begin
recording MidiMessages as they arrive at your receiver (discussed
later). How you do that will be left for you to decide.
Recording MIDI
How exactly do you record MidiMessages? There are basically
two strategies: you can try to take note of what time each
message arrives, or you can use the included timestamp of
each message. In either strategy, your implementation of the
Receiver interface will create an ArrayList and add each
MidiMessage it receives to the ArrayList. Of course, you'll
need to make sure you record only MidiMessages for the
duration immediately following the four-bar Metronome
count off until the user clicks stop.
Your first strategy might be to use System.currentTimeMillis() to take note of the system time (in ms) at
which each MidiMessage arrives. You'll need to know this
when you play back these messages. The general idea is to
play back the messages using a thread that sleeps
between messages according to the relative times at which they originally
arrived. In my experience, the system clock was not reliable
enough to deliver rock-solid timings during playback.
You'll hear what I mean if you try this strategy and listen
to the playback.
The other strategy is to use the embedded timestamp that
accompanies each MidiMessage. This timestamp is expressed
in microseconds based on the time you first opened the
MidiDevice. Unfortunately, by the time the four-bar
metronome count off ends, it's difficult to say when the first
message should be played back. That is, you can't assume that
the first message that arrives should be played back at time
zero. Perhaps the musician's first note is played halfway
through the first measure. Since the MidiDevice was opened
long before your metronome began playing, it's difficult to
determine from the timestamp alone how much time your
playback thread should wait until it sends the very first message.
Of course, all messages after that are easy, since you can
just calculate the time to wait between messages based
on the relative differences of the messages' timestamps.
The best solution I came up with was to just take note (by
way of System.currentTimeMillis()) of when recording actually
begins (that is, after the four-bar metronome count off), and
then take note of when the first MidiMessage arrives. Then,
during playback, the playback thread merely needs to wait the
calculated delay time before playing back the first message.
Thereafter, it can simply use the relative differences between
the MidiMessage timestamps for all subsequent messages.
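The arithmetic behind this combined approach can be sketched as below. PlaybackSchedule and offsetsMillis are invented names. The sketch expresses each message as an offset from the start of playback rather than as a wait since the previous message, which is a small variation intended to keep rounding errors from adding up.

```java
// Hypothetical helper for illustration.
class PlaybackSchedule {
    /**
     * recordStartMillis and firstArrivalMillis come from System.currentTimeMillis();
     * timestampsMicros holds the MidiMessage timestamps (microseconds) in arrival order.
     * Returns, for each message, how many ms after playback starts it should be sent.
     */
    static long[] offsetsMillis(long recordStartMillis, long firstArrivalMillis,
                                long[] timestampsMicros) {
        long[] offsets = new long[timestampsMicros.length];
        if (offsets.length == 0) {
            return offsets;
        }
        // Delay before the first note, measured with the system clock.
        long initialDelay = firstArrivalMillis - recordStartMillis;
        for (int i = 0; i < offsets.length; i++) {
            // Everything after that is relative to the first message's own timestamp.
            long sinceFirstMicros = timestampsMicros[i] - timestampsMicros[0];
            offsets[i] = initialDelay + sinceFirstMicros / 1000;
        }
        return offsets;
    }

    public static void main(String[] args) {
        long[] offsets = offsetsMillis(1000, 1750,
                new long[] {5_000_000L, 5_250_000L, 6_000_000L});
        // prints [750, 1000, 1750]
        System.out.println(java.util.Arrays.toString(offsets));
    }
}
```

In the example, recording began at 1000 ms and the first note arrived at 1750 ms, so the first message plays 750 ms in, and the later ones follow 250 ms and 1000 ms after it.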
It may surprise you to learn that what you think of as a
chord (or several chords across multiple tracks) struck
simultaneously is actually played back one note at a time,
sent serially as a stream of MidiMessages. The playback loop
sends the messages so quickly that the human ear cannot
discern the difference between the original "three notes
struck simultaneously" and "three notes played 1 ms apart."
You should now be able to record and play back a
single MIDI track at 120 bpm. If, when it plays back, it
sounds just like you played it, you're halfway there. The next
step is to be able to record additional MIDI tracks while playing
back previously recorded tracks.
Recording Multiple Tracks
You may have already begun to notice that, although you
are receiving and recording the MIDI messages, it's hard to
control what sound/voice/patch the keyboard is playing.
Each of the 16 MIDI channels on the keyboard
can have a different patch associated with it, and most keyboards
allow you to change which MIDI channel they are transmitting
on. Changing the MIDI channel selected on the
keyboard therefore also changes the selected patch.
The problem is that you don't want to constantly have to
make sure your keyboard's selected channel matches the
track you plan to record in your sequencer. If they're not in
sync, you'll think you're recording track/channel 2 while the
keyboard still has channel 1 selected. Although you may have
the "channel 2 ArrayList" full of the MidiMessages you
received, each of those messages carries a channel value marking
it as a channel 1 message, so playback of those
"channel 2 messages" results in playback on channel 1, playing
channel 1's patch instead of channel 2's.
The solution seems tricky and not very efficient, but it
works just fine in practice. The trick is to first stop the keyboard's
keys from triggering its internal sounds (MIDI calls this turning
"Local Control" off); the keyboard will
continue to transmit MIDI messages as usual:
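// Controller 122 is Local Control; a value of 0 switches it off.
ShortMessage msg = new ShortMessage();
msg.setMessage(ShortMessage.CONTROL_CHANGE, 122, 0);
// A timestamp of -1 means "no timestamp": deliver the message right away.
_receiver.send(msg, -1);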
Next, "route" all incoming MIDI messages to the keyboard,
playing them back on the track the user thinks she is
recording. For example, you may receive all your MIDI messages
with the "channel 1 byte" set. If the user thinks she is
recording track 2, then for each MIDI message received, in
addition to recording it (by storing it in track 2's message
ArrayList), change the "channel byte" to 2 and retransmit
it to the keyboard (see Listing 6).
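Rewriting the channel of an incoming message might be done along these lines. ChannelRouter, retarget and noteOn are hypothetical names, and the sketch builds a new message rather than modifying the one that was received.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.ShortMessage;

// Hypothetical helper for illustration.
class ChannelRouter {
    // Returns a copy of a channel message moved to the given zero-based channel.
    // Messages that carry no channel (system messages, sysex) are returned unchanged.
    static MidiMessage retarget(MidiMessage message, int channel) {
        if (!(message instanceof ShortMessage) || message.getStatus() >= 0xF0) {
            return message;
        }
        ShortMessage in = (ShortMessage) message;
        try {
            ShortMessage out = new ShortMessage();
            out.setMessage(in.getCommand(), channel, in.getData1(), in.getData2());
            return out;
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Convenience factory used by the example below.
    static ShortMessage noteOn(int channel, int key, int velocity) {
        try {
            return new ShortMessage(ShortMessage.NOTE_ON, channel, key, velocity);
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ShortMessage played = noteOn(0, 60, 93);   // arrives on the musician's channel 1
        MidiMessage routed = retarget(played, 1);  // recorded and echoed as channel 2
        System.out.printf("0x%02X -> 0x%02X%n", played.getStatus(), routed.getStatus());
    }
}
```

Inside your Receiver's send method you would store the retargeted message in the current track's ArrayList and also pass it to the keyboard's receiver so the musician hears the right patch.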
Playing Back Multiple Tracks
Assuming you have several different tracks of MIDI data
recorded, it's time to play them back. Your first approach
might be to use a separate thread for each track (channel).
While this is an intuitive programming model, you'll quickly
find that although each track (thread) plays back in perfect
time relative to itself, it's difficult to keep it perfectly in sync
with the other tracks. If your tracks are short and you plan to
loop them, you could use thread synchronization to make
sure all tracks "sync up" with each other at the end of each
iteration. However, you will soon find your clean sequencer
code is getting cluttered up with complex thread synchronization
all over the place, and it becomes harder and harder
to manage and still achieve "rock solid" timing.
What I found to be easier to manage and virtually guaranteed
to stay "in time" was to collect all MidiMessages, regardless
of track (channel), put them into a single ArrayList, sort
them all based on their timestamp, and then play them all
back using a single playback thread.
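A minimal sketch of the merge-and-sort idea follows, assuming Java 8 or later. TimedMessage and TrackMerger are invented names, and the use of System.nanoTime() for pacing is this sketch's choice rather than anything prescribed above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.Receiver;

// Hypothetical pairing of a message with its offset from the start of the song.
class TimedMessage {
    final long offsetMicros;
    final MidiMessage message;

    TimedMessage(long offsetMicros, MidiMessage message) {
        this.offsetMicros = offsetMicros;
        this.message = message;
    }
}

// Hypothetical helper for illustration.
class TrackMerger {
    // Collects every track's messages into one list ordered by time.
    static List<TimedMessage> merge(List<List<TimedMessage>> tracks) {
        List<TimedMessage> all = new ArrayList<>();
        for (List<TimedMessage> track : tracks) {
            all.addAll(track);
        }
        // List.sort is stable, so messages with equal offsets keep their track order.
        all.sort(Comparator.comparingLong((TimedMessage m) -> m.offsetMicros));
        return all;
    }

    // Plays the merged list from a single thread, pacing against one clock.
    static void play(List<TimedMessage> merged, Receiver receiver) {
        long start = System.nanoTime();
        for (TimedMessage tm : merged) {
            long waitNanos = tm.offsetMicros * 1_000 - (System.nanoTime() - start);
            if (waitNanos > 0) {
                try {
                    Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            receiver.send(tm.message, -1);
        }
    }
}
```

Because every message is timed against the same starting instant, tracks cannot drift relative to one another the way independently sleeping threads can.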
Adding Digital Audio
By now you should have a good instrumental recorded
using multiple MIDI tracks, but you'll add more interest to
your song by laying down a vocal track or two. Luckily, the
Java Sound API includes the javax.sound.sampled package
dedicated to recording and playing back digital audio.
Recording Audio
Ultimately, any recorded digital audio comes down to samples.
A sample is a measurement at a point in time of what you
might picture as the audio "waveform." The standard CD sampling
rate is to take 44,100 measurements, or samples, each
second. Each sample may be 8 bits, 16 bits, or more. There are
a variety of sample formats in use today, and the Java Sound
API supports just about everything you'll encounter. Some useful
constants for recording CD quality sound are:
AudioFormat.Encoding encoding = AudioFormat.Encoding.PCM_SIGNED;
int rate = 44100;
int sampleSize = 16;
int channels = 1;
boolean bigEndian = true;
An AudioFormat object will be needed later:
AudioFormat format = new AudioFormat(
encoding, rate, sampleSize, channels,
(sampleSize / 8) * channels, rate, bigEndian);
Before you can begin recording, however, you'll need to
obtain a TargetDataLine. The Java Sound API models its sampling
API in terms of "lines." A line may be a microphone
input, a previously recorded sample, the computer's "line
out" or speaker, or any type of "input" or "output." To facilitate
the playback of multiple samples at the same time, the
interface Mixer is provided, which is itself a type of line.
Lines may have controls that parallel what you'd find in a real
mixer - gain, pan, volume, reverb, equalization, etc.
Just as MidiSystem is your gateway to the available MidiDevices, the
class AudioSystem serves as your gateway to finding and
obtaining whatever Lines and Controls are installed and available
to you. In general, the first step to recording an audio track
is to obtain a TargetDataLine suitable for recording audio in the
format requested, in this case an AudioFormat that is a single
16-bit channel recording 44,100 samples/second (see Listing 7).
As you may have suspected, you'll need a separate thread
to capture the incoming sample data. Using the TargetDataLine and OutputStream created previously, you'll want to
create a loop that reads a chunk of bytes at a time from the
TargetDataLine, writing them out to the OutputStream until
there's nothing left to read or until the user clicks stop (see
Listing 8).
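Obtaining the line and running the capture loop could be sketched as follows. AudioCapture and its members are hypothetical names, and reading one fifth of the line's buffer at a time is an arbitrary choice.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Hypothetical capture task for illustration.
class AudioCapture implements Runnable {
    // 44,100 Hz, 16-bit, mono, signed PCM, big-endian: a 2-byte frame.
    static final AudioFormat FORMAT = new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED, 44100f, 16, 1, 2, 44100f, true);

    private final TargetDataLine line;
    private final OutputStream out;
    private volatile boolean running = true;

    AudioCapture(TargetDataLine line, OutputStream out) {
        this.line = line;
        this.out = out;
    }

    // Describes the kind of line we want: one we can read captured audio from.
    static DataLine.Info captureInfo(AudioFormat format) {
        return new DataLine.Info(TargetDataLine.class, format);
    }

    // Asks AudioSystem for a matching line and opens it.
    static TargetDataLine openLine(AudioFormat format) throws LineUnavailableException {
        DataLine.Info info = captureInfo(format);
        if (!AudioSystem.isLineSupported(info)) {
            throw new LineUnavailableException("No capture line supports " + format);
        }
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(format);
        return line;
    }

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        // Reads must cover a whole number of frames.
        int frameSize = line.getFormat().getFrameSize();
        byte[] buffer = new byte[Math.max(1, line.getBufferSize() / 5 / frameSize) * frameSize];
        line.start();
        try {
            while (running) {
                int n = line.read(buffer, 0, buffer.length);
                if (n <= 0) {
                    break;
                }
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            line.stop();
            line.close();
        }
    }
}
```

A typical use would be new Thread(new AudioCapture(AudioCapture.openLine(AudioCapture.FORMAT), new ByteArrayOutputStream())).start(), keeping a reference to the task so that stop() can be called later.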
At this point, your ByteArrayOutputStream contains a ton
of bytes. At 44,100 16-bit samples per second (88,200 bytes per second), the average 3:30 minute song will require about 18.5MB
worth of samples for just a single mono track! FileOutputStream might be a better choice if you're going to be recording
lengthy samples and memory becomes scarce. Of course,
recording the sample is just half of the story. Now we have to
play it back.
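The storage cost follows directly from the format: seconds times frames per second times bytes per frame. The sketch below works it out for the 16-bit mono format used in this article; SampleSize, CD_MONO and bytesFor are names invented for the example.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper for illustration.
class SampleSize {
    // 44,100 Hz, 16-bit, one channel, signed, big-endian.
    static final AudioFormat CD_MONO = new AudioFormat(44100f, 16, 1, true, true);

    // Number of bytes needed to hold the given duration in the given format.
    static long bytesFor(AudioFormat format, double seconds) {
        long frames = Math.round(seconds * format.getFrameRate());
        return frames * format.getFrameSize();
    }

    public static void main(String[] args) {
        // 3:30 is 210 seconds; prints 18522000
        System.out.println(bytesFor(CD_MONO, 210));
    }
}
```

Doubling the channel count to stereo doubles the frame size, and with it the total.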
Playing Back Audio
Playing back a previously recorded audio track is essentially
the reverse of recording it. That is, the sample's bytes,
originally stored in an OutputStream, are written out to a
SourceDataLine one chunk at a time until there's nothing left
or until the user clicks stop.
To read the bytes a chunk at a time, we'll need an
InputStream. The Java Sound API provides the class
AudioInputStream that has several convenience methods for
working with samples. Again, we'll need to refer to the same
AudioFormat that the sample was originally recorded in. In
our case, we'll assume we're dealing with a completely in-memory
sample, expressed as an array of bytes (see Listing 9).
Note that AudioInputStream's mark method is used to
mark the beginning of the sample, while the reset method is
used to "rewind" the sample to the beginning.
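Wrapping an in-memory sample and rewinding it might look like this sketch. InMemorySample and its three methods are hypothetical; mark and reset work here because the underlying ByteArrayInputStream supports them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

// Hypothetical helper for illustration.
class InMemorySample {
    // Wraps raw sample bytes in an AudioInputStream and marks the beginning.
    static AudioInputStream open(byte[] samples, AudioFormat format) {
        long frames = samples.length / format.getFrameSize();
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(samples), format, frames);
        in.mark(samples.length); // remember the start so reset() can return to it
        return in;
    }

    // Reads up to maxBytes (a multiple of the frame size) and returns what was read.
    static byte[] readChunk(AudioInputStream in, int maxBytes) {
        try {
            byte[] buffer = new byte[maxBytes];
            int n = in.read(buffer, 0, buffer.length);
            return n <= 0 ? new byte[0] : Arrays.copyOf(buffer, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Rewinds" the sample to the marked beginning.
    static void rewind(AudioInputStream in) {
        try {
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After rewind, the next readChunk call returns the first bytes of the sample again, which is what makes looping or replaying a take possible without copying the data.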
As has been the case, we'll need a separate thread to play
back the sample. We'll use the AudioInputStream set up
above to read sample bytes from it, a chunk at a time, writing
them out to a SourceDataLine. Just as we obtained our
TargetDataLine from the AudioSystem, we'll ask the AudioSystem for a
SourceDataLine suitable for playing back a sample in our
AudioFormat (see Listing 10).
Since we have a SourceDataLine that can handle our
AudioFormat, we can start a thread to write out the sample
bytes to it (see Listing 11).
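The playback side could be sketched like this. AudioPlayback and its members are invented names, and writing a quarter of the line's buffer at a time is an arbitrary choice; the one firm requirement is that each write covers a whole number of frames.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

// Hypothetical playback task for illustration.
class AudioPlayback implements Runnable {
    private final AudioInputStream in;
    private final SourceDataLine line;
    private volatile boolean running = true;

    AudioPlayback(AudioInputStream in, SourceDataLine line) {
        this.in = in;
        this.line = line;
    }

    // Rounds a desired chunk size down to a whole number of frames (at least one).
    static int alignedChunkSize(AudioFormat format, int desiredBytes) {
        int frameSize = format.getFrameSize();
        return Math.max(1, desiredBytes / frameSize) * frameSize;
    }

    // Asks AudioSystem for an output line that can play the given format and opens it.
    static SourceDataLine openLine(AudioFormat format) throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        line.open(format);
        return line;
    }

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[alignedChunkSize(in.getFormat(), line.getBufferSize() / 4)];
        line.start();
        try {
            int n;
            while (running && (n = in.read(buffer, 0, buffer.length)) > 0) {
                line.write(buffer, 0, n);
            }
            if (running) {
                line.drain(); // let the tail of the sample finish playing
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            line.stop();
            line.close();
        }
    }
}
```

The loop ends either when the stream runs out of sample data or when stop() is called from another thread.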
Now that you have your audio track playing back, we're almost done!
Putting It All Together
At this point we have the main ingredients for a basic
multi-track MIDI sequencer that can also record and play
back audio. Although we can play back multiple tracks of
MIDI using just one thread, it's much more difficult to play
back multiple samples with a single thread. For simplicity,
we'll continue to use one thread for all MIDI data, but create
a different thread for each audio sample.
The basic trick for integrating MIDI and one or more
samples is to simply synchronize the start of the MIDI tracks
thread with the audio track thread(s) using normal thread
synchronization techniques.
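One way to line up the start of several threads is a pair of latches from java.util.concurrent, sketched below under the assumption of Java 8 or later. StartGate and its methods are hypothetical names, and a latch is only one of several techniques that would work here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper for illustration.
class StartGate {
    // Starts one thread per player, holds them all at a gate, then releases them together.
    static List<Thread> startTogether(List<Runnable> players) {
        CountDownLatch ready = new CountDownLatch(players.size());
        CountDownLatch go = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (Runnable player : players) {
            Thread t = new Thread(() -> {
                ready.countDown(); // this thread is up and waiting
                try {
                    go.await();    // block until every thread is ready
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                player.run();
            });
            threads.add(t);
            t.start();
        }
        try {
            ready.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        go.countDown(); // open the gate for everyone at once
        return threads;
    }

    // Waits for all players to finish.
    static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

The list of players would hold the single MIDI playback task plus one task per audio sample; thread startup cost is paid before the gate opens, so it does not show up as an offset between MIDI and audio.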
Of course, real commercial MIDI/audio sequencers can
do much more than record and play back multiple tracks.
That's just the beginning. After all, a real sequencer can:
Play back what was recorded at one tempo at a different tempo
Import "instrument definitions" that specify the patch names mapped to patch numbers
Select each track's "patch" by searching the available patches by name
Provide a mixer with volume and pan sliders for each track
Record and play back volume changes from the mixer in real time
"Trigger" audio samples from the keyboard (a la a conventional sampler)
Quantize recorded MIDI data to the nearest 1/4 note, 1/8th note, 1/16th note, etc.
I'm out of space, so for now, I'll have to leave that as an
exercise for you, the reader. In the meantime, enjoy your new sequencer!
References
Open Source MIDI and Audio Projects
Audio Development System: http://sourceforge.net/projects/adsystem
jMusic: http://sourceforge.net/projects/jmusic
Sound Grid: http://sourceforge.net/projects/soundgrid
API References
Java Sound Programmer Guide: http://java.sun.com/j2se/1.4.1/docs/guide/sound/programmer_guide/contents.html
Java Sound Demo: http://java.sun.com/products/javamedia/sound/samples/JavaSoundDemo/
MIDI Specification
Official MIDI Specification: www.midi.org
Online MIDI Specification (unofficial): www.borg.com/~jglatt/tech/midispec.htm
Miscellaneous
Bug ID 4773012: RFE: Implement a new stand-alone sequencer: http://developer.java.sun.com/developer/bugParade/bugs/4773012.html
Bug ID 4783745: Sequencer cannot access external MIDI devices: http://developer.java.sun.com/developer/bugParade/bugs/4783745.html
Author Bio
Mike Gorman is a senior software architect for J.D. Edwards, a PeopleSoft company, concentrating on J2EE distributed transaction systems. Mike has been coding in Java since
1997. In his spare time, Mike plays with MIDI, Swing, Web services, and JDO.
mike_gorman@jdedwards.com