There are many facets to consider when implementing even the most basic
software configuration management (SCM). For Java, with its import
mechanism, even simple SCM goals often become unmanageable once the source code
tree grows beyond a certain level of complexity.
This is mainly due to the reticulate interdependencies that arise within
the source code tree as it evolves. Also, because code is seldom (if ever)
retired, the code base continues to grow, causing this network to become
increasingly complicated over time.
In this article I explore the evolution of the typical Java source code
tree and the underlying relationships that make even basic Java SCM
problematic. I also suggest a simple way to manage source code relationships
to meet basic SCM goals.
Understanding these topics will enable Java development shops to begin
implementing simple yet effective SCM systems that balance the requisite
process with unencumbered development, testing, and operational deployment.
By requisite process I mean staying a couple of steps ahead of SCM-related
firefighting while remaining free from laborious and/or unnecessary
processes.
Some Simple Goals of Java SCM
- Maintaining source code under revision control
- Managing dependencies within your own source code
- Managing builds and build dependencies
- Managing dependencies on third-party JARs
Beyond a certain level of complexity (usually a few hundred total source
files, depending on the skill of your developers and how quickly they're
being asked to churn out code), the reticulate interdependencies within the
code can no longer be untangled. That is, the large number of
interdependencies introduced by import statements creates artificial
dependencies that get in the way whenever you try to add features or to
build, branch, release, and test your code.
More specifically:
- Building a subtree causes the compilation of every source file in
your source tree because of circular dependencies. For some projects I've
seen, this results in extremely lengthy build times.
- No source code is free to move along under its own development cycle:
you might need to build one subbranch n times per day and another only m
times per month, but because they have import interdependencies, both are
built at the maximum (required) rate.
- Branching and merging are extremely time-consuming and complex and can
introduce significant developer downtime, mostly due to the large number of
source files that must be considered.
- Releasing code to operations is very difficult, as you have to push
every Java class file upon release.
- Testing is more difficult, if not impossible, since it's harder to
isolate subbranches of code to understand their functionality. It's also
more difficult to write a test harness for a subbranch (e.g., using
JUnit).
Most current source code management tools deal with navigating source
hierarchies and finding objects and methods. These are great problems to
solve, but not ones that we're primarily interested in (JavaDeps comes
relatively close in that it helps to discover some compilation dependencies
that go unnoticed by some compilers).
Similarly, many revision control systems (RCS) provide check-in,
checkout, branch, and merge capabilities, but none address source code tree
structure and how to manage the requisite dependencies involved.
Target Audience
The target audience is developers, testers, and operational support
staff who are interested in taking the necessary steps to actively manage
their Java-based projects in terms of building source code for test and
operational deployment; developing multiple versions of a product or service
in parallel; and replicating operational, test, and development environments
to reproduce unexpected behavior and fix bugs.
In this context, a large project has roughly 500 to 10K Java source files
and 50 to 1K dependent third-party JARs. All
told, we're talking about a set of development projects that have 0(50L)
total document and code artifacts...not very big, but large enough that it's
worth examining how the code base evolves and how to keep it from turning
into a liability instead of the asset it's intended to be.
Due to its complex nature, this topic is too large to be covered in a
single article. I'll start by covering the basics of source code management
and builds, and finish by touching on the topics of managing deployments and
documentation. Future areas for discussion include managing properties files
and build tools and building WAR files.
The Evolution of Java Source Code Hierarchies (aka Back to Basics)
Every Java shop I've ever worked in has followed an eerily similar
evolutionary path as far as its Java source code is concerned:
1. Starts the root branch off by creating Java package com.mycompany
2. Begins to populate the source tree with a layer of utility and/or base
classes, many of them the usual suspects like com.mycompany.db,
com.mycompany.utils, com.mycompany.regexp, and com.mycompany.xml
3. Continues to populate this source tree with a set of servlets, beans,
data access, and JSPs that depend on the set of common classes (the
aforementioned usual suspects)
This approach is extremely intuitive and works for a while: for about
as long as the codebase remains simple enough that dependencies between
distinct packages are well understood.
The first dependencies introduced are usually servlets, beans, and JSPs
importing common/shared utility classes. These dependencies are distinct,
simple, and well understood. However, soon thereafter, more complex
references are introduced as developers try to reuse as much code as
possible while minimizing the time they spend repackaging code. A servlet
from one package begins to look like a utility to another package and is
subsequently imported. This type of import can create a circular reference
(see Figure 1) between source code files and sets the stage to make even
simple SCM prohibitively complex.
Figure 1
Introducing circular references in Java is surprisingly easy and
extremely common, though, interestingly enough, I've never actually heard of
a developer admitting to such a practice. Understanding these relationships,
plus your source code's dependencies on third-party JAR files, is key to
having a modular, branchable, buildable, testable, and deployable codebase.
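Circular references are easy to miss by eye but simple to find mechanically. The sketch below is a hypothetical helper, not a tool described in this article: given a map from each package to the packages it imports (building that map by scanning import statements is assumed and not shown), it runs a depth-first search and returns one cycle if any exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: finds a circular dependency in a package import graph.
 * The graph maps each package to the packages its source files import; in
 * practice it would be built by scanning import statements (not shown here).
 */
final class ImportCycleFinder {
    private static final int UNVISITED = 0, ON_PATH = 1, DONE = 2;

    private ImportCycleFinder() {}

    /** Returns one cycle (first element == last element), or an empty list if the graph is acyclic. */
    static List<String> findCycle(Map<String, Set<String>> imports) {
        Map<String, Integer> state = new HashMap<>();
        List<String> path = new ArrayList<>();
        for (String pkg : imports.keySet()) {
            List<String> cycle = visit(pkg, imports, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    // Depth-first search; reaching a package that is still on the current path closes a cycle.
    private static List<String> visit(String pkg, Map<String, Set<String>> imports,
                                      Map<String, Integer> state, List<String> path) {
        int s = state.getOrDefault(pkg, UNVISITED);
        if (s == DONE) {
            return List.of();
        }
        if (s == ON_PATH) {
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(pkg), path.size()));
            cycle.add(pkg);
            return cycle;
        }
        state.put(pkg, ON_PATH);
        path.add(pkg);
        for (String imported : imports.getOrDefault(pkg, Set.of())) {
            List<String> cycle = visit(imported, imports, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        state.put(pkg, DONE);
        return List.of();
    }
}
```

For the servlet-turned-utility scenario above, a graph in which com.mycompany.utils imports a servlets package that itself imports com.mycompany.utils yields a two-package cycle; running a check like this before each build is one way to catch such references early.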
Partitioning Source Code: The Introduction of Components
The first step in decoupling direct source code dependencies is to
partition your source code into components.
A component is a set of Java packages that provides a specific set of
functionality and has its own development cycle. It doesn't matter whether
it's one package or 20, one source file or 200 source files (though using
more than a few hundred source files in one component will bring you right
back where you started, in terms of problematic source code management).
Having its own development cycle means that, relative to other
components, a component's source files might need to be built, tested, and
deployed n times a day while other source code goes through this cycle m times a day.
Partitioning source code into components will become fairly intuitive
after a few examples:
Example 1
Utils make great components because so many other source files depend on
them. This also tends to give them a quicker development cycle (and
therefore a quicker build turnaround) than most other source code. Create one
component for all your utils, or partition them further into multiple
components (see Tables 1 and 2).
Table 1
Table 2
Example 2
Database access classes can be grouped into separate components. For
multiple database servers, use multiple data access components, tying each
schema to a component 1:1. This handles schema changes nicely and helps
manage a component's dependencies on multiple database servers (see Table
3).
Table 3
Example 3
A set of JSPs or servlets that provides a specific set of functionality
should be a separate component. This could be a data-entry application, a
data-feed reader, or an administrative UI for one of your internal systems.
Because these types of components have their own requirements and delivery
dates, and those requirements change, they end up on their own development
schedule, so it makes sense to create a component here (see Table 4).
Table 4
You could end up with as many as several hundred components, each with
anywhere between 10 and perhaps 350 source files. Although partitioning your
source code looks complicated, it's actually easy (the difficult part is
getting your builds started).
All this source code needs to be checked into a revision control system
(RCS). All RCS syntax in this article refers to Perforce
(www.perforce.com), as it has many features that make SCM very simple.
In Perforce parlance, the component source code is checked into the
following location:
//depot/components/<component_name>/src
For example:
//depot/components/FileUtils/src/com/mycompany/...
//depot/components/DataParser/src
//depot/components/UserData/src
Builds, branches, and documentation are also partitioned under each
component for RCS:
//depot/components/<component_name>/src
//depot/components/<component_name>/branch
//depot/components/<component_name>/builds
//depot/components/<component_name>/docs
Third-Party JAR and ZIP Files
The other source of build dependencies is the relationship between your
source code and JARs provided by third parties.
This necessitates actively managing these files to keep on top of their
multiple versions and frequent name collisions. It's very easy to impede the
progress of debugging and building through the mismanagement of third-party
JARs and ZIPs (e.g., opening up JARs manually to try to find a version
number to find out what you built against, or what version you have in
production), and yet remarkably simple to organize them intuitively and
efficiently.
Because successive versions of third-party JARs sometimes result in name
collisions, it's necessary to use the version numbers to maintain them under
RCS. In Perforce, the JARs might look like the following (using the JDK and
JSDK as examples):
//depot/jars/jdk/1.2.2/rt.jar
//depot/jars/jdk/1.3.0/rt.jar
//depot/jars/jdk/1.3.1/rt.jar
//depot/jars/jsdk/2.0/jsdk.jar
//depot/jars/jsdk/2.1/server.jar
//depot/jars/jsdk/2.1/servlet.jar
//depot/jars/jsdk/2.2/servlet.jar
This versioning scheme allows components that might depend on the 2.1
version of servlet.jar to reside next to components that might depend on the
2.2 version. Both components can be built and deployed in parallel and their
dependencies tracked accordingly.
This approach has the added bonus of allowing any client that has
access to your RCS server to run builds, since every such machine can sync
the requisite JARs from RCS.
Building Components
Now that your source code is partitioned and third-party JARs are under
RCS, it's time to start building. Build requirements are very simple:
- A build for one component may only execute against that component's
source code. All other build dependencies must be linked through other
components' builds or third-party JAR/ZIP files. In short, a component build
may not execute against any source code other than its own.
- Results of builds (JARs) must be under RCS.
- Source code needs to be labeled with the build number, so there is a
link between a build JAR and the source code that produced that JAR/WAR.
This implies that, given any JAR for any component, the original set of
source code can be located.
- The dependencies for a deployment (a set of JARs that are deployed
together into dev or QA for testing, or into production) must be under
revision control; i.e., the list of dependent JARs for a build of a
component must be under revision control.
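The first rule can be checked mechanically before a build runs. The following is a hypothetical sketch, assuming classpath entries are expressed as depot paths that follow the layout described earlier (a real build would use local workspace paths); BuildRules and violations are illustrative names, not part of Perforce or any build tool.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical check of the component build rules, assuming classpath entries
 * are written as depot paths following this article's layout:
 *   //depot/components/<component>/src      a component's own source
 *   //depot/components/<component>/builds/  build JARs checked into RCS
 *   //depot/jars/<product>/<version>/<file> third-party JARs and ZIPs
 */
final class BuildRules {
    private static final String COMPONENTS = "//depot/components/";
    private static final String JARS = "//depot/jars/";

    private BuildRules() {}

    /** Returns the classpath entries that break the rule "build only against your own source". */
    static List<String> violations(String component, List<String> classpath) {
        String ownSource = COMPONENTS + component + "/src";
        List<String> bad = new ArrayList<>();
        for (String entry : classpath) {
            boolean isOwnSource = entry.equals(ownSource);
            boolean isComponentBuild = entry.startsWith(COMPONENTS)
                    && entry.contains("/builds/") && isArchive(entry);
            boolean isThirdParty = entry.startsWith(JARS) && isArchive(entry);
            if (!(isOwnSource || isComponentBuild || isThirdParty)) {
                bad.add(entry); // e.g. another component's src directory
            }
        }
        return bad;
    }

    private static boolean isArchive(String path) {
        return path.endsWith(".jar") || path.endsWith(".zip");
    }
}
```

With this check, an entry such as //depot/components/Utils/src on the DataParser classpath is reported, while a Utils build JAR (the file name Utils-12.jar is hypothetical) and a versioned third-party JAR pass.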
The first components you build must be entirely self-contained: each can be
built using only its own source code and (optionally) third-party JAR files.
Components built this way are seed builds and start your build process.
Build each of these components one at a time by compiling its Java source,
JARing up the resulting class files, and checking the JAR into your RCS
(build scripts should do all this for you).
If you can't isolate a component so that it's entirely self-contained,
either repackage your source code (not often done due to time constraints)
or generate an invalid build so you can begin to generate seed builds. (An
invalid build is one in which a component is built against its own source
code plus the source code of another component. Sometimes it's impossible to
isolate even one component so it's self-contained, so you'll need to build it
against multiple components' source code to get started. After this initial
build, you'll be able to build it against its own source code and the JARs
created by this first build.)
A high-level overview of a build involves the following steps:
1. Sync up source code and third-party JARs from your RCS to your local
machine.
2. Make sure your target build number hasn't been built already.
3. Set up your CLASSPATH, which contains three sets of entries:
   - The path to the root of the component's source
   - The paths to other JAR files from other components
   - The paths to requisite versions of third-party JAR files
4. Execute make to build your source.
5. JAR up the resultant class files and check this JAR into RCS.
6. Generate a build summary file (containing the environment, date, etc.)
and check this into RCS.
7. Generate a label and stamp the source code for the build with it.
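Steps 3, 6, and 7 involve a little bookkeeping that is easy to script. The sketch below assumes a Java build driver; the summary keys, the properties-file format, and the <component>-build-<n> label format are hypothetical choices. Syncing, compiling, JARing, and check-in are left to whatever build scripts you already use.

```java
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch of build steps 3, 6, and 7: assemble the CLASSPATH,
 * record a build summary, and name the label used to stamp the source.
 */
final class BuildSteps {
    private BuildSteps() {}

    /** Step 3: component source root, then other components' JARs, then third-party JARs. */
    static String classpath(String sourceRoot, List<String> componentJars, List<String> thirdPartyJars) {
        List<String> entries = new ArrayList<>();
        entries.add(sourceRoot);
        entries.addAll(componentJars);
        entries.addAll(thirdPartyJars);
        return String.join(File.pathSeparator, entries); // ':' on Unix, ';' on Windows
    }

    /** Step 6: a build summary capturing the environment and date. */
    static Properties summary(String component, int buildNumber, String classpath) {
        Properties p = new Properties();
        p.setProperty("component", component);
        p.setProperty("build.number", Integer.toString(buildNumber));
        p.setProperty("build.date", Instant.now().toString());
        p.setProperty("java.version", System.getProperty("java.version"));
        p.setProperty("os.name", System.getProperty("os.name"));
        p.setProperty("classpath", classpath);
        return p;
    }

    /** Writes the summary to a file that can then be checked into RCS. */
    static void writeSummary(Properties summary, Path file) throws IOException {
        try (Writer out = Files.newBufferedWriter(file)) {
            summary.store(out, "Build summary");
        }
    }

    /** Step 7: a hypothetical label naming convention linking source to build. */
    static String label(String component, int buildNumber) {
        return component + "-build-" + buildNumber;
    }
}
```

Recording the exact classpath in the summary is what later lets you reproduce the set of JARs a build was compiled against.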
Once you have all of your seed builds, begin to build those components
that have only one level of dependence on other source code within your
repository, i.e., they can be built using only their own source code, the
seed JARs, and, optionally, third-party JARs. Build each of these components
individually, JAR up their resultant class files, and check these JARs into
RCS.
Once all your one-level dependence builds are complete, it's open season
to build the rest of your components, usually done in the order of
increasing number of dependencies. The goal is to make sure no component is
compiled against any source code except its own.
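The seed, one-level, and "everything else" ordering described above amounts to a topological sort of the component dependency graph. The sketch below uses Kahn's algorithm to group components into build waves: wave 0 holds the seed builds, wave 1 the components that depend only on seeds, and so on. It orders by dependency depth rather than by raw dependency count, a close variant of the ordering suggested above. The class name and map-based input are assumptions; a cycle means no valid order exists, which puts you back to repackaging or an invalid build.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch: groups components into build waves with a topological
 * sort (Kahn's algorithm). Every dependency must itself be listed as a
 * component; third-party JARs are not part of this graph.
 */
final class BuildOrder {
    private BuildOrder() {}

    static List<List<String>> waves(Map<String, Set<String>> deps) {
        Map<String, Integer> unbuilt = new TreeMap<>();         // component -> dependencies not yet built
        Map<String, List<String>> dependents = new HashMap<>(); // component -> components that need it
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            unbuilt.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                if (!deps.containsKey(d)) {
                    throw new IllegalArgumentException(e.getKey() + " depends on unknown component " + d);
                }
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<List<String>> waves = new ArrayList<>();
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : unbuilt.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey()); // seed builds: no dependencies on other components
            }
        }
        int scheduled = 0;
        while (!ready.isEmpty()) {
            Collections.sort(ready); // deterministic order within a wave
            waves.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String built : ready) {
                for (String dependent : dependents.getOrDefault(built, List.of())) {
                    if (unbuilt.merge(dependent, -1, Integer::sum) == 0) {
                        next.add(dependent); // all of its dependencies now have build JARs
                    }
                }
            }
            ready = next;
        }
        if (scheduled != deps.size()) {
            List<String> stuck = new ArrayList<>();
            for (Map.Entry<String, Integer> e : unbuilt.entrySet()) {
                if (e.getValue() > 0) {
                    stuck.add(e.getKey());
                }
            }
            throw new IllegalStateException("circular dependency among " + stuck);
        }
        return waves;
    }
}
```

Components in the same wave do not depend on one another, so they can be built in any order, or in parallel.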
What you're effectively doing here is isolating like branches of Java
code in sets of Java packages against changes in other branches of Java code
(also grouped in Java packages). This is probably the most important aspect
of the build strategy. This allows stakeholders of your SCM system to
isolate, and therefore understand, the dependencies between source code and
JARs, and trace any build to the source code that was used to generate that
build as well as replicate an environment by easily reproducing the JARs
used to construct the environment.
Equally important is that the source code for a component is associated
with its build JAR via a label, so it's easy to trace any class file you
have in production back to its source files, and then from there to trace
other dependent components' class files back to their corresponding source
code.
This method of organizing your builds also frees up any components that
share a dependency on a common component (e.g., utils). The common component
is now on its own development cycle, so it can iterate through many build
cycles while allowing dependent components to migrate to newer builds when
it makes the most sense. Said another way, it allows for
independent/parallel development of components that have a dependency on a
single, shared component.
Build Example
At a high level, consider the following scenario for building your first
three components:
1. Component 1, Utils: No dependencies. Build it against its own Java source code to generate a JAR file that contains the resultant .class files.
2. Component 2, DataParser: Depends on a previous build of the Utils component, as well as a third-party JAR called xerces.jar (v1.4.1). Build it against its own source code, a previous Utils component build, and the xerces.jar file from xerces v1.4.1.
3. Component 3, DataCaptureUI: Depends on a previous build of DataParser, a previous build of Utils, and a third-party JAR called servlet.jar (v2.0). Build it against its own source code, a previous DataParser component build, a previous Utils component build, and the servlet.jar file from jsdk 2.0.
Note that because Component 3 depends on DataParser, and DataParser
depends on xerces.jar, you'll need to add xerces.jar as a dependent JAR for
the DataCaptureUI build.
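Working out these transitive dependencies by hand gets error-prone as components multiply. A minimal, hypothetical sketch: given each component's direct component dependencies and its direct third-party JARs, it collects everything a component's build classpath needs. With the three components above, it yields both servlet.jar and xerces.jar for DataCaptureUI.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: transitive component and third-party JAR dependencies. */
final class TransitiveDeps {
    private TransitiveDeps() {}

    /** Every component reachable from {@code component} through component dependencies. */
    static Set<String> components(String component, Map<String, Set<String>> componentDeps) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(componentDeps.getOrDefault(component, Set.of()));
        while (!work.isEmpty()) {
            String c = work.pop();
            if (seen.add(c)) {
                work.addAll(componentDeps.getOrDefault(c, Set.of()));
            }
        }
        seen.remove(component); // only present if a cycle leads back to the start
        return seen;
    }

    /** The component's own third-party JARs plus those of every component it depends on. */
    static Set<String> thirdPartyJars(String component, Map<String, Set<String>> componentDeps,
                                      Map<String, Set<String>> jarDeps) {
        Set<String> jars = new TreeSet<>(jarDeps.getOrDefault(component, Set.of()));
        for (String c : components(component, componentDeps)) {
            jars.addAll(jarDeps.getOrDefault(c, Set.of()));
        }
        return jars;
    }
}
```

Keeping only direct dependencies under revision control and deriving the rest this way means a component picks up a new transitive JAR automatically when one of its dependencies adds it.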
The above set of builds and dependencies is shown in Figure 2.
Figure 2
Conclusion
Managing source code dependencies is only the tip of the iceberg for
comprehensive SCM. Other facets of SCM that fit into the component model
include:
- Managing deployments: A deployment is the set of JAR, ZIP, WAR, and properties files that allow the component to operate in its designated
environment (usually dev, test, or operations). Property and config files
can be partitioned similarly to source code, whereupon all component artifacts
can be synced directly from your RCS server to their deployment server with
deployment dependencies tested and well understood.
- Managing documentation: Component documentation can be bundled with its corresponding component under RCS and mapped to a mount point on your
intranet server for automated publishing. Documentation management has a
large number of implicit requirements involving availability, content, and
versioning from release to release.
Partitioning Java source code into components and formalizing
dependencies will provide several key benefits for your Java-based projects,
some of which are implicit thus far:
- Ability to provide parallel development of projects that share a common codebase
- Ability to easily deploy to development, test, and operational environments
- Ability to minimize the amount of code associated with a build/deployment
- Elimination of confusion and name collisions due to third-party JAR dependencies
- Reproduction of deployment environments to help reproduce problems (and then eliminate them)
- Ability to retire code and branches of code when a component is retired
Author Bio
Tom Laramee is a software developer currently working with the Blindsight
Corporation, writing computer vision software for embedded systems and handheld devices.
He has spent the last five years designing and building Web applications as
both a development lead and system architect. He holds an MS in electrical
and computer engineering from the University of Massachusetts, Amherst.
laramee@pobox.com