There's a project team out there that really hates me. I was called in recently to help them get their application together so it could be put into production. When I got there, I determined that the problem was simple - no one understood configuration management.
It's tempting to say that in the old days configuration management was easy. You just created your executable and distributed it to your clients, and away you went. It's tempting, but that's a load of baloney. Configuration management was an issue on the mainframe, it was an issue on client/server, and it's still an issue today. Any application of any substance is created by a team of developers, and includes configuration files, multiple source-code files, and a host of other artifacts, all of which contribute to the complexity of software development.
What has changed is that the complexities of the systems we're building have increased while the skill of the average programmer, by definition, has remained constant. The number of paradigm shifts that have occurred in the past decade has made it difficult for the average programmer to even understand where code will be deployed, let alone how.
I'm not speaking about small, one- or two-developer projects, although even there the scope of the configuration process can get away from you. I'm concerned more about the large-scale, multimillion-dollar development efforts aimed at putting a high-volume transactional site up on the Web. With the rise of clustering solutions, Network Attached Storage, redundant hardware, and distributed software, it's become much more challenging to deploy an application or even to debug a problem.
Let's consider a simple example. Suppose we're building a transaction system based on Enterprise JavaBeans running in a clustered server. Now suppose we fix a flaw in the transaction bean, one in which a calculation was incorrect and was providing the wrong numbers. We don't change any interfaces; we just correct the logic. Then we deploy it, but by mistake we deploy it to only three of the four servers in the cluster. The fourth retains the old copy. Assuming round-robin scheduling and equal load, one out of every four invocations of the transaction will produce incorrect results. But we know we've fixed it - see, it works here on my machine.
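To put a number on that, here is a minimal sketch of the scenario, assuming strict round-robin dispatch. The class and method names are hypothetical, invented for illustration; nothing here is an EJB or application-server API.

```java
// Hypothetical model of the four-server cluster described above.
final class PartialDeploymentDemo {

    /** Counts how many of the given requests land on a server still running the old bean. */
    static int countStaleResponses(boolean[] serverHasFix, int requests) {
        int stale = 0;
        for (int i = 0; i < requests; i++) {
            // Round-robin: request i goes to server (i mod N).
            if (!serverHasFix[i % serverHasFix.length]) {
                stale++;
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Three servers received the fix; the fourth did not.
        boolean[] cluster = {true, true, true, false};
        int requests = 1000;
        int stale = countStaleResponses(cluster, requests);
        System.out.println(stale + " of " + requests + " invocations hit the old code");
    }
}
```

With 1,000 requests the sketch reports 250 stale invocations, the one-in-four failure rate that no amount of testing on a single developer's machine will ever reproduce.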
That's the problem. "It works on my machine" isn't an acceptable answer. If you hear that on your project, chances are you're in trouble. It means that someone in your organization doesn't understand the complexities of the environment or the overall principles of configuration management.
For the most part, configuration management is an organizational task. It starts with source-code control. All developers should check in code every day. I was on a project once where a programmer lost six weeks of work because he hadn't checked it in and his disk crashed. That's a lot of recoding.
Proper identification of release materials is next. It's not enough to take the latest checked-in code (what if the developer is in the middle of a rewrite and the code doesn't work?). I've seen some projects try to get around this by having the developers provide the output, that is, the bytecode, instead of the source. This is also bad news. The team I worked with recently assured me that all their code was up to date, and that they were providing the output (in this case EJB jars) because it was more efficient. I charged them a dollar for each missing file, or for each class that wouldn't compile. I made 50 bucks the first day on the compilation issues alone. Now you see why they hate me.
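One way to make identification of release materials concrete is to record a digest of each artifact when it is built from checked-in source, then compare that digest with what each server is actually running. The sketch below assumes the artifact bytes have already been fetched from each server; the class name, method names, and server labels are hypothetical, and only the standard `java.security.MessageDigest` class is used.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical release check: compares deployed artifacts against a recorded digest.
final class ReleaseManifestCheck {

    /** Returns the SHA-256 digest of the content as lowercase hex. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    /** Lists the servers whose deployed artifact does not match the release digest. */
    static List<String> findOutOfDate(String expectedDigest, Map<String, byte[]> deployedByServer) {
        List<String> outOfDate = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : deployedByServer.entrySet()) {
            if (!sha256Hex(entry.getValue()).equals(expectedDigest)) {
                outOfDate.add(entry.getKey());
            }
        }
        return outOfDate;
    }
}
```

The point of the design is that the digest comes from the build of the checked-in source, not from whatever jar a developer hands over, so a server that missed a deployment shows up as a mismatch instead of as an intermittent bug.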
What's also important is to have a clear idea of what each environment looks like and to map the distribution to it. Larger projects usually have several environments: one for development, one for testing, one for staging, and one for production. They don't always look the same. Sometimes they might not even have the same operating systems. It's not uncommon to see development take place on an NT server while production takes place on a UNIX box. Java's portable, but UNIX file names are case sensitive and NT's aren't. It's important to understand the environment and to have a clear plan for deployment.
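The case-sensitivity trap can be caught before deployment. As a rough sketch, assume you can list the file names your code refers to and the file names actually in the release; both lists, and the class and method names below, are hypothetical inputs chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-deployment check for names that match only when case is ignored.
final class CaseMismatchCheck {

    /**
     * Returns "referenced -> actual" for every referenced name that has no exact
     * match in the release but does match a file when case is ignored. Such a
     * reference resolves on a case-insensitive file system and fails on a
     * case-sensitive one.
     */
    static List<String> findCaseMismatches(Collection<String> referenced, Collection<String> actual) {
        Set<String> exact = new HashSet<>(actual);
        Map<String, String> byLowerCase = new HashMap<>();
        for (String name : actual) {
            byLowerCase.put(name.toLowerCase(Locale.ROOT), name);
        }
        List<String> mismatches = new ArrayList<>();
        for (String ref : referenced) {
            if (exact.contains(ref)) {
                continue; // exact match: safe on both kinds of file system
            }
            String nearMiss = byLowerCase.get(ref.toLowerCase(Locale.ROOT));
            if (nearMiss != null) {
                mismatches.add(ref + " -> " + nearMiss);
            }
        }
        return mismatches;
    }
}
```

A reference to `config/App.properties` against a release that contains `config/app.properties` would be reported here, which is exactly the kind of defect that stays invisible on the development server.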
Hand in hand with the need for process is the need for automation. It's not enough to have a source-code repository. It's nice that it's got bug tracking now (most of the ones I've used lately, anyway), but we could do that in a spreadsheet. What most repositories lack is a powerful facility that makes deploying code in sophisticated, disparate environments a simple, straightforward task. And the deployment tools that do exist are in the wrong place. WebGain Studio has a great capability to deploy code into WebLogic. IBM's VisualAge does something similar for WebSphere.
That's nice, but for a number of reasons I don't want developers deploying code; it's not their job. It's the task of the configuration manager. That's why the proper place for deployment tools is in the repository. The repository is the basic tool of the configuration manager, and he or she needs that tool to be as robust as possible. It needs to go beyond source code management and defect tracking. These tools need to understand Java, EJB, JDBC, and a host of other technologies. We've needed them for a long time, but they're just starting to come on the market.
It's all about configuration management these days. Do it right and no one will know how good you are. Do it wrong and everyone will know. It's a dirty job, but someone's got to do it. Now go check in that file or you'll owe me a dollar.
Sean Rhody is editor-in-chief of Java Developer's Journal. He is also a respected industry expert and a consultant with a leading Internet service company.
He can be reached at: [email protected]