As developers are increasingly using Java for advanced applications, they've become dependent on the availability of scalable technologies and tools to support their development, including quality assurance (QA), testing, maintenance, release and customer support requirements. The technologies available today have been inherited largely from those available for languages such as C and C++, including visual IDEs and a host of other tools that offer a solution to a particular problem. A few tools have been tailored specifically for Java and enhance the strengths of the language (like the incremental IDE VisualAge and InstallShield's installer for Java that allows Java applications to be installed onto any Java-compliant platform).

This article discusses specific issues related to large-scale software development in Java, suggests ways to address them and concludes with an overview of the Metamata (derived from meta-automata) toolsuite's answer to some of these problems.

While large-scale software development in Java faces some of the same issues as those of other languages, some are unique to Java. Problems familiar from C and C++, such as leaks caused by forgetting to free memory, largely disappear in Java because of garbage collection. On the other hand, Java introduces issues of its own, such as the analysis of multithreaded programs and the debugging of memory that is retained unintentionally.

Standard IDEs don't address most of the problems that arise during large-scale software development. In fact, they aren't designed to be a complete solution for large-scale development. Hence their functionality must be augmented with specialized tools.

Organization and Maintenance of Software Components
This is one of the big tasks of large-scale software development. The system must be arranged into a set of small, manageable components that interact with each other. The interaction should take place through well-defined, organized interfaces, simplifying the task of managing and maintaining the components.

Typically, time constraints and insufficient experience combine to introduce defects in the way software systems are architected, leading to decreased quality and larger overheads in managing and maintaining the system.

As an answer to this problem, a number of studies have measured software systems for complexity, which has led to a standardized set of software quality metrics. While there's no substitute for experienced project managers, the metrics do offer insight into assessing software complexity and quality.

It's also important to be able to detect inconsistencies in the program as it changes. Typically, a program may be changed in one place, but the effect of these changes in other places is overlooked. For example, changing a type can make an existing type cast in a different module unnecessary, and that now-redundant cast is easily overlooked.

Time Constraints
A problem faced by large software systems development (in any language) is waiting for the system to be rebuilt after every small change. The time required to rebuild after each change increases with the size of the system, adding up to expensive overhead costs. After a certain point, the necessary, endless rebuilds significantly reduce productivity.

Organizing a system into well-architected components and reusable libraries goes a long way toward solving this problem. Yet developer tools still need to be smart about how much rebuilding they have to do for each small change.

The best solution to this problem is incremental development environments such as VisualAge. They recompile only the minimum amount necessary when a system is changed. This concept of incrementality can also be extended to other activities beyond the standard development steps to include QA, testing and so forth.
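
To make the idea of minimal recompilation concrete, here is a small sketch of how a tool might decide what to rebuild after one class changes. It is not how VisualAge or any particular IDE works. The names IncrementalBuildPlanner, classesToRecompile and dependents are hypothetical, and the sketch assumes the tool already knows which classes reference which.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not the API of any real IDE.
final class IncrementalBuildPlanner {

    /**
     * Returns the changed class plus every class that depends on it,
     * directly or transitively. The map "dependents" goes from a class
     * name to the names of the classes that reference it.
     */
    static Set<String> classesToRecompile(String changed,
                                          Map<String, List<String>> dependents) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(changed);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            // add() returns false for a class already visited, which also
            // keeps the walk from looping on cyclic dependencies.
            if (result.add(current)) {
                List<String> users =
                    dependents.getOrDefault(current, Collections.<String>emptyList());
                for (String user : users) {
                    pending.push(user);
                }
            }
        }
        return result;
    }
}
```

Everything outside the returned set can be left alone, which is where the time savings come from. A real tool would probably refine this further, for instance by checking whether the change affected a class's visible interface at all.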

Memory Management
Large systems tend to use a lot of memory, and unless it's managed carefully the capacity of the underlying hardware can quickly be exhausted. In systems written in C and C++, the developer has complete responsibility for making sure that unused memory is recycled for future use rather than retained indefinitely. Java addresses this problem with automatic garbage collection, i.e., the Java Virtual Machine periodically searches for memory that is no longer in use and recycles it for future use.

Unfortunately, garbage collection can take a significant amount of time when systems use a lot of memory, contributing heavily to performance degradation. Most Java programmers today assume that they have to live with this in large Java programs.

The solution is to manage memory actively in the program itself, and to let the garbage collector handle the smaller chunks of memory as well as whatever slips through the cracks of the explicit memory management routines. Hopefully, better garbage collection algorithms will become available shortly in Java Virtual Machines and the problems related to garbage collection will soon be a memory; the next six months should show whether this happens.
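
One common reading of "actively managing memory" is to reuse large objects instead of discarding and reallocating them. The sketch below shows that reading as a tiny buffer pool. The class BufferPool and its methods are hypothetical names chosen for illustration, and the assumption that large byte arrays are the expensive objects is mine, not the article's.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool of fixed-size byte buffers, for illustration only.
final class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations; // number of buffers created with "new"

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Hands out a recycled buffer if one is available, else allocates one. */
    synchronized byte[] acquire() {
        byte[] buffer = free.poll();
        if (buffer == null) {
            allocations++;
            buffer = new byte[bufferSize];
        }
        return buffer; // note: a recycled buffer still holds its old contents
    }

    /** Returns a buffer to the pool; buffers of the wrong size are ignored. */
    synchronized void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.push(buffer);
        }
    }

    synchronized int allocations() {
        return allocations;
    }
}
```

The trade-off is that the program takes back some of the responsibility that garbage collection removed: a buffer released too early, or never released, is again the programmer's problem.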

Another memory-related problem specific to Java is leaks caused by failing to release references to memory that's no longer needed. In Java the garbage collector can only recycle memory that isn't being retained by the user program. Memory retained erroneously by the user program will never be collected, even if it's not used anymore.
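
A well-known illustration of this kind of leak is a stack backed by an array. The sketch below is not from the article; the class ReferenceStack and its method names are hypothetical. The method popLeaky returns the top element but leaves the array slot pointing at it, so the object stays reachable; pop clears the slot.

```java
import java.util.Arrays;

// Hypothetical array-backed stack showing unintentional object retention.
final class ReferenceStack {
    private Object[] elements = new Object[4];
    private int size;

    void push(Object item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = item;
    }

    /** Leaky: the array keeps referencing the popped object. */
    Object popLeaky() {
        requireNonEmpty();
        return elements[--size];
    }

    /** Fixed: the slot is cleared so the collector can reclaim the object. */
    Object pop() {
        requireNonEmpty();
        Object item = elements[--size];
        elements[size] = null;
        return item;
    }

    /** Diagnostic hook: does the backing array still reference something here? */
    boolean stillReferences(int slot) {
        return elements[slot] != null;
    }

    private void requireNonEmpty() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
    }
}
```

Both versions behave identically from the caller's point of view, which is why this class of bug is hard to find without tool support.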

To solve these problems, debuggers and profilers need to provide specialized features for Java. Debuggers should provide capabilities to determine whether a memory leak is occurring, and profilers should provide insight into the details of memory allocation.

Performance
Clearly, performance is the biggest issue for Java programmers. Relative to C and C++ programs, the performance of Java programs suffers for the following reasons:

  • Java is an interpreted language, and by nature interpreted languages run slowly. A lot of work is being done to improve performance while remaining in an interpreted environment (e.g., Just In Time [JIT] compilers). However, performance will never catch up with compiled languages.
  • Garbage collection contributes to performance degradation, especially in large programs.
  • Java is a richer and more secure language than C and C++. It offers features such as serialization and reflection that are inherently inefficient although they significantly enrich the language. Also, Java performs checks at runtime, such as a bounds check on every array reference, which degrade performance.
Good profilers are important with Java. Information provided by profilers can help developers modify their programs to run faster. Furthermore, a certain amount of code optimization during the compilation process can also improve performance. For example, field access can be inlined, and certain classes and methods can be made final during final packaging of a system.

Threads
Given that threads are an integral part of the Java language, there has been a significant increase in thread use to solve problems more elegantly than within a sequential framework. However, programming with threads is inherently more complex than sequential programming because there are more ways in which multiple threads of control can interact with each other and as many ways for things to go wrong. Furthermore, it's usually difficult to reproduce a problem caused by the interaction between threads since multithreaded programs are nondeterministic in their execution.

A lot of research has been devoted to understanding the issues of multithreaded systems over the past 20 years. The Java language design offers state-of-the-art features based on this research, which does contribute to simplifying thread-based systems. However, a good language alone is not enough - there's also the need for good debugging and analysis tools to facilitate a better understanding of how a multithreaded system works.

I believe the best way to deal with threads is to have a diagnostic capability in which probes are permanently inserted within the Java program. These probes save information pertaining to program execution, which can then be used later to analyze the program's behavior. This analysis can be performed to ensure that certain properties always hold (e.g., no two threads simultaneously execute a certain portion of code).
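
As a minimal sketch of such a probe, the class below checks the example property mentioned above: that no two threads are ever inside a given portion of code at the same time. It is a simplification, since it checks the property while the program runs rather than saving a log for later analysis. ExclusionProbe and its methods are hypothetical names, and the sketch assumes a Java version that includes java.util.concurrent.atomic (Java 5 or later).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical probe: records every time two threads overlap in a section.
final class ExclusionProbe {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger violations = new AtomicInteger();

    /** Call on entry to the guarded section. */
    void enter() {
        if (inside.incrementAndGet() > 1) {
            violations.incrementAndGet(); // another thread was already inside
        }
    }

    /** Call on exit from the guarded section. */
    void exit() {
        inside.decrementAndGet();
    }

    /** Runs the section with the probe calls placed around it. */
    void run(Runnable section) {
        enter();
        try {
            section.run();
        } finally {
            exit();
        }
    }

    int violations() {
        return violations.get();
    }
}
```

Because the probe only counts, it is cheap enough to leave in place permanently; a nonzero count after a test run, or in the field, indicates that the property was violated at least once.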

Safety
When building large systems a lot of assumptions are made regarding how the system works. If the system is correct, these assumptions are met by the system execution. However, since bugs in software are always expected, these assumptions may not always hold. The debugging process essentially means running the system in a controlled manner to determine if these assumptions are met, and looking for ways to make corrections when they aren't.

In mission-critical systems some assumptions are important and require enforcement. Similarly, in multithreaded systems where it's often impossible to reproduce a problem, the violation of any assumption must be reported.

Providing diagnostic APIs solves this problem, allowing assumptions to be built into the program as constraints that must hold during the program's execution. Tools to help manage them are needed to encourage users to write such constraints. One important capability necessary to encourage using diagnostic constructs is an easy way to strip out these constructs when it's time to package the system for final shipping.
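
The sketch below shows one way such a diagnostic API could look, including the ability to strip the checks for shipping. It is an illustration under stated assumptions, not the API of any particular tool: Diagnostics, ENABLED, check and the Account class are all hypothetical names. The stripping relies on ENABLED being a compile-time constant; when it is set to false, compilers such as javac typically omit code guarded by if (Diagnostics.ENABLED).

```java
// Hypothetical diagnostic API, for illustration only.
final class Diagnostics {
    // Compile-time constant: set to false for the shipping build so that
    // guarded checks can be dropped by the compiler.
    static final boolean ENABLED = true;

    private Diagnostics() {
    }

    /** Throws if an assumption built into the program does not hold. */
    static void check(boolean condition, String message) {
        if (ENABLED && !condition) {
            throw new IllegalStateException("Constraint violated: " + message);
        }
    }
}

// Hypothetical class showing how a constraint is written at the call site.
final class Account {
    private long balance;

    void deposit(long amount) {
        balance += amount;
    }

    void withdraw(long amount) {
        balance -= amount;
        // The guard keeps the check, and the evaluation of its arguments,
        // out of the way when ENABLED is false.
        if (Diagnostics.ENABLED) {
            Diagnostics.check(balance >= 0, "balance must not be negative");
        }
    }
}
```

Later versions of Java (1.4 onward) added the assert statement, which covers much of the same ground and is switched on at runtime with the -ea option rather than at compile time.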

Portability
Compiled Java files can be moved to different platforms and executed using different Java Virtual Machines with no recompilation required. To facilitate portability, the Java language definition has gone to great lengths to specify exactly how a Java program must run. Very little ambiguity remains.

Only a few problems exist in writing portable Java applications. The most important:

  • Thread scheduling can vary from platform to platform. For example, one platform may provide time slicing to switch between threads while others may not. This can cause system liveness to differ on various platforms.
  • Use of platform-specific notation - the most obvious example is to refer to a file as a raw string (such as "C:\METAMATA\Test.java"). Clearly the presence of such a string in a program will cause it to misbehave or fail on a UNIX platform.
  • Bugs in Java compilers and Virtual Machines can cause an otherwise correct program to behave differently in different environments.
  • Use of nonstandard APIs: certain APIs are available on only a few platforms. Making your program depend on such APIs will (obviously) cause porting problems.
While these problems are really quite trivial when compared to the issues involved in porting programs written in other languages, the promise of "write once, run everywhere" exacerbates these problems when developers expect their Java program to run smoothly everywhere.

The only real way to solve this problem is to test Java systems on as many platforms as possible. In addition, several heuristics for writing portable Java programs have been developed over the past couple of years. Tools that facilitate testing on multiple platforms and check Java programs for violations of these heuristics can help developers write portable Java programs.
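
One such heuristic addresses the raw file name problem listed above: build paths from their components with java.io.File rather than embedding a platform-specific separator in a string. The sketch below shows the idea; PortablePaths and sourceFile are hypothetical names, and the directory names in the usage note are only examples.

```java
import java.io.File;

// Hypothetical helper showing path construction without hard-coded separators.
final class PortablePaths {
    private PortablePaths() {
    }

    /**
     * Builds a path under "root" one component at a time. The File
     * constructor inserts the separator of the platform the program is
     * running on, so no "\" or "/" appears in the source.
     */
    static File sourceFile(File root, String... parts) {
        File result = root;
        for (String part : parts) {
            result = new File(result, part);
        }
        return result;
    }
}
```

For example, PortablePaths.sourceFile(new File("work"), "metamata", "Test.java") yields a path that uses backslashes on Windows and forward slashes on UNIX, with the same source code on both.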

Developing Multiplatform Applications
There's no better approach to facilitate this development than to perform development on multiple platforms. Ideally, individual developers should already work on different platforms. It must also be possible for the same developer to move between platforms.

The message here is that development must be performed using tools that port to multiple platforms. An IDE that runs on only one platform can be severely constraining for a development team building multiplatform software.

Platform Accessibility
If a Java application is developed with care to ensure that it's portable and then shipped for use by customers on a wide variety of platforms, you can be sure that customers will run the application on platforms you don't have access to. Furthermore, there are bound to be problems reported by these customers. Special care needs to be taken to ensure that it's possible to support them.

One thing to do is to ask the customer to run a general probing tool that provides full information on the customer's Java environment. It's also useful to have a version of the software that's heavily instrumented and then ask the customer to attempt to reproduce the problem using this instrumented version. Ideally, the instrumented version is simply the shipped application run with a special environment setting. Then it's possible to study why the application runs differently on the customer's machine.
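
A very small version of such a probing tool can be written with the standard system properties that every Java Virtual Machine provides. The sketch below is only a starting point; EnvironmentProbe and its methods are hypothetical names, and a real support tool would presumably collect much more (class path, installed extensions and so on).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe that reports basic facts about the Java environment.
final class EnvironmentProbe {
    // Standard system property keys defined for every Java platform.
    private static final String[] KEYS = {
        "java.version", "java.vendor", "java.home",
        "os.name", "os.arch", "os.version", "file.separator"
    };

    private EnvironmentProbe() {
    }

    /** Collects the properties in a fixed, readable order. */
    static Map<String, String> collect() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : KEYS) {
            values.put(key, System.getProperty(key));
        }
        return values;
    }

    /** Formats the properties as key=value lines a customer can send back. */
    static String report() {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            text.append(entry.getKey()).append('=')
                .append(entry.getValue()).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

The customer runs the class once and sends back the output, which gives support staff the virtual machine version and platform before any attempt to reproduce the problem.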

Build Reusable Libraries
Java encourages better organization and maintenance by making it much easier (compared to other languages) to build reusable libraries of software components. Widespread reuse of third-party components is common and developers tend to build ones that are as general and reusable as possible. This leads to software bloat, which occurs when the system contains a large amount of useful functionality that's never used by the system itself but exists for possible future use. In many cases it's difficult to distinguish the "system" from the set of reusable libraries it's built on.

As a result, it's necessary to trim these libraries down to only the essential pieces of code to enable systems to be packaged for release.

Obfuscation
Compiled Java code is rather high-level. Hence it's possible for someone to (illegally) reverse-engineer compiled applications. Therefore, care needs to be taken to properly obfuscate the compiled code so that it still runs in the same manner but looks different. There are a variety of ways to obfuscate Java code, from schemes as simple as changing the names of variables to sophisticated schemes where the compiled Java code is encrypted. However, regardless of the scheme used, it must be possible to interpret error messages, stack traces, etc., for customer support purposes.
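
For the simple name-changing schemes, interpreting a stack trace comes down to keeping the table of renamed symbols and applying it in reverse. The sketch below illustrates this for a single stack frame line. It assumes a hypothetical mapping from obfuscated to original method names and the usual "at name(location)" frame layout; StackTraceDecoder and decode are invented names, and no real obfuscator's mapping format is implied.

```java
import java.util.Map;

// Hypothetical decoder that maps obfuscated names in a stack frame back
// to the original names, using a table saved when the code was obfuscated.
final class StackTraceDecoder {
    private StackTraceDecoder() {
    }

    /**
     * Decodes one frame such as "at a.b(Unknown Source)". Lines that are
     * not frames, or whose name is not in the table, are returned unchanged.
     */
    static String decode(String frame, Map<String, String> obfuscatedToOriginal) {
        String trimmed = frame.trim();
        int paren = trimmed.indexOf('(');
        if (!trimmed.startsWith("at ") || paren < 0) {
            return frame;
        }
        String symbol = trimmed.substring(3, paren);
        String original = obfuscatedToOriginal.get(symbol);
        if (original == null) {
            return frame;
        }
        return "at " + original + trimmed.substring(paren);
    }
}
```

The mapping table has to be archived with each release, since a trace from a customer can only be decoded with the table that matches the build the customer is running.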

Conclusion
Java has been adopted rapidly by both industry and academia, and software developed in Java is growing in complexity. The challenge to Java-tool developers so far has been simply to keep up with the pace of growth. Now they face a greater challenge of building tools designed to solve specific issues related to software development in Java, rather than simply to retrofit C and C++ technology for Java. Over the next few months we should see many new and exciting tools to solve problems related to, for example, garbage collection, performance and portability.

One year ago a 100,000-line Java program was large and there were only a handful of them. If a system could handle tens of thousands of lines of code, it was good enough. Today many Java programs exceed 100,000 lines, and soon we should start seeing a few programs reach a million lines of code. Tool builders will therefore be required to scale up their tools to handle such large systems efficiently.

We should also see new Java Virtual Machines capable of running much faster than current ones and significantly closing the gap between native code and interpreted execution. There'll also be native Java compilers that can compile code to execute natively on a platform-by-platform basis. The performance issues will essentially disappear once this happens.

These are very exciting times indeed for the Java community and I look forward to a more mature set of Java developer tools at the next JavaOne!

About the Author
Dr. Sriram Sankar holds a bachelor of technology degree in computer science from the Indian Institute of Technology as well as MS and Ph.D. degrees in computer science from Stanford University. Currently the president and CEO of Metamata, which he founded in 1997, he can be reached for questions or comments at [email protected]

	

Listing 1
 
static XPNServer.XPN ServHelper;
static Persistence.ObjectServer OSHelper;

public void setObjectServer() throws org.omg.CORBA.SystemException {
    Common tc = new Common();
    hostName = tc.getServer();

    if (OSHelper == null) {
        try {
            ORB orb = org.omg.CORBA.ORB.init(myAppletInstance, null);
            IE.Iona.OrbixWeb.Features.Config.setConfigItem("IT_BIND_USING_IIOP", "true");
            OSHelper = Persistence.ObjectServerHelper.bind(markerServer, hostName);
            ServHelper = XPNServer.XPNHelper.bind(markerServer, hostName);
            connection = null; // Reset Oracle DB connection

            .. etc ..

        } catch (org.omg.CORBA.SystemException se) {
            throw se;
        }
    }
}


 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.