Today's mobile Internet economy has opened the door to a range of new technologies that challenge traditional views of programming.
In particular, new devices are cropping up every day to meet the needs of both business and home users who regularly conduct business via laptops, PDAs and diverse Internet appliances. Visionaries and savvy product developers are also hitting the market constantly with new ways to package functionality into highly focused devices such as Web/cable set-top boxes, car navigation systems, and telecommunications hubs and switches.
One of the most conspicuous characteristics of these small devices is how restricted in size they are compared to conventional servers and desktop PCs. Yet despite their size limitations, they must still fit a huge amount of functionality into a very small space, running the same types of applications and products that currently run on more conventional platforms. In addition, the applications and embedded databases at the heart of small-space devices need to perform at an acceptable speed. To make matters even more complicated, each device presents its own particular set of requirements in terms of memory constraints, UI capabilities and breadth of functionality. Whatever their purpose, to appeal to consumers and business purchasers in target markets, small devices of this type need to be self-managing and must require little administration.
Java Ideal for Small-Space Applications
The Java programming environment is a natural fit for applications confined to small devices. For one thing, Java's "write once, run anywhere" capabilities allow programmers to create code that can be used on a range of small-space platforms. Programmers using Java can also leverage classes, the factory concept and the class loader to address the specific problems of small environments. Other useful Java features include interface definitions and recursion, which can improve productivity at programming time and enhance the efficiency of resulting applications.
Modular Design Key to Success
The key to creating applications that won't strain the resources of small devices is intelligent software design. The best approach is to start by viewing the software as a modular set of programs rather than a single vast piece of code. Optimal efficiency requires that functionality be segmented according to the manner in which it will be executed at runtime. In this way mechanisms such as classes and Java factories can do their jobs effectively. The Java class loader is a case in point. This feature allows mobile devices to load functionality into memory selectively, pulling in only those classes required for specific operations, so that onboard memory - "heap space" - is used efficiently while all functionality requirements are still met. The effectiveness of the class loader, however, is entirely dependent on whether the base architecture and the Java classes have been designed intelligently. Nothing prevents Java programmers from writing monolithic programs in the "C" tradition, and in that case the class loader has little to work with.
Java does present a few issues with regard to small-space programming, primarily as a result of the dearth of third-party tools. As with all new platforms, it takes some time for tools such as these to mature. Current issues include how to measure memory usage and how to cope with the high overhead associated with Java objects. Fortunately, a couple of useful workarounds can fill the gap until reliable tools become available.
Leveraging Java Classes
Java classes embody application functionality in logical chunks, potentially in very small sizes. The smaller the discrete piece of functionality, the easier it is to isolate functionality at runtime, thus ensuring efficient memory usage in a small environment. Keep in mind that it's possible to have too much of a good thing: if there are huge numbers of tiny classes, then the resulting overhead could negate the benefits of such granular modularization. Experience helps in balancing the size of classes against the associated overhead.
The use of subclasses also affects the application footprint. From a simplified perspective, if there is a Class A, then a superset called Class B can be defined as including all of the functionality of Class A plus additional functionality required for a particular situation or platform. Class C could be created as a further superset of Class B. In addition, there can be any number of branches within each subclass. By packaging application functionality in this intelligent manner, programmers can go a long way toward maximizing code reuse and eliminating redundancy, creating modularized software that can execute efficiently at runtime.
A Small-Footprint Example
At PointBase, index pages, table pages and control pages (all associated with a particular database) need to carry the same basic information, so we derive all of them from a common cache page class (see Figure 1).
The class hierarchy we adopt can be summarized as follows:
- The cache page class is the base class for all pages in a database.
- The data page extends the class cache page to add functionality specific to data pages, such as forward and backward pointers.
- The table row page extends the data page to add functionality specific to pages used to store rows. This includes information such as the number of rows on the page, the location of each field on the page, and the amount and location of free space on the page.
- The row extent page extends the table row page as a special form of the table row page used for overflow information.
- The index page extends the data page. Note that both the index page and table row page extend the data page. These pages have dramatically different uses, but by extending the data page they can share all the code that maintains the forward and backward pointers.
This is a simplified example - our full cache page hierarchy is much more complex than the one described here. Keep in mind that the choice of which classes to extend - and which of those subclasses to further extend - is made based on the way functionality was defined and segmented at the design stage. For maximum efficiency the designer should make sure that the class hierarchy accurately models the problem being solved and should try to reuse as much code as possible.
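The hierarchy listed above can be sketched in a few small classes. This is only an illustration, not PointBase's actual code: every class, field and method name here (CachePage, DataPage, linkBefore, addRow and so on) is an assumption made for the example, and pages are reduced to a few counters.

```java
// Hypothetical sketch of the page hierarchy; all names and fields are illustrative.
abstract class CachePage {
    private final int pageNumber;
    private boolean dirty;

    CachePage(int pageNumber) { this.pageNumber = pageNumber; }

    int getPageNumber() { return pageNumber; }
    boolean isDirty() { return dirty; }
    void markDirty() { dirty = true; }
}

// Adds the forward and backward pointers shared by every chained page type.
class DataPage extends CachePage {
    static final int NO_PAGE = -1;
    private int nextPage = NO_PAGE;
    private int previousPage = NO_PAGE;

    DataPage(int pageNumber) { super(pageNumber); }

    int getNextPage() { return nextPage; }
    int getPreviousPage() { return previousPage; }

    // Links this page in front of 'next' and marks both pages as modified.
    void linkBefore(DataPage next) {
        this.nextPage = next.getPageNumber();
        next.previousPage = this.getPageNumber();
        markDirty();
        next.markDirty();
    }
}

// Adds row bookkeeping: the row count and the remaining free space.
class TableRowPage extends DataPage {
    private int rowCount;
    private int freeBytes;

    TableRowPage(int pageNumber, int pageSize) {
        super(pageNumber);
        this.freeBytes = pageSize;
    }

    int getRowCount() { return rowCount; }
    int getFreeBytes() { return freeBytes; }

    // Returns false when the row does not fit on this page.
    boolean addRow(int rowBytes) {
        if (rowBytes > freeBytes) {
            return false;
        }
        freeBytes -= rowBytes;
        rowCount++;
        markDirty();
        return true;
    }
}

// Special form of the table row page used for overflow information.
class RowExtentPage extends TableRowPage {
    RowExtentPage(int pageNumber, int pageSize) { super(pageNumber, pageSize); }
}

// Reuses the pointer code from DataPage even though its purpose is different.
class IndexPage extends DataPage {
    private int keyCount;

    IndexPage(int pageNumber) { super(pageNumber); }

    int getKeyCount() { return keyCount; }
    void addKey() { keyCount++; markDirty(); }
}
```

Because linkBefore lives in DataPage, a table row page and an index page can be chained with the same code, which is the kind of reuse the hierarchy is meant to provide.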
Class Loader: Traffic Cop
Java class loaders are the answer to "C" programs that need to load the entire executable (i.e., the .exe file) into memory at runtime. The beauty of the Java class loader is that it supplies only the classes required for the functionality requested by the application, and does so automatically. If the software has been designed properly, Java class loaders make it possible for users to run multiple applications and use a broad range of functionality within diverse applications without running into a memory wall, a critical issue for small devices that don't have a lot of room for extra baggage.
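The on-demand behavior can be observed with a small experiment. The sketch below uses hypothetical class names (QueryFeature, ReportFeature, LoadLog). Strictly speaking it shows class initialization, which the language guarantees happens only on first use; most JVMs also defer the loading itself in the same way, but that part is implementation-dependent.

```java
import java.util.ArrayList;
import java.util.List;

// Shared log that records when each feature class is first initialized.
class LoadLog {
    static final List<String> EVENTS = new ArrayList<>();
}

// Hypothetical feature: its static block runs the first time the class is used.
class QueryFeature {
    static { LoadLog.EVENTS.add("QueryFeature initialized"); }
    static String run() { return "query ran"; }
}

// Hypothetical feature that this demo never touches.
class ReportFeature {
    static { LoadLog.EVENTS.add("ReportFeature initialized"); }
    static String run() { return "report ran"; }
}

class LazyLoadingDemo {
    // Uses only QueryFeature, so ReportFeature is never initialized.
    static List<String> useOnlyQuery() {
        QueryFeature.run();
        return LoadLog.EVENTS;
    }
}
```

After useOnlyQuery() returns, the log contains an entry for QueryFeature but none for ReportFeature, since nothing in the running code has asked for it.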
Optimizing the Factory Principle
Although the Java class loader helps preserve memory by loading functionality exactly as it's needed - providing great benefits from the point of view of memory usage - this holds true only for memory use during runtime. Persistent storage space isn't preserved. So even if only 100K of a 10MB Java program is loaded into memory, it will still take up 10MB of persistent storage despite the use of the Java class loader principle. The reason is that, when a JAR file is assembled for a set of Java classes, every class those classes reference directly (typically visible as an "import" statement) generally has to be packaged along with them. In our example this means that all 10MB of code will be included in the relevant JAR file and downloaded, even though only 100K of functionality is needed. Fortunately, if the Java program has been written in a modular fashion, this problem can be solved through a technique many Java programmers call factories.
Factories aren't really a Java language feature or construct, but rather are a particular way of using a combination of Java classes to create other Java objects indirectly. To put it another way, a factory is a Java object whose sole purpose is to create other Java objects (it is, literally, a "factory" for producing Java objects). A factory may create and return objects from several classes, but each of those classes must implement the same Java interface.
Using factories with a modular design allows a Java program to be broken up into optional pieces. JAR files can then be built with certain pieces removed, thus producing a smaller JAR file.
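A minimal sketch of the idea follows, assuming a hypothetical SqlStatement interface and two implementing classes; none of these names come from a real product. Callers depend only on the interface and the factory, never on the concrete classes.

```java
import java.util.Locale;

// Common interface that every object produced by the factory must implement.
interface SqlStatement {
    String execute(String sql);
}

class SelectStatement implements SqlStatement {
    public String execute(String sql) { return "rows for: " + sql; }
}

class DeleteStatement implements SqlStatement {
    public String execute(String sql) { return "deleted by: " + sql; }
}

// The factory's sole purpose is to create other objects.
class StatementFactory {
    // Chooses an implementation from the first keyword of the statement.
    static SqlStatement create(String sql) {
        String keyword = sql.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        switch (keyword) {
            case "SELECT":
                return new SelectStatement();
            case "DELETE":
                return new DeleteStatement();
            default:
                throw new IllegalArgumentException("Unsupported statement: " + keyword);
        }
    }
}
```

Note that this simple version still refers to the concrete classes by name, so all of them must be present in the JAR file. Making a piece truly optional requires the factory to look the class up at runtime instead.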
Factories and SQL Functionality
Factories can be used to define SQL functionality - such as Data Definition Language (DDL), Data Manipulation Language (DML: Delete, Insert and Update), Query (Select), Security, Business Logic (triggers and routines) and other statements. All of these can be made into factories, and a good reason for using factories for this purpose is that most applications don't require all of this DBMS functionality. Dividing the functionality into factories allows the application to use only what it actually needs.
Factories and SQL Select Statement
As a matter of fact, the factory principle can even be used within individual SQL statements. The SQL Select statement is made up of several objects: parsing, dictionary, optimization, plan generation and execution. With the use of factories, an application needs only the execution factory object at runtime (i.e., the application's JAR file would only use the SQL Select statement's execution factory object).
Factories and SQL Update Statement
Another view of factories emerges from considering what can be done with the SQL Update statement. There's a set of classes that know how to compile and execute the SQL Update statement. If the main UPDATE class is seen in an import statement, then all of the classes used to compile and execute the SQL Update statement will usually be included in the JAR file. However, an UPDATE factory that has the specific job of creating the UPDATE object can be created. The UPDATE factory class can be written to determine if the UPDATE class is available in the JAR file. If it is, the UPDATE factory can return the UPDATE object so that SQL Update statements can be compiled and executed as expected. On the other hand, if the UPDATE class isn't present, the factory can generate an exception or return a special version of the UPDATE object - namely, an object whose only job is to generate an error. Another possibility is that the UPDATE factory could return an UPDATE object that's smaller and has less functionality.
Through this technique JAR files can be configured with less functionality and less Java code. The factories can be used to detect this so no strange errors about a class not being found will be generated.
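One way to write such a factory is to name the optional class only as a string and look it up with Class.forName. The sketch below is an assumption about how this could be done, not PointBase's implementation. The UpdateHandler interface, the MissingUpdateHandler fallback and the class name com.example.sql.FullUpdateHandler are all hypothetical.

```java
// Interface shared by the full implementation and the fallback.
interface UpdateHandler {
    int execute(String sql);
}

// Fallback whose only job is to report that UPDATE support was left out.
class MissingUpdateHandler implements UpdateHandler {
    public int execute(String sql) {
        throw new UnsupportedOperationException(
                "SQL UPDATE is not included in this configuration");
    }
}

class UpdateFactory {
    // Hypothetical class name. It appears only as a string, so the factory
    // compiles and runs even when that class is absent from the JAR file.
    static final String FULL_IMPLEMENTATION = "com.example.sql.FullUpdateHandler";

    static UpdateHandler create() {
        return create(FULL_IMPLEMENTATION);
    }

    static UpdateHandler create(String className) {
        try {
            Class<?> type = Class.forName(className);
            return (UpdateHandler) type.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // The optional piece was removed: return the error-reporting object.
            return new MissingUpdateHandler();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With the full class missing, create() returns the fallback, and the application sees a clear "not included" error when it tries to run an UPDATE rather than a class-not-found failure from deep inside the engine.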
Big Results in a Small Space
Used in conjunction with the Java class loader, after application functionality has been carefully packaged into appropriate classes, factories extend the benefits of modularity by providing tighter control over which sets of functionality will be included in the JAR file. The result is a highly dynamic system that "breathes" with the end user, remaining slim whenever possible but stretching as needed to accommodate more comprehensive activities.
By organizing functionality in such a versatile and efficient manner, factories also contribute to the development of "ubiquitous" software that can be deployed in various permutations depending on the platforms involved. This means that the same application could be run on everything from a cell phone to an application server, with the full range of functionality available on the server end and appropriately focused pieces of functionality available at the handheld level.
Built-in Java Benefits for Small Footprint
Java also offers a range of smaller but significant features that contribute to the efficiency of applications embedded in small devices.
One of these features is Java interfaces. These allow common streams of code to treat a variety of classes as identical. In the case of cache pages as described above, an interface definition could permit the cache manager to treat all pages in the same manner, even though other parts of the application might "see" and handle them differently. This keeps the cache manager code simple and consistent for all pages, while also allowing for specialized code in other instances (such as the b-tree index manager).
The Java instanceof operator can be used to deviate from the general code to handle specific classes. This would occur when the code handles a set of classes that all support a particular Java interface but there's a need for some class-specific code as well.
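The two ideas can be shown together in a short sketch. The Page interface, the two page classes and the CacheManager below are hypothetical names chosen for illustration. Most of the manager treats every page identically through the interface, and one method uses instanceof for class-specific handling.

```java
import java.util.ArrayList;
import java.util.List;

// Common view of a page, as seen by the cache manager.
interface Page {
    int getPageNumber();
    int getSizeInBytes();
}

class RowPage implements Page {
    private final int pageNumber;
    private final int size;

    RowPage(int pageNumber, int size) { this.pageNumber = pageNumber; this.size = size; }

    public int getPageNumber() { return pageNumber; }
    public int getSizeInBytes() { return size; }
}

class BTreeIndexPage implements Page {
    private final int pageNumber;
    private final int size;
    private final int keyCount;

    BTreeIndexPage(int pageNumber, int size, int keyCount) {
        this.pageNumber = pageNumber;
        this.size = size;
        this.keyCount = keyCount;
    }

    public int getPageNumber() { return pageNumber; }
    public int getSizeInBytes() { return size; }
    int getKeyCount() { return keyCount; }
}

class CacheManager {
    private final List<Page> pages = new ArrayList<>();

    void add(Page page) { pages.add(page); }

    // General code: every page is handled the same way through the interface.
    int totalBytes() {
        int total = 0;
        for (Page page : pages) {
            total += page.getSizeInBytes();
        }
        return total;
    }

    // Class-specific code: only index pages know about keys.
    int totalIndexKeys() {
        int total = 0;
        for (Page page : pages) {
            if (page instanceof BTreeIndexPage) {
                total += ((BTreeIndexPage) page).getKeyCount();
            }
        }
        return total;
    }
}
```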
Recursion is another helpful Java feature; it allows a method to call itself to any reasonable depth of nesting, eliminating the need for complex loops and special test conditions. This technique is especially useful for data structures and concepts that are naturally hierarchical - for example, the computer file directory. A directory is a list of files. Some of these files are also directories, meaning that directories can contain other directories, which can contain yet other directories. This kind of structure fits very well into a recursive method. A recursive algorithm might do things such as list the fully exploded contents of a directory or search for a specific file in a directory along with all of its subdirectories.
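Both directory tasks mentioned above can be sketched with the standard java.io.File class. The DirectoryWalker class and its method names are made up for this example, and the code assumes the directory tree contains no symbolic-link cycles.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class DirectoryWalker {
    // Returns every file and directory below 'directory', at any depth.
    static List<File> listAll(File directory) {
        List<File> result = new ArrayList<>();
        collect(directory, result);
        return result;
    }

    private static void collect(File directory, List<File> result) {
        File[] entries = directory.listFiles(); // null if not a readable directory
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            result.add(entry);
            if (entry.isDirectory()) {
                collect(entry, result); // the method calls itself for each subdirectory
            }
        }
    }

    // Searches 'directory' and all of its subdirectories for an entry with this name.
    static File find(File directory, String name) {
        File[] entries = directory.listFiles();
        if (entries == null) {
            return null;
        }
        for (File entry : entries) {
            if (entry.getName().equals(name)) {
                return entry;
            }
            if (entry.isDirectory()) {
                File found = find(entry, name);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }
}
```

Neither method needs an explicit stack or a loop over nesting levels; the call stack keeps track of where the walk is in the hierarchy.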
The SQL language has many examples of recursion. For example, a WHERE clause in a SQL Select statement can contain other SQL Select statements, which in turn can contain a WHERE clause, and so on. It's because the SQL language is recursive that we at PointBase use a recursive descent parser to parse SQL statements. The same is also true of SQL expressions.
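To show the shape of a recursive descent parser without the bulk of a SQL grammar, the sketch below evaluates small arithmetic expressions with parentheses. It is a stand-in, not the PointBase parser. The ExpressionParser class and its three-rule grammar are assumptions made for this example.

```java
class ExpressionParser {
    private final String text;
    private int pos;

    private ExpressionParser(String text) { this.text = text; }

    // Parses and evaluates the whole input, rejecting trailing characters.
    static int evaluate(String text) {
        ExpressionParser parser = new ExpressionParser(text);
        int value = parser.parseExpression();
        parser.skipSpaces();
        if (parser.pos != text.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expression := term (('+' | '-') term)*
    private int parseExpression() {
        int value = parseTerm();
        while (true) {
            if (accept('+')) {
                value += parseTerm();
            } else if (accept('-')) {
                value -= parseTerm();
            } else {
                return value;
            }
        }
    }

    // term := factor ('*' factor)*
    private int parseTerm() {
        int value = parseFactor();
        while (accept('*')) {
            value *= parseFactor();
        }
        return value;
    }

    // factor := number | '(' expression ')'
    private int parseFactor() {
        if (accept('(')) {
            int value = parseExpression(); // recursion: an expression nested in a factor
            if (!accept(')')) {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(text.substring(start, pos));
    }

    // Consumes 'expected' if it is the next non-space character.
    private boolean accept(char expected) {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < text.length() && text.charAt(pos) == ' ') {
            pos++;
        }
    }
}
```

Each grammar rule becomes one method, and the nesting of parentheses is handled by parseFactor calling back into parseExpression, in the same way that a WHERE clause can lead back to a nested Select statement.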
Measuring Memory Usage and Performance
In my experience there are two areas in which Java presents special challenges for the development of applications with a small footprint: measuring memory usage and performance, and coping with the overhead of Java objects. (At PointBase, luckily, we've come up with a few workarounds that can keep these problems under control until third-party tools become available to cope with them in a more exact way.)
Memory usage is a central issue for developers of small-space applications, since every byte of memory is precious in these environments. The best-case scenario would be one in which programmers could associate memory usage with specific pieces of code. Systems vendors such as Sun Microsystems, Microsoft and IBM help out to some extent by supplying profilers that contain "hooks" into their respective JVMs. Measurements generated by these profilers reflect memory usage fairly accurately for applications as a whole, which is at least a first step toward understanding how memory will be affected by a particular set of programs. However, it's still difficult to pinpoint which specific pieces of code are at fault when memory usage is unacceptably high. Some third-party tools purport to drill down to the code level, but we've found that these tools give conflicting reports, making them appear somewhat unreliable.
What can be done about the memory measurement problem? At PointBase we've become rather creative in approaching it by creating, in some test cases, thousands of objects and then measuring the amount of memory at the macro level to determine the approximate overhead per object. The object creation-time overhead can also be determined in this way.
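A rough version of this measurement can be written with the standard Runtime class. The sketch below is an assumption about how such a test might look. The ObjectOverheadEstimator and Sample names are hypothetical, and because System.gc() is only a request to the JVM, the result should be read as an approximation that varies between JVMs and runs.

```java
class ObjectOverheadEstimator {
    // Small test object carrying 8 bytes of field data.
    static final class Sample {
        int a;
        int b;
    }

    // Heap currently in use, measured at the macro level.
    static long usedMemory() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Creates 'count' objects and returns the approximate heap bytes used per object.
    static long estimateBytesPerObject(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        // Allocate the holder first so the array itself is not part of the measurement.
        Object[] holder = new Object[count];
        System.gc(); // a hint only; the JVM may ignore it
        long before = usedMemory();
        for (int i = 0; i < count; i++) {
            holder[i] = new Sample();
        }
        System.gc();
        long after = usedMemory();
        // Touch the holder after measuring so the objects stay reachable until here.
        if (holder[count - 1] == null) {
            throw new IllegalStateException("holder was not filled");
        }
        return (after - before) / count;
    }
}
```

Subtracting the 8 bytes of field data from the result gives an estimate of the per-object overhead on the JVM under test. Using a large count, in the tens or hundreds of thousands, helps average out noise from other allocations.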
In the area of performance the goal in the Java environment is to determine how many times each class is called and how long it takes to execute. This allows programmers to identify and improve classes that are performing more slowly. Currently, neither systems vendors nor third-party developers provide reliable tools for accurately measuring performance in this way. We expect the situation to improve over time, but in the meantime fine-tuning based on the findings of the vendor-supplied profilers described above offers the best workaround.
Solutions for High Object Overhead
Testing has shown that some JVMs add about 40 bytes of overhead to every object. Thus, if the data in an object requires 8-10 bytes, its overall size would be 48-50 bytes, so memory usage can grow unexpectedly large very quickly. There are situations in which a collection of diverse objects is expected to be relatively permanent (say, in a complex table). In cases like these it's possible to save a substantial amount of memory by either consolidating objects or adopting a C-style approach to coding, such as using arrays. The C-style section of code can be "wrapped" in an object, making it accessible to the rest of the application and therefore maintaining the object-oriented paradigm. In general, relatively permanent objects don't work well for small environments; it's better to use objects and then discard them.
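The array-based approach can be sketched as follows. The PointTable class is a hypothetical example. It keeps a long-lived collection of (x, y) pairs in two int arrays rather than in one object per pair, and wraps the arrays in a single object so callers still see an ordinary object-oriented interface.

```java
import java.util.Arrays;

// Stores many (x, y) entries in two parallel arrays instead of one object per entry,
// so the per-object overhead is paid once for the table rather than once per entry.
class PointTable {
    private int[] xs;
    private int[] ys;
    private int size;

    PointTable(int initialCapacity) {
        int capacity = Math.max(1, initialCapacity);
        xs = new int[capacity];
        ys = new int[capacity];
    }

    // Appends an entry and returns its index, growing the arrays when they are full.
    int add(int x, int y) {
        if (size == xs.length) {
            xs = Arrays.copyOf(xs, size * 2);
            ys = Arrays.copyOf(ys, size * 2);
        }
        xs[size] = x;
        ys[size] = y;
        return size++;
    }

    int getX(int index) {
        checkIndex(index);
        return xs[index];
    }

    int getY(int index) {
        checkIndex(index);
        return ys[index];
    }

    int size() { return size; }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("No entry at index " + index);
        }
    }
}
```

The trade-off is that entries are identified by index rather than by reference, which is less convenient but keeps the cost of each entry close to the size of its data.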
Java is perfectly suited to the demands of the small application environment, providing exceptionally efficient and productive ways to accommodate handheld devices, new Internet appliances or any other function-specific hardware. Modular software design and the intelligent use of Java classes, factories and the Java class loader - with an extra boost from many of Java's basic programming features - help ensure that the runtime footprint remains small without interfering with the functionality and performance sophisticated end users require. Java has taken on the challenges of "ubiquitous" programming and all of the promise of yet-to-be-conceived devices that require pared-down but nevertheless powerful applications and embedded databases. This segment of the industry is still fairly new, but it's growing rapidly; current limitations will no doubt be swept away as more and more tools become available and increasing numbers of people gain appropriate skills and knowledge, and as markets for products expand and multiply.
Bruce Scott is president, CEO and founder of PointBase, a leader in the area of enterprise and embedded database architecture and product development. A cofounder of Oracle in 1977, Bruce cofounded Gupta Technologies in 1984, pioneering the notion of the small-footprint database server for Intel-based platforms.
Jeff Richey, vice president of engineering and cofounder of PointBase, is a recognized leader in database product development. Jeff has over 15 years of database experience, working as a core architect and development manager for IBM/DB2, HP, Oracle and Sybase. He is a patent holder of two key innovations in SQL performance.