
Eighteen months ago Microsoft upset the Win32 apple cart by announcing a new execution technology and promptly labeling it with the ActiveX-esque name of "Microsoft .NET." I think it's safe to say that life hasn't been the same since for anyone who works with Windows. .NET makes it seductively easy to write code that just seems to work. No more puzzling through WNDENUMPROC or IUnknown** or malloc/free - all that grungy stuff is handled for you.

I've observed that when confronted with this prospect developers tend to react in one of two ways. Some developers shrug and happily accept the new convenience. They go on and write code and have happy, meaningful lives. Other developers react with suspicion at the prospect of having another layer of software between them and the silicon. While the first group happily writes production code and goes home, these developers poke at the new runtime with a pointy stick, trying to get an idea of what's going on under there.

I belong to the second group. And I intend Managed Space to be a support group of sorts for us pointy-stick types, where we can explore the workings of managed code everywhere. Compilation. Garbage collection. Threading. Remoting. How are these features implemented? What design trade-offs did the designers of the platform make and why? This kind of knowledge can ultimately be used to write better software, but that's not always the point, is it?

Is Managed Code Portable?
Managed code is not about Windows. It's not even supposed to be about Microsoft. The final authority on managed code is supposed to be the set of specifications collectively known as the Common Language Infrastructure, or CLI. The CLI is a carefully thought-out specification that was built using feedback from numerous compiler authors and researchers. The final specification was then turned over to ECMA, where it currently resides as specification ECMA-335.

As soon as the CLI specs were released, various communities sprang up around them and began building alternate implementations. Today you can see the results of these efforts in projects like Mono, dotGNU, and Intel's OCL. Microsoft itself has released the Compact Framework for PocketPC devices, and has released source code to an implementation they call the SSCLI (but that everyone else calls "Rotor"). These implementations run on a variety of hardware and OS environments, and today there are scads of researchers and hobbyists tinkering with them.

The natural question is: How portable will the code be? I want to be able to compile assemblies in one place (like the desktop CLR) and run that code unmodified in other places (like Mono/Linux or the Compact Framework on PocketPC). At the same time, implementation developers want maximum flexibility for how they design their runtimes. The CLI specifications walk this tightrope to balance these conflicting demands by delineating which constructs are fair game for tinkering (e.g., loaders, JIT compilers, garbage collectors) and which constructs are sacrosanct (IL and metadata formats and semantics, some library code).

Loading Code
Job number one for any managed code implementation is loading code from assemblies into memory. The CLI describes in great detail exactly how a managed code module must be laid out - there is very little wiggle room. The actual file format used is based on the Portable Executable (PE) file format used by NT. Even so, the contents of a managed EXE are drastically different from those of a typical Windows EXE, and the format is indeed reasonably platform-neutral. PE files are little-endian by nature, so arguably they are a hassle to deal with on big-endian platforms like the PowerPC. It's not impossible though - both Mono and Rotor are currently running on PowerPCs.

The PE file format does make one concession to Windows that doesn't make a lot of sense for other platforms. It requires the "entrypoint" RVA field in the PE header to point to a peculiar byte sequence: 0xFF, 0x25, followed by an adjusted file offset. This is a subtle trick - when viewed as x86 opcodes on Windows the sequence evaluates to "jmp _CorExeMain (in mscoree.dll)". This trick allows traditional Windows to run managed executables directly, but this byte sequence has no meaning anywhere else and is simply ignored by most implementations. For instance, Rotor and Mono both use separate hosting programs ("clix.exe" and "mono" respectively) to load up the runtime and begin executing code; they never look at this field. The Windows XP release itself doesn't even look at this field anymore - the operating system recognizes managed code and loads assemblies natively. Over time it will become a relic, just like the "This program cannot be run in DOS mode" header in Windows programs today.
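The layout facts above can be checked with a short script. What follows is a minimal sketch, not a loader: it assumes the standard PE/COFF offsets (the header pointer at 0x3C, a 20-byte COFF header, and data directory entry 14 for the CLI header), and the function names `summarize_pe` and `is_jmp_stub` are made up for illustration. Every field is decoded explicitly as little-endian, which is what a big-endian host has to do as well.

```python
import struct

CLI_HEADER_DIRECTORY = 14  # index of the CLI header entry in the PE data directories


def summarize_pe(image: bytes) -> dict:
    """Return the entry point RVA and CLI header location of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # "<" forces little-endian decoding, so this also works on big-endian hosts.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic == 0x10B:      # PE32
        directories = optional + 96
    elif magic == 0x20B:    # PE32+
        directories = optional + 112
    else:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    (entry_rva,) = struct.unpack_from("<I", image, optional + 16)
    cli_rva, cli_size = struct.unpack_from(
        "<II", image, directories + 8 * CLI_HEADER_DIRECTORY)
    return {"entry_rva": entry_rva, "cli_rva": cli_rva,
            "cli_size": cli_size, "is_managed": cli_rva != 0}


def is_jmp_stub(code: bytes) -> bool:
    """True if the bytes begin with 0xFF 0x25, the x86 indirect jmp opcode."""
    return code[:2] == b"\xff\x25"
```

Translating the entry point RVA into a file offset requires walking the section table, which this sketch leaves out; `is_jmp_stub` is meant to be applied to the bytes found there.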

The tight guidelines for .NET assemblies don't mean that every environment will be able to load every assembly. There are provisions in the CLI that allow compilers to emit processor-specific native code directly into modules. (Microsoft's C++ compiler will do this if you try to compile some constructs while using the /clr switch.) Assemblies generated this way are going to be useless on any platform other than the one they target.

Versioning
One of the touted features of Microsoft .NET is its elaborate version management. Config files, the GAC, and publisher policy assemblies allow incredible flexibility in version matching and assembly loading, even automatic assembly downloads from Internet URLs. While these features are an important part of the Microsoft .NET platform, they are not really central to the managed-code platform. If you browse the CLI you won't find any mandated versioning model - each implementation gets to do things however it likes. Rotor inherited its loader from the CLR and thus works the same way, but other platforms are free to implement their own loading and version policy schemes, which means complex applications will probably take some tweaking when moved to other environments. Mono, for example, sidesteps the whole problem by simply requiring that shared assemblies be stored in a special directory.

Execution
Once code is loaded, it has to be executed. One of the bedrock assumptions of managed code is that JIT compilation and execution is vastly preferable to interpretation. The IL instruction set is designed to make life as easy as possible for JIT compilers at the expense of interpreters. Interpreting IL is certainly possible - the "mint" tool that ships with Mono does exactly that - but the vast majority of implementations do in fact JIT compile IL into platform-specific code. The interpretation versus JIT compilation decision ultimately shouldn't affect how portable your code is, just how fast it will run.

JIT compilation is still a relatively young field, so different platforms work differently. Broadly speaking, you can lump JIT compilers into "fast JIT" and "optimizing JIT" categories. Fast JIT compilers concentrate on blasting out x86 code as fast as possible without doing any deep analysis. Both Rotor and Mono feature fast JIT compilers. Optimizing JIT compilers, on the other hand, spend time analyzing incoming IL to generate better-performing code. For example, the JIT compiler in the CLR analyzes the base types of classes it's compiling. For most classes the compiler can perform method inlining for some methods, which produces noticeable speed benefits. For classes derived from System.MarshalByRefObject the JIT compiler does not perform this inlining. Since MarshalByRefObject-derived classes might be called remotely, they need to have complete implementations available. Differences in JIT compilation models shouldn't affect you as a developer in theory, but in practice I think it's safe to predict that each of the JIT compilers will have bugs, and these bugs will be somewhat elusive.

Once a program is running it will want to use memory. The CLI states that implementations need to have a garbage collector but says very little about how that collector should behave. Runtime vendors are actively experimenting with GC, so it's a bad idea to write any code that depends on GC happening in a particular fashion - move that code to another platform and it will most likely break.

One of the most interesting distinctions between different runtimes is how their JIT compilers and garbage collectors interact. On the surface these two jobs seem radically different - the JIT compiler generates code and the garbage collector cleans memory. In practice, a lot can be gained by integrating these services. JIT compilers sit in a prime location to track where code is creating and using objects - valuable information to have around at GC time. Mono keeps the two compartmentalized as much as possible, which makes them easier to understand and experiment with. In the desktop CLR and Rotor the JIT compiler and the garbage collector share some data structures and exchange information, which probably makes them faster and certainly makes them harder to understand.

Libraries
CLI implementations come in all shapes and sizes. Some (like the CLR and Mono) target desktop or server environments with plenty of processor power and memory, while others (like the Compact Framework) target small devices with very limited resources. Recognizing this, the CLI specifications give implementers a few choices regarding which libraries they want to support.

Partition IV ("Libraries") of the CLI outlines two basic models or profiles that platforms can support. The smallest practical subset of the CLI is called the Kernel Profile, which includes basic class library support. I don't know of any implementations that implement only Kernel functionality - every implementation I know of implements the larger set of functionality called the Compact Profile. The Compact Profile includes a lot of classes from the System.Net, System.Reflection, and System.Xml namespaces. The Rotor implementation is the closest implementation today to a pure Compact Profile implementation.

The Compact Profile outlines a lot of classes, but the functionality present would still seem pretty sparse to commercial developers today. Currently there are no standards for more advanced functionality like user interfaces, Web sites, or database access, so support will vary by platform. Microsoft, of course, has defined a tremendous amount of functionality in libraries like WinForms, ASP.NET, and ADO.NET. It's not really required for other implementations to host identical functionality, but it is convenient. The Mono implementers are busy trying to replicate practically everything present on Windows, even including ADO.NET and WinForms libraries.

High-level libraries can be written in managed languages and often can be written in 100% managed code. In contrast, libraries like mscorlib really can't be written this way - low-level services like file I/O need to cooperate with the underlying operating system to get their work done. As a result, these assemblies are usually developed in tandem with a particular implementation and expect to find their particular platform for execution. Microsoft's mscorlib, for example, would be bewildered to find itself being used in a Linux environment (where the heck is the CreateFile() API?) so you couldn't just dump it onto a Mono distribution and expect it to work. Every runtime will ship its own version of the core classes. Microsoft's mscorlib uses P/Invoke to talk directly to Win32; Mono uses internalcall routines to talk directly to Linux.

A subtle problem arises when a program needs to use these classes. Anytime you compile a program that uses another managed library, the compiler records the public key token of the library as part of the program's metadata. When a user executes the managed program, the underlying runtime can use the recorded public key token to verify the integrity of any libraries it tries to call and ensure that the contents of the library are indeed what the calling program expects. This scheme works well for most classes, but what about core assemblies like mscorlib? If you try to run a program on a platform other than the one you used for development, the system libraries you compiled against won't exist. Even if the libraries have the same name (e.g., "mscorlib") they can't all be signed with Microsoft's public/private key pair for obvious reasons (I think it's safe to say that Microsoft is unlikely to release their private key for general consumption).

This presents a serious barrier to code portability, and if it weren't addressed you would need to recompile for every platform to which you want to deploy. Luckily this issue is addressed for the core libraries in a pretty elegant fashion. If you look at the central library assemblies like mscorlib shipped with .NET, you'll see that they are signed with a peculiar public key:

.publickey = (00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 )

These assemblies are easily recognized in the GAC viewer as having a public key token of b77a5c561934e089. Notice that this key is only 16 bytes long and looks nothing like the 1024-bit keys typically used to sign libraries. This key is a special key, called the "ECMA public key", and its presence denotes special semantics. In effect this key tells the runtime "I am an assembly that contains ECMA CLI types." Every platform should advertise this public key in its version of libraries like mscorlib. Even though the libraries advertise this key, it's not the key used to cryptographically sign the assembly. When a CLI implementation loads an assembly with the ECMA key specified as its public key it assumes that the assembly was actually signed with the platform's own public/private key pair. This way every platform can maintain control over its versions of common libraries without having to write all kinds of weird special-case code - programs written to link against one platform's mscorlib should run unmodified on other platforms.
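The relationship between a public key and its token can be sketched in a few lines. The sketch assumes the usual derivation, which this column does not spell out: the token is the last eight bytes of the SHA-1 hash of the public key blob, in reverse order. The names `ECMA_PUBLIC_KEY` and `public_key_token` are illustrative, not part of any runtime API.

```python
import hashlib

# The 16-byte ECMA public key quoted above, written as a hex string.
ECMA_PUBLIC_KEY = bytes.fromhex("00000000000000000400000000000000")


def public_key_token(public_key: bytes) -> str:
    """Derive a public key token: the last 8 bytes of the SHA-1 hash, reversed."""
    digest = hashlib.sha1(public_key).digest()
    return digest[-8:][::-1].hex()


if __name__ == "__main__":
    # Compare the printed value with the token shown in the GAC viewer.
    print(public_key_token(ECMA_PUBLIC_KEY))
```

Because the token is just a truncated hash, it identifies a key without proving anything about the signature. That is why a runtime can treat the ECMA key as a marker and verify the assembly against its own key pair instead.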

This only works for libraries that Microsoft signs with the ECMA public key. Version 1.0 of the CLR shipped with six assemblies signed with this key:

  • mscorlib
  • System
  • System.Data
  • System.Runtime.Remoting
  • System.Windows.Forms
  • System.Xml
Programs that want to target the Compact Framework may want to depend on some other libraries as well that are not signed with this key (like System.Web.Services). Initially, the Compact Framework team was going to require recompilation but changed their minds in the Everett release. With .NET 1.1 Microsoft addressed this by adding a new flag to assembly references in metadata called "retargetable". If you use ILDASM to look at a program compiled against the Compact Framework's mscorlib you'll see:

.assembly extern retargetable mscorlib ...

The intention of this attribute is to tell whatever runtime is running this program to feel free to substitute any assembly it likes in place of the one specified. Currently only the Compact Framework and version 1.1 of the CLR make use of this flag, but I expect to see it used more as people try to share applications across different CLI implementations.
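The substitution rule described above can be sketched as a toy resolver. This is an illustration under stated assumptions, not how any shipping loader works: the types `AssemblyRef` and `InstalledAssembly` and the function `resolve` are hypothetical, real loaders also weigh version and culture, and the constant uses the `Retargetable` bit (0x0100) from the ECMA-335 AssemblyFlags table on the assumption that it is the flag this section refers to.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

RETARGETABLE = 0x0100  # assumed: the Retargetable bit in ECMA-335 AssemblyFlags


@dataclass(frozen=True)
class AssemblyRef:
    """A reference recorded in a program's metadata (hypothetical model)."""
    name: str
    public_key_token: str
    flags: int = 0


@dataclass(frozen=True)
class InstalledAssembly:
    """An assembly the local runtime can offer (hypothetical model)."""
    name: str
    public_key_token: str


def resolve(ref: AssemblyRef,
            installed: Iterable[InstalledAssembly]) -> Optional[InstalledAssembly]:
    """Pick an installed assembly for a reference, honoring the flag."""
    for candidate in installed:
        if candidate.name != ref.name:
            continue
        if ref.flags & RETARGETABLE:
            # The reference allows substitution: match on name alone.
            return candidate
        if candidate.public_key_token == ref.public_key_token:
            # Strict binding: the recorded token must match exactly.
            return candidate
    return None
```

Without the flag, a reference to an assembly signed by one vendor fails to bind to another vendor's build of the same name; with it, the runtime is free to hand over its own version.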

Conclusion
The fact that there are so many managed implementations in the works today is testament to the power and scope of the CLI standard. Code should be reasonably portable from runtime to runtime as a result of its efforts, but libraries and environmental differences will probably keep most applications from being able to simply redeploy without touching the code. In future editions of this column I'll dig into the details of various platforms, so start sharpening those sticks.

Resources

  • ECMA-335 (CLI): www.ecma.ch/publications/standards/ecma-335.htm
  • SSCLI (Rotor): http://msdn.microsoft.com/net/sscli
  • Mono: http://go-mono.com

    Author Bio
    Jason Whittington is a consultant and researcher with an irrational fascination with virtual execution environments. When he's not researching or consulting he can often be found delivering courses for DevelopMentor. His Web site can be found at http://staff.develop.com/jasonw. jasonw@develop.com

    All Rights Reserved
    Copyright ©  2004 SYS-CON Media, Inc.

      E-mail: info@sys-con.com