When I was in junior high school, music was one of the most important
factors in life. Few things were more important than being up-to-the-minute
on which bands were "cool" and which were to be eschewed. Regardless of what
genre you liked, yesterday's bands sucked and today's bands ruled. So it is
with the software industry. At any given moment there are the "cool"
technologies (as I write this the prevailing attitude is that Web services
rule) and the has-beens (COM was cool but now seems positively dowdy). I
don't think this is necessarily unhealthy; experimentation is the only way
we're ever going to figure out how to write software without inhuman amounts
of effort.
One idea has managed to remain cool for the last decade:
object-oriented development. OO has been around long enough that it's
starting to look, well, mature, and it's hard to be both mature and cool. At
least two popular new technologies have largely abandoned objects: XML/XSLT
and Web services.
What this means to me is that the Next Big Thing will probably not be
objects. This is interesting because most of the languages used on the .NET
platform today are object-oriented in nature. How will the CLR cope if next
year's favorite development paradigm turns out to be "Bilinear
Aspect-Oriented Development"? Can the CLR accommodate nonobject languages?
The answer depends on the relative agnosticism of the .NET runtime. If
the platform is well factored, then the generic services built into it (like
memory and thread management) should make it easy to accommodate next year's
fashionable ideas in a way that solidly integrates with today's has-beens.
Since it's very hard to know what the next "cool" model will be, it is
important to try to keep the runtime as free of bias as possible without
trying to be all things to all people. This is the tightrope the CLR will
have to walk as it evolves over the next few years. If it manages to
succeed, then the managed platform might still be around even after
object-oriented programming goes into the hopper with all of our old REO
Speedwagon records.
One Runtime to Bind Them All
Computer languages come in an infinite variety of flavors, but
underneath their syntactic differences they work in more or less the same
way. All languages give their users the ability to create some kind of
programmatic variables (whether C++ objects or LISP cons nodes) and a set
of operations for manipulating them. Most languages additionally define a
memory model and a set of libraries useful for common programming tasks.
The defining vision of the .NET platform is to distill these four
factors into a single consolidated runtime so that compilers can focus on
language issues instead of platform details. This vision is outlined in ISO
23271, the .NET Common Language Infrastructure (CLI) specification. This
specification defines a Common Type System (CTS) so that compilers don't
have to synthesize entire type systems from scratch, a Virtual Execution
System (VES) to insulate compilers from the vagaries of CPU register
allocation, and a garbage collector to provide automated memory management.
Compilers don't have to emit API calls to take advantage of runtime
services; they emit code in the .NET Intermediate Language (IL) and the
runtime supplements execution with services as appropriate. Finally, the
Base Class Library (BCL) provides a set of around 600 classes for doing
common things like File I/O.
Porting a new language or framework to .NET will require interaction
with each of these four subsystems. The two with the most immediate impact
on a language's performance characteristics are the Common Type System and
the garbage collector.
The Common Type System
The Common Type System (CTS) is arguably the most critical part of the
CLI specification because it is the largest determinant of how well a given
language will work with the underlying Execution Engine. The mapping a
compiler makes between its programmatic types and the CTS has an enormous
effect on execution speed as well as interaction with other code.
The Common Type System is object-oriented in nature and utilizes
single-implementation inheritance. Not coincidentally, this is exactly the
model used by both C# and VB, which are designed directly around the CTS.
These compilers map syntactic constructs like classes onto the CTS in an
almost one-to-one manner. If you compile a C# program and open the resulting
assembly with ILDASM, you'll find that the types it contains look practically
identical to the ones in the source code. That is not surprising, given C#'s
unusually close kinship with the .NET platform. Other single-inheritance
languages like Java are similarly easy to map onto the CTS.
What about multiple inheritance? Eiffel is an object-oriented
language that supports multiple inheritance and templates, neither of which
is natively supported by the CTS. Eiffel.NET still provides the full
functionality of the Eiffel language by mapping each Eiffel type onto
multiple CTS types (several CTS classes are used in concert to implement a
single Eiffel class). This mapping is an uninteresting compiler detail to a
pure Eiffel programmer, but comes into focus as soon as the code is
distributed to developers used to C# or VB.NET. A simple Eiffel class is
shown in Listing 1. A VB programmer who fires up ILDASM and points it
at an Eiffel assembly sees something like Figure 1: a thicket of CTS
classes whose proper use isn't immediately obvious.
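One way to picture such a mapping is sketched below in standard C++, assuming a target that allows a class to implement many interfaces but inherit implementation from only one base. This is not the actual Eiffel.NET scheme, and every name in it (Named, Counted, NamedImpl, CountedImpl, Inventory) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Interfaces carry no implementation, so a class may implement several
// even when only one implementation base class is allowed.
struct Named {
    virtual ~Named() = default;
    virtual std::string name() const = 0;
};

struct Counted {
    virtual ~Counted() = default;
    virtual int count() const = 0;
};

// Helper classes holding the code that a multiple-inheritance language
// would inherit directly from each parent.
class NamedImpl {
public:
    explicit NamedImpl(std::string n) : name_(std::move(n)) {}
    std::string name() const { return name_; }
private:
    std::string name_;
};

class CountedImpl {
public:
    explicit CountedImpl(int c) : count_(c) {}
    int count() const { return count_; }
private:
    int count_;
};

// The single source-level class: it implements both interfaces and
// forwards each call to the matching helper.
class Inventory final : public Named, public Counted {
public:
    Inventory(std::string n, int c) : named_(std::move(n)), counted_(c) {}
    std::string name() const override { return named_.name(); }
    int count() const override { return counted_.count(); }
private:
    NamedImpl named_;
    CountedImpl counted_;
};
```

In this sketch one source-level class has become five types in the target. A consumer browsing the compiled output sees all five and has to work out which one to instantiate.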
This doesn't mean that Eiffel is poor or that it's poorly suited to the
CLI. It does mean that the more tangled the mapping between a language and
its representation in the CTS, the harder it's going to be to consume it
from other languages.
Other languages (notably managed C++) generate constructs that are even more
difficult, if not impossible, for others to use.
Despite its object-oriented nature, the CTS isn't particularly biased in
favor of OO languages; languages like COBOL and FORTRAN still use ints and
floats after all. These compilers happily base their type system on the CTS
and just ignore its more object-like constructs. The real bias of the Common
Type System is in the fact that it is statically typed: every object and
object reference is bound to a type at creation, and that binding can never
change. This has implications for all languages, whether they are OO or not.
The CTS isn't statically typed just because the implementers happened to be
in a statically typed mood on the day they designed the type system. Rather,
static typing is the linchpin of code verification.
Verification
One of the most important advances of the 1990s was the development
of semantic type checking: if enough information is known about how code
manipulates variables, then it is possible to make strong statements about
whether or not that code is safe to execute. This process is familiar to
both Java and C# programmers as verification: the runtime inspects IL prior
to execution and may decide not to execute it if it is unable to prove
that the code can't do any harm. Code that can't be proven to be safe is
called "unverifiable code" and will not be executed unless it has been
granted trust.
Semantic type checking is a relatively new field, and verifiers today can
only prove safety for a limited set of operations; not all safe code is
verifiable. The .NET verifier can only verify code in which both the IL and
the object instances it manipulates are not merely statically typed but
strongly typed, so that the type of an object instance is never in doubt. As
a result, some IL instructions are not verifiable (like cpblk), and native
code is never verifiable.
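To make the idea concrete, here is a minimal sketch of stack-type verification in standard C++. The three-instruction language and the names Op, Type, and verify are invented for illustration and are not part of the CLI. A real verifier tracks far more than this.

```cpp
#include <cassert>
#include <vector>

// Hypothetical three-instruction stack language, invented for illustration.
enum class Op { PushInt, PushRef, AddInt };

// Static types the toy verifier tracks for each evaluation-stack slot.
enum class Type { Int, Ref };

// Returns true only if every instruction finds operands of the expected
// type. No values are computed; only their types are simulated.
bool verify(const std::vector<Op>& code) {
    std::vector<Type> stack;
    for (Op op : code) {
        switch (op) {
        case Op::PushInt:
            stack.push_back(Type::Int);
            break;
        case Op::PushRef:
            stack.push_back(Type::Ref);
            break;
        case Op::AddInt: {
            if (stack.size() < 2) return false;   // stack underflow
            Type rhs = stack.back(); stack.pop_back();
            Type lhs = stack.back(); stack.pop_back();
            if (lhs != Type::Int || rhs != Type::Int) {
                return false;                     // arithmetic on a reference
            }
            stack.push_back(Type::Int);
            break;
        }
        }
    }
    return true;
}
```

Under these assumptions, verify accepts the sequence PushInt, PushInt, AddInt and rejects PushInt, PushRef, AddInt without ever running either one. The check works only because each stack slot has exactly one known type at every point in the code.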
Dynamic Typing
Verification's insistence on strong typing is a major impediment for
dynamically typed languages where the type of a variable can change with
each assignment. In JScript, for example, the following code sequence is
perfectly legal:
var a = 7; //a is holding an Int32 value
var b = 8;
a = (a+b).ToString(); //a is holding a string value.
Building a dynamic type system on top of a static one requires heavy
runtime overhead, with lots of object conversions. The results are rarely
pretty. Compiling the JScript above into IL produces the code in Listing 2.
Variables a and b are System.Object references that in this case are
created via the IL box instruction. Box isn't particularly cheap and tends
to create garbage every time it is used. The JScript compiler also can't use
native IL instructions like add because "+" in JScript might mean something
different every time it appears. Compiled JScript code instead relies on a
715K runtime support library for supplementary functions like EvaluatePlus.
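The cost is easier to see in a small model. The sketch below uses standard C++ rather than IL. Value, to_text, evaluate_plus, and run_script are hypothetical names of my own, and the real JScript support library is certainly more involved.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-in for a dynamically typed variable: one slot that can
// hold either an integer or a string, decided at run time.
using Value = std::variant<int, std::string>;

// Converts any Value to its string form.
std::string to_text(const Value& v) {
    if (const int* i = std::get_if<int>(&v)) {
        return std::to_string(*i);
    }
    return std::get<std::string>(v);
}

// Hypothetical analogue of a runtime "plus" helper: the meaning of "+" is
// chosen by inspecting the operand types on every call.
Value evaluate_plus(const Value& lhs, const Value& rhs) {
    const int* a = std::get_if<int>(&lhs);
    const int* b = std::get_if<int>(&rhs);
    if (a && b) {
        return *a + *b;                    // numeric addition
    }
    return to_text(lhs) + to_text(rhs);    // otherwise, concatenation
}

// Mirrors the three-line script: a starts as a number and ends as a string.
Value run_script() {
    Value a = 7;
    Value b = 8;
    a = to_text(evaluate_plus(a, b));
    assert(std::holds_alternative<std::string>(a));
    return a;
}
```

Every "+" here costs a function call and two type tests before any arithmetic happens. A statically typed compiler would have emitted a single add instruction.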
All of this overhead means that dynamically typed languages are
second-class citizens on the .NET platform today. The only viable use for
dynamic typing is for scripting languages where performance isn't an issue.
The hostility of verification toward dynamic typing today prevents the easy
porting of a large class of languages.
Luckily, compilers have another option: trading verifiability for performance.
Mixed-Mode Compilation
The execution model I've outlined so far isn't unique to
.NET. The Java platform in particular provides a very similar model,
including a statically typed object-oriented type system, an intermediate
code representation (bytecode), garbage collection, and a standard class
library. The biggest difference between these two platforms is their stance
on code verifiability. The Java platform's approach to verification is like
that of a strict elementary school teacher: all code must pass the verifier
to run, no ifs, ands, or buts. JNI allows Java to call unmanaged methods,
but it doesn't allow unmanaged types or code to mingle with their managed
brethren.
The .NET platform's approach to verification is more like a hippie
college professor: the user gets to decide which programs are required to
pass verification and which are not. Compilers make a choice to either
accept the performance hit of verifiable code in exchange for enhanced
deployment flexibility (as exemplified by JScript and Eiffel) or sacrifice
deployment flexibility in favor of performance by mixing unverifiable or
unmanaged constructs into the code.
Compilers that utilize native code or unmanaged types are called
mixed-mode compilers. Mixed-mode compilation enables languages like ANSI C
and C++ to run on the .NET platform with reasonable performance. Mixed-mode
compilers can pick and choose which services they want to relegate to the
runtime and which they would rather implement themselves. This flexibility
makes a lot of things possible on .NET that would be awkward or impossible
on the JVM. A programmer might want to represent a custom hardware device as
a managed class but find that communicating with the device requires a few
lines of platform-specific assembly language. A mixed-mode compiler could
implement a method in native machine code without forcing the programmer to
go through an interop layer like P/Invoke or JNI. The compiler emits
metadata for the method, declaring it as native, and provides enough
information to allow the runtime to invoke the method directly.
Languages that don't fit neatly into the CTS's rigid type model may
prefer to use a type system of their own devising but still use IL to
manipulate instances. Listing 2 shows a short managed C++ program that
declares managed and unmanaged versions of a class; Listing 3 shows the
resulting IL (slightly abridged). Notice that the managed version of the
structure includes descriptions of its fields. The unmanaged structure
carries only a directive telling the runtime how big instances of this type
are, but that is enough information to allow instances to be held and
manipulated in the runtime.
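The stores into the unmanaged instance in Listing 3 amount to raw writes at byte offsets into a block whose only known property is its size. A rough equivalent in standard C++ follows. The names kOpaqueSize, store_i32, load_i32, and fill_point are illustrative and my own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical description of an opaque type: only its size is known,
// much like a type declared with nothing but a size directive.
constexpr std::size_t kOpaqueSize = 8;

// Stores a 32-bit integer at a byte offset inside an opaque block,
// analogous to "compute the address, then store indirectly".
void store_i32(unsigned char* block, std::size_t offset, std::int32_t value) {
    assert(offset + sizeof(value) <= kOpaqueSize);
    std::memcpy(block + offset, &value, sizeof(value));
}

// Loads a 32-bit integer from a byte offset inside an opaque block.
std::int32_t load_i32(const unsigned char* block, std::size_t offset) {
    assert(offset + sizeof(std::int32_t) <= kOpaqueSize);
    std::int32_t value = 0;
    std::memcpy(&value, block + offset, sizeof(value));
    return value;
}

// Fills an opaque 8-byte block the way the listing's main() fills upoint:
// 1 at offset 0 and 2 at offset 4.
void fill_point(unsigned char* block) {
    store_i32(block, 0, 1);
    store_i32(block, 4, 2);
}
```

Nothing in this code names a field called x or y. The offsets 0 and 4 were baked in by the compiler, which is why a verifier has nothing to check them against.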
Verification is not the only thing mixed-mode compilers sacrifice.
Mixed-mode assemblies are likely to be tied to a particular CPU and
operating system if they contain CPU-specific native code (the Mono
environment today doesn't support mixed-mode assemblies compiled for Windows
and probably never will). Mixed-mode assemblies that use unmanaged
constructs are also harder to integrate with modules written in other
languages (it's not called the Common Type System for nothing).
Memory Management
The performance of a language on any given CLI platform is going to be
influenced by how well it interacts with the garbage collector chosen by the
platform implementers. This is a tricky area, because garbage collectors can
be optimized in several ways:
To use a minimum (or fixed) amount of memory
To use the least possible CPU (by running less often)
To make GC collection intervals as short as possible (at the expense of
more total CPU)
Different types of languages place different loads on the garbage
collector. Procedural and OO languages tend to create relatively few,
largish objects that stick around for a fairly long time. Functional
languages tend to create zillions of teeny objects (e.g., LISP cons nodes)
that almost all become garbage right away. The performance of language x on
managed platform y is going to depend on how garbage collector y handles the
GC demands placed by x. The garbage collector currently used by the CLR is
optimized to keep GC collection intervals short; since it expects most code
to be OO/procedural in nature, it's safe to say that it's optimized for that
case. The performance of other languages may vary depending on how this
pattern suits them.
Conclusion
Regardless of what the Next Big Idea looks like, I think there's little
question that a decently performing implementation can be built. Mixed-mode
compilation provides a way for even the strangest of languages to run
reasonably well while using the services of the runtime. What is harder to
answer is (a) whether it will be able to run efficiently and pass the
verifier and (b) how easy it will be to use from other languages. The CLI
platform continues to evolve and over time will probably introduce
constructs for dynamically typed languages. As the field of semantic type
verification advances, verifiers should become more powerful as well. With
any luck, the
CLI will still be around not only when OO goes out of style, but also when
it comes back as a "retro-cool" idea.
Author Bio
Jason Whittington is a consultant and researcher with an irrational
fascination with virtual execution environments. When he's not researching or
consulting he can often be found delivering courses for DevelopMentor. His
Web site can be found at http://staff.develop.com/jasonw.
jasonw@develop.com
Listing 1
indexing
    description: "Project root class"
class
    HELLO
create
    make
feature {NONE} -- Initialization
    make is
        indexing
            description: "Entry point"
        do
            -- Write your code here.
        end
end -- class HELLO
Listing 2
#using <mscorlib.dll>
class upoint { public: int x; int y; };       // unmanaged type
__gc class mpoint { public: int x; int y; };  // managed type
int main()
{
    // Allocated on the garbage-collected heap
    mpoint * mp = new mpoint();
    mp->x = 1;
    mp->y = 2;
    // Allocated on the native heap
    upoint * up = new upoint();
    up->x = 1;
    up->y = 2;
}
Listing 3
.class private auto ansi mpoint
       extends [mscorlib]System.Object
{
  .field public int32 x
  .field public int32 y
  .method public specialname rtspecialname
          instance void .ctor() cil managed {..}
}
.class private sequential ansi sealed upoint
       extends [mscorlib]System.ValueType
{
  .pack 1
  .size 8
}
.method public static int32 main() cil managed
{
  .vtentry 1 : 1
  // Code size 39 (0x27)
  .maxstack 2
  .locals (class mpoint V_0,
           valuetype upoint* V_1)
  ldnull
  stloc.0
  newobj instance void mpoint::.ctor()
  stloc.0
  ldloc.0
  ldc.i4.1
  stfld int32 mpoint::x
  ldloc.0
  ldc.i4.2
  stfld int32 mpoint::y
  ldc.i4.8
  call void* new(unsigned int32)
  stloc.1
  ldloc.1
  ldc.i4.1
  stind.i4
  ldloc.1
  ldc.i4.4
  add
  ldc.i4.2
  stind.i4
  ldc.i4.0
  ret
}
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.