There are places in this universe where mortals fear to tread: dark,
mysterious places replete with shadowy cliffs, hidden treasures, and rumors
of spiritual powers. These places are best left to wizards, hobbits, and
elves, and to those few who have an unbridled passion for adventure,
challenge, and conquest.
Right now you are either intrigued by where this is going or completely
flabbergasted as to what possible relationship this introduction could have
to Microsoft's .NET. Well, if you observe some developers and how they
approach learning their craft, you would think that making sense of the
internal aspects of this technology is best left to programming gurus and
those few crusaders who dare to tread in the .NET netherworld.
The reality is that understanding the internals of .NET isn't all that
difficult or mysterious, and having good insight into things like garbage
collection, memory management, threading, and Microsoft's Intermediate
Language (IL) will help you become a much better programmer and system
architect.
In this article we are going to delve into the netherworld of .NET
internals by taking on the Microsoft Intermediate Language, officially known
as MSIL, but more commonly known as IL. IL is a language, nothing more and
nothing less, and it really isn't all that mysterious or complex. So here
goes: let's dive in and start dismantling the beast.
The Purpose of IL
One of the challenges when Microsoft was designing .NET was how to allow
developers to write in any language they choose, yet target a single runtime
environment. The answer to this dilemma? All source code compilers must
translate your program's source code into a single intermediate language.
The purpose of that "intermediary" language is to act as a sort of assembly
language that is independent of any CPU architecture. Once your code is
translated by the source code compiler to IL, it is rather simple to see
how this standard representation of program semantics can be translated into
native code and targeted to a specific CPU. The translation from IL to
native code is the job of the JIT (just-in-time) compiler (see Figure 1).
The Basics
Now that you understand where IL fits into the overall .NET world, you
next need to understand some of the basic language concepts. IL is a fully
functioning language; it has data types, flow control, object model
instructions, and everything else needed to make a program. In order to
help you get a good handle on IL, we are going to start by writing,
compiling, and running a small program, and then we will dissect it to give
you a better understanding of how it operates.
Being a traditional sort of guy, I'll start with that favorite and
time-honored workhorse of programmers everywhere: "Hello World." Try
entering the code from Listing 1 into any text editor. Visual Notepad works
rather nicely. (Visual Notepad is an inside joke from my days at Microsoft.
It always seemed easier to write code samples in Notepad when doing
presentations.)
Once you have entered the code, save your file as HelloWorld.il and open
a command window. The first step is to take the IL and turn it into an
executable, more correctly known as a PE file. To do this, make sure your
path includes the directory where your copy of ILASM.EXE, the IL Assembler,
resides, then type in "ILASM HelloWorld.il" and hit Enter. The assembler
should generate the results shown in Figure 2.
This is good. The IL Assembler completed without errors and you should
now have a spanking new HelloWorld.exe file in your directory. Execute the
file and you should see those timeless words "Hello World" sparkling across
your screen. So there you have it, more than likely your first program
written in 100% pure IL. Not all that mysterious, right?
Breaking It Down
So now let's start dissecting what we did line by line and see if we can
make some sense of it all. The first line of the program declares the
assembly we are building. Note that the word "assembly" is preceded by
a "dot." In IL, a dot precedes what is known as a "directive." Basically, a
directive is an instruction to the assembler to carry out some unit of work.
In this case the assembler will create an assembly manifest called "Hello".
This is a required line, and without it the program will assemble but will
generate an I/O exception when you try to run it.
The next line, ".method static void HelloWorld()", is a directive for
the assembler to create a method that is named HelloWorld. Since we declared
it as "void", nothing will be returned. The beginning and end of the method
body are marked by curly braces, the same as in a C# program. You may be
wondering why we included the "static" keyword in the method directive for
HelloWorld(). If you look at the next line in the program, you will see that
it has an ".entrypoint" directive. This tells the assembler that this method
is the program's initial point of entry, much like the "Main" method in
other languages. An IL program must have exactly one entry point, and the
entry point method must be declared as static.
For now let's skip over the line that contains the "ldstr" instruction
and concentrate on the line that makes a call to the .NET Framework. You
might notice that this line is not preceded with a dot. In IL, lines inside
a method body that are not preceded by a dot are considered instructions.
Instructions are basically the guts of your program, whereas directives are
the infrastructure and plumbing. The "call" instruction invokes a method,
including methods that live in other assemblies. In this case we call
System.Console::WriteLine in the mscorlib library, which writes our string
to the console.
What about that line with the "ldstr" instruction? First, let's go back
and reiterate one of the concepts from the beginning of this article: IL is
an intermediary between your source code and the JIT; it must not make any
assumptions about the underlying operating environment. For instance, it
cannot make assumptions about the CPU architecture or instruction set
employed by the machines where your code executes. In order to meet this
requirement, Microsoft elected to use a simple, yet elegant, approach: they
made IL stack-based. Instead of naming CPU registers, every instruction
takes its operands from an evaluation stack and leaves its result there.
The process of copying something from memory onto the stack is called
loading, and the process of popping a value off the stack and writing it to
a memory location, such as a local variable, is known as storing.
The basic IL instruction set is divided between instructions that load
and those that store. Instructions that load something onto the stack begin
with the letters "ld" and those that store begin with the letters "st". So
"ldstr" is an instruction to load a literal string onto the stack, in our
case the string "Hello World." from Listing 1.
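To watch loading and storing in action, here is a minimal sketch rather than anything from the listings. The assembly name StackDemo, the method name Main, and the local variable total are made up for illustration, and the sketch borrows the ".locals" directive and the "ldc" instruction, both of which are covered later in this article. The comments track what sits on the stack after each instruction.

```il
// Hypothetical example: names are illustrative, not from the article's listings.
.assembly extern mscorlib {}    // explicit reference to the framework library
.assembly StackDemo {}
.method static void Main()
{
    .entrypoint
    .locals init (int32 total)  // one local variable in slot 0
    ldc.i4.2                    // load:  stack is now 2
    ldc.i4.3                    // load:  stack is now 2, 3
    add                         // pops both, pushes the sum: stack is 5
    stloc.0                     // store: stack is empty, total holds 5
    ldloc.0                     // load:  stack is 5 again
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
}
```

Assembled with ILASM and run, this should print 5.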
A Little Magic
In the next example we are going to learn how to interact with the stack
and with values returned by other methods. The goal is to take the strings
"hello " and "world" and concatenate them using native IL. First we push
each string literal onto the stack with "ldstr" and then move it into a
local variable using the "stloc" instruction. Since "stloc" begins with
"st", we know that it pops the value off the stack and writes it to a
variable. The code block for carrying out this little feat of magic is
rather straightforward:
ldstr "hello "
stloc.0
ldstr "world"
stloc.1
All this code really says is to load a string literal onto the stack and
then write it to the appropriate location. In our case we are storing the
string "hello " in the first slot (slot 0) and the string "world" in the
second slot (slot 1). Once we have things stored in their locations, we can
turn our attention to concatenating the strings in order to create our
trusty "hello world" salutation. To accomplish this we need to load things
back onto the stack from memory, and this is done with the "ldloc"
instruction, which also takes a slot number.
The code to load these two strings is again fairly straightforward:
ldloc.0
ldloc.1
Now that we have our strings we make a call out to the framework and
concatenate the two strings:
call string [mscorlib]System.String::Concat(string,string)
Notice that the Concat() function returns a string, so we need to store
that result, load it, and then write it out to the console. You can see a
pattern emerging: we basically store, load, and manipulate through the use
of IL instructions. Listing 2 shows the full program that takes the two
strings and concatenates them.
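Storing each value in a slot and loading it back is one way to do this, and it is a good way to practice "stloc" and "ldloc". Because Concat() leaves its result on the stack, though, the result can also be handed straight to WriteLine(). The following is a sketch of that shorter variant, not a replacement for Listing 2; the assembly name ConcatDirect and the method name Main are assumptions made for the example.

```il
// Hypothetical variant of the concatenation example that keeps
// everything on the stack instead of using local variable slots.
.assembly extern mscorlib {}
.assembly ConcatDirect {}
.method static void Main()
{
    .entrypoint
    ldstr "hello "      // stack: "hello "
    ldstr "world"       // stack: "hello ", "world"
    call string [mscorlib]System.String::Concat(string, string)
                        // stack: "hello world"
    call void [mscorlib]System.Console::WriteLine(string)
                        // stack: empty
    ret
}
```

Both versions print the same greeting; the version with slots simply makes each intermediate value visible.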
We have been concentrating on the use of strings, and our program has
been rather "top-down" in nature, without any logic. As a sort of
graduation exercise we are going to work through a program that introduces
some math and branching logic. The program will basically take two inputs,
"Total Sales for Today" and "Total Returns for Today". We will then
subtract the returns from the sales to produce our net sales. Depending on
whether or not net sales is greater than 10, we will display the appropriate
message on the screen.
This program uses more than four local variables, so we are going to
declare them by name. Local variables are declared with the ".locals"
directive, followed by the "init" keyword and a parenthesized list that
gives the slot number, type, and name of each variable.
.locals init ([0] int32 iSales, [1] int32 iReturns,
[2] int32 iNet, [3] string a, [4] string b)
The reason we cannot use the previous syntax, stloc.[slot number], for
every variable is that "stloc" can address only the first four slots (0
through 3) using dot notation. So the instruction "stloc.3" is valid, but
"stloc.4" is not. Once we go above four variables, we need to use the short
form and give the variable's name (or its slot number) as an operand:
stloc.s "variable name". If we had declared a variable named "greeting" we
could address it as "stloc.s greeting". This also holds true for the "ldloc"
instruction, whereby we would load the "greeting" variable onto the stack
using the "ldloc.s greeting" syntax.
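As a quick illustration of the "greeting" case just described, here is a small sketch. The assembly name LocalsDemo, the method name Main, and the four filler variables are assumptions whose only job is to push "greeting" into slot 4, where dot notation no longer reaches.

```il
// Hypothetical example: four filler locals occupy slots 0-3,
// so "greeting" lands in slot 4 and needs the .s form.
.assembly extern mscorlib {}
.assembly LocalsDemo {}
.method static void Main()
{
    .entrypoint
    .locals init ([0] int32 a,
                  [1] int32 b,
                  [2] int32 c,
                  [3] int32 d,
                  [4] string greeting)
    ldstr "hello from slot 4"
    stloc.s greeting    // store by name; "stloc.s 4" addresses the same slot
    ldloc.s greeting    // load it back onto the stack
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```

Addressing the variable by name keeps the code readable if the order of the declarations ever changes.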
The Final Exam
Now let's look at the complete program. I think you will find it easy
to follow and will probably surprise yourself with how well you are starting
to understand IL. Listing 3 is your final exam.
How did you do? The programming concepts should be pretty
straightforward by now. We load some strings to facilitate the calls to our
WriteLine() methods; we read in and store the input from the ReadLine()
method calls; and then we use the .NET Framework to parse the strings into
integers and store the results of the conversion in slots 0 and 1,
respectively. We then load the values from slots 0 and 1 onto the stack,
subtract them using the "sub" instruction, and store the result of the
calculation in slot 2. Then we load the result back onto the stack using
ldloc.2, and we come across something new, the instruction "ldc.i4.s 10".
This is just an instruction to load a 4-byte integer constant with a value
of 10 onto the stack; the ".s" suffix selects the short form, which encodes
the constant in a single byte. The "ldc" instruction can load a 4-byte
integer (i4), an 8-byte integer (i8), a 4-byte float (r4), or an 8-byte
float (r8). Aside from "ldc" and "ldstr", IL supports the loading of
arguments, local variables, fields, and array elements.
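To make the four "ldc" flavors concrete, the sketch below loads one constant of each type and prints it. The assembly name ConstDemo, the method name Main, and the constant values are arbitrary choices for illustration; the WriteLine overloads used are the ones taking int32, int64, float32, and float64.

```il
// Hypothetical example: one constant per ldc flavor.
.assembly extern mscorlib {}
.assembly ConstDemo {}
.method static void Main()
{
    .entrypoint
    ldc.i4 100000           // 4-byte integer
    call void [mscorlib]System.Console::WriteLine(int32)
    ldc.i8 5000000000       // 8-byte integer (too large for i4)
    call void [mscorlib]System.Console::WriteLine(int64)
    ldc.r4 1.5              // 4-byte float
    call void [mscorlib]System.Console::WriteLine(float32)
    ldc.r8 2.25             // 8-byte float
    call void [mscorlib]System.Console::WriteLine(float64)
    ret
}
```

Each call consumes the constant that was just loaded, so the stack is empty again before the next "ldc".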
Once we have the result of our calculation and the value we want to
compare against, in this case 10, we can use the IL branching instructions
to decide whether we had a good day or not. The "ble" instruction ("branch
on less than or equal to") pops the two values and, based on the result of
the comparison, either falls through to the next instruction or jumps to a
labeled instruction, much like a "goto" statement. In Listing 3, net sales
of 10 or less send execution to the label IL_0045; otherwise the next
instruction runs. Branching instructions begin with a "b" and are
complemented by calling instructions.
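The branch in Listing 3 can be boiled down to the sketch below. It hard-codes the value 7 in place of net sales so that the outcome is predictable, and it uses descriptive label names to show that a label is just a name, not a line number. The assembly name BranchDemo, the method name Main, and the labels NotGreater and Done are all assumptions made for the example.

```il
// Hypothetical example: compare a fixed value against 10.
.assembly extern mscorlib {}
.assembly BranchDemo {}
.method static void Main()
{
    .entrypoint
    ldc.i4.7                // first value:  7
    ldc.i4.s 10             // second value: 10
    ble.s NotGreater        // 7 <= 10, so execution jumps to NotGreater
    ldstr "greater than 10"
    call void [mscorlib]System.Console::WriteLine(string)
    br.s Done               // unconditional branch past the other message
NotGreater:
    ldstr "10 or less"
    call void [mscorlib]System.Console::WriteLine(string)
Done:
    ret
}
```

With the values shown, the program should print "10 or less"; loading a first value larger than 10 would take the fall-through path instead.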
Between the branching and calling instruction sets, IL manages the flow
of a program's execution. IL also supports the ability to manage exceptions
and has instructions specific to the .NET object model. If you worked
through the examples in this article, you are now familiar with the IL
Assembler. Another program that you should spend some time with is the IL
Disassembler, which takes a .NET executable and generates IL for you to
inspect. The Disassembler is in the .NET Framework SDK/Bin folder and is
named "ILDASM.exe". The tool is rather intuitive and should take you little
time to understand after having read this article.
Conclusion
What do you think? Was it all that mysterious or scary? The reality is
that getting to know IL, as well as other areas of .NET internals, will help
you write better code, develop more robust and efficient architectures, and,
most important, give you the skills and knowledge that can make all the
difference in the world when things go bump in the night.
About The Author
John Gomez is CEO and chief scientist at Group Espada, a company specializing in advanced .NET development, training, cyber-security, and counter-hacking. When John isn't
running Group Espada, he enjoys spending time with the hobbits and elves of
the .NET netherworld.
jgomez@groupespada.com
Listing 1
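.assembly Hello {}
.method static void HelloWorld()
{
    .entrypoint                 // execution starts in this method
    ldstr "Hello World."        // push the string literal onto the stack
    // Pop the string and print it; "string" is IL's keyword for System.String.
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}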
Listing 2
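.assembly Hello {}
.method static void HelloWorld()
{
    .entrypoint
    // Slots 0 through 2 must be declared before stloc/ldloc can use them.
    .locals init (string, string, string)
    ldstr "hello "
    stloc.0                     // slot 0 = "hello "
    ldstr "world"
    stloc.1                     // slot 1 = "world"
    ldloc.0
    ldloc.1
    call string [mscorlib]System.String::Concat(string,string)
    stloc.2                     // slot 2 = concatenated result
    ldloc.2
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}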
Listing 3
.assembly Sales {}
.method static void Sales()
{
    .entrypoint
    .locals init ([0] int32 iSales,
                  [1] int32 iReturns,
                  [2] int32 iNet,
                  [3] string a,
                  [4] string b)
    // Prompt for and read the two inputs as strings.
    ldstr "Total Sales For Today\?"
    call void [mscorlib]System.Console::WriteLine(string)
    call string [mscorlib]System.Console::ReadLine()
    stloc.3
    ldstr "\nTotal Returns For Today\?"
    call void [mscorlib]System.Console::WriteLine(string)
    call string [mscorlib]System.Console::ReadLine()
    stloc.s b
    // Parse both strings into integers (slots 0 and 1).
    ldloc.3
    call int32 [mscorlib]System.Int32::Parse(string)
    stloc.0
    ldloc.s b
    call int32 [mscorlib]System.Int32::Parse(string)
    stloc.1
    // iNet = iSales - iReturns
    ldloc.0
    ldloc.1
    sub
    stloc.2
    // Jump to IL_0045 when iNet <= 10; otherwise fall through.
    ldloc.2
    ldc.i4.s 10
    ble.s IL_0045
    IL_0039: ldstr "You had a great day!"
    IL_003e: call void [mscorlib]System.Console::WriteLine(string)
    IL_0043: br.s IL_004f
    IL_0045: ldstr "Better luck tomorrow."
    IL_004a: call void [mscorlib]System.Console::WriteLine(string)
    IL_004f: ret
}
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.
E-mail:
info@sys-con.com