Ah, the need for speed. It drives many of us insane as we spend late
nights in a dark and dingy development shop tricking out our code to gain
just another microsecond of performance. Programming legends live and die by
the performance of their code. Many of my brethren have lost the pink slip
to their laptop on a Friday night in a parking garage on Microsoft's Redmond
campus. Regardless of the efforts of the elite Microsoft security forces,
the code battles will continue, and my hope is that this article will leave
you standing among the performance masters and not one of the many left
wondering what went wrong. Shall we get started? Ready to take it to the
edge and push .NET for all it's got? Then let's go.
Optimizing .NET application code for performance is, as with many
things, a mix of art and science. The art of performance tuning is largely
focused on balancing the need for speed with other factors, such as future
flexibility, time-to-market, security, and user requirements. The scientific
aspect involves gaining a strong understanding of how .NET works under the
covers. In this article I'll discuss some of the internal aspects of .NET
that affect performance, but I will also consider the artistic side of
performance tuning. My hope is that you will gain a new appreciation and
understanding for .NET internals and be able to apply your knowledge in
practical, real-world scenarios.
The first thing you need to understand and never forget is that you
won't get very far if you try to work against the JIT compiler. The compiler
represents many hours of tuning and testing to ensure that it does a great
job of optimizing your code. So rather than work against it, it is important
to understand exactly what it does and how to work with it. The primary
purpose of the JIT compiler is to turn your IL code into native code that
runs on a specific machine. The JIT compiler compiles a method only the
first time that method is called. This is one of several ways that the JIT
compiler helps performance: it doesn't waste time compiling code that is
never executed. Further, once a method is compiled, it is not recompiled
during the lifetime of your application's execution. Here is what happens:
1. Your program is loaded and a function table is initialized. The
function table will be populated with indirect function calls pointing to
the native instructions.
2. Your program's Main() method is compiled by the JIT and the function
table is updated.
3. Whenever a method is called in your program, the JIT checks whether
the code has already been compiled. If it has, it is executed immediately;
otherwise it is compiled, the function table is updated, and it is then run.
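
The three steps above can be mimicked with a toy model. The sketch below is
not how the CLR is implemented; it only assumes a dictionary standing in for
the function table and a counter standing in for the work of compilation.
Every name in it (JitTableDemo, Register, Call, CompileCount) is
hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model of lazy, per-method compilation. NOT the real CLR mechanism;
// all names are hypothetical and exist only to mirror the steps above.
public static class JitTableDemo
{
    // The "function table": method name -> code to run for that method.
    private static readonly Dictionary<string, Func<int, int>> table =
        new Dictionary<string, Func<int, int>>();

    // How many times the pretend compilation step has run.
    public static int CompileCount;

    // Step 1: the table entry starts out as a stub, not as the real code.
    public static void Register(string name, Func<int, int> body)
    {
        table[name] = arg =>
        {
            CompileCount++;      // pretend to generate native code
            table[name] = body;  // patch the table to point at that code
            return body(arg);    // then run the freshly "compiled" method
        };
    }

    // Steps 2 and 3: every call goes through the table.
    public static int Call(string name, int arg)
    {
        return table[name](arg);
    }
}
```

Registering a method and calling it twice through Call() should bump
CompileCount only once: the first call pays for the stub, and every later
call goes straight to the patched entry.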
Aside from the per-method compilation that is performed by the JIT,
there are several other optimizations that you get for free. I've listed
them for you in Table 1 so you can understand how they are applied, as well
as the effect they have on your code. You shouldn't fret much about the JIT
compiler doing things that hinder performance; in actuality it does a tremendous amount without any real effort on your part.
One argument often heard among performance hacks centers on the pros and
cons of using NGen.exe to precompile .NET code. I don't recommend
precompiling your code except in two specific instances, but more on that
later. First, let me explain the organic reason for my view. The JIT
compiler has been designed to take into account the environment in which it
runs, and some of the optimizations can only happen at runtime. For
instance, the JIT can compile for the specific processor it finds itself on
and optimize for that instruction set; NGEN (the Native Image Generator
Utility) cannot do this.
Using the JIT allows the compiler to make aggressive use of inlining
functions, optimize indirection, and optimize across assemblies, none of
which can happen with the use of NGEN. So when do I think you should use
NGEN? I would suggest you use it if your application does a lot of upfront
initialization. For instance, if you are making a lot of method calls when
your application starts, remember that the JIT does per-method compilation;
you may want to reduce the startup time of your application by using NGEN. I
would also advise using NGEN when your application uses a large number of
shared libraries. In this case per-method compilation could lead to a
performance decrease.
Threading for Performance
A huge mistake that many people make is assuming that more of a good
thing is always better. So if one thread is running well, the assumption is
often made that more threads will run better. This is not usually the case,
and regardless of the science, you should consider whether you should use
threading and only use it if it is truly required. Once you know whether you
actually need threads, the next decision is whether you should manage the
threading or allow the .NET CLR to handle it. As with the JIT compiler,
thread management under .NET is optimized rather well. For instance, thread
blocking in managed code is automatically detected and the situation can be
adjusted. But in some cases, such as those where you need to guarantee the
service level of a thread or you need to run a long-running task, you may
wish to manage your own threads.
The key to achieving maximum performance with threading is to recycle
threads. Threads are objects, and their instantiation is costly. If you
create a new thread for each request, you incur the cost of creating
and initializing the thread every time; if you reuse an existing thread to
service a request, you avoid the creation and initialization costs,
improving performance. Much of this is handled for you by the thread pool,
which is one of the key benefits of using the .NET threading services.
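
As a minimal sketch of thread recycling, the example below hands a batch of
short work items to the shared pool through ThreadPool.QueueUserWorkItem
instead of creating one Thread per item. The class and method names
(PoolDemo, RunOnPool) are hypothetical, and the "work" is only a counter
increment.

```csharp
using System;
using System.Threading;

public static class PoolDemo
{
    // Queues 'count' short work items on the shared thread pool, waits
    // until all of them have run, and returns how many completed.
    public static int RunOnPool(int count)
    {
        if (count <= 0) return 0;

        int remaining = count;
        int completed = 0;
        using (ManualResetEvent allDone = new ManualResetEvent(false))
        {
            for (int i = 0; i < count; i++)
            {
                // The pool reuses its existing threads for these items.
                ThreadPool.QueueUserWorkItem(state =>
                {
                    Interlocked.Increment(ref completed); // the "work"
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set(); // last item signals the waiter
                });
            }
            allDone.WaitOne(); // block until every item has finished
        }
        return completed;
    }
}
```

Calling PoolDemo.RunOnPool(100) runs all one hundred items without the
caller ever constructing a Thread object.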
So the take-away on threading and performance is to use the .NET
thread pool for short-lived, nonblocking operations and to manage your own
threads only where longer-running operations or service-level guarantees
call for it. This is clearly an area where understanding the science of
threading and the art of performance is critical.
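
For the long-running case, one option is a dedicated thread that you create
and control yourself, so the work does not tie up a pool thread. This is
only a sketch: WorkerDemo and RunDedicated are hypothetical names, and the
loop stands in for real work.

```csharp
using System;
using System.Threading;

public static class WorkerDemo
{
    // Runs a long-lived loop on its own thread rather than on the pool.
    // Returns the number of iterations performed.
    public static int RunDedicated(int iterations)
    {
        int done = 0;
        Thread worker = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                done++; // stand-in for real long-running work
            }
        });
        worker.IsBackground = true;          // do not keep the process alive
        worker.Name = "long-running-worker"; // easier to spot in a debugger
        worker.Start();
        worker.Join(); // wait for the worker to finish before reading 'done'
        return done;
    }
}
```

In a real application you would normally not Join() immediately; it is done
here only so the result can be returned and checked.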
Taking Out the Trash Quickly
The garbage collector (GC) in .NET makes memory and resource management
rather simple. But if you are truly interested in performance, you need to get a handle on how the GC can trip up your application's performance. First, let's review the benefits of the GC and how it works.
The GC uses a mark-and-compact approach that is generational. The .NET
GC uses generations to help optimize the collection process and free up
resources. Generation 0 contains recently created, typically short-lived
objects. Generations 1 and 2, which hold objects that have survived earlier
collections and so tend to be longer-lived, are not collected as often as
Generation 0. The GC will sweep
through Generation 0 rather often and free up resources there, but when
doing the sweep of Generation 0 it ignores Generations 1 and 2. This means
that any objects you have that are using large, expensive resources, such as
file and operating system resources or network and database connections,
could adversely impact your application's performance by holding on to
precious system resources longer than needed.
So one way to improve performance is to consider your object designs and, if you have resources being held by large, infrequently used objects, take steps to release them sooner rather than later. One way to do this is to implement a Dispose() method on all
objects that consume expensive resources that may not be collected and freed
as part of Generation 0.
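
A minimal sketch of that idea follows. LogWriter is a hypothetical wrapper
around a stream; the point is only that Dispose() releases the underlying
resource at a moment you choose, and that a using block calls it for you.

```csharp
using System;
using System.IO;

// Hypothetical wrapper around an expensive resource (here, a stream).
public sealed class LogWriter : IDisposable
{
    private readonly StreamWriter writer;

    public bool IsDisposed { get; private set; }

    public LogWriter(Stream target)
    {
        writer = new StreamWriter(target);
    }

    public void Write(string line)
    {
        if (IsDisposed) throw new ObjectDisposedException("LogWriter");
        writer.WriteLine(line);
    }

    // Releases the stream now instead of waiting for a collection.
    public void Dispose()
    {
        if (IsDisposed) return; // safe to call more than once
        writer.Dispose();       // flushes and closes the underlying stream
        IsDisposed = true;
    }
}

public static class DisposeDemo
{
    // Returns true if the writer was disposed when the using block ended.
    public static bool Run()
    {
        LogWriter log = new LogWriter(new MemoryStream());
        using (log)
        {
            log.Write("hello");
        } // Dispose() runs here, even if Write() had thrown
        return log.IsDisposed;
    }
}
```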
Dispose() vs Finalize()
Freeing resources is an important part of improving performance. The
decision as to which method to use to free a resource, Dispose() or
Finalize(), is really not all that complicated. The benefits of Dispose()
are that it is controlled by the programmer, and resources are freed upon
completion of the method. With Finalize(), the GC calls the method, but
there is no order or predictability as to when the GC will call the
Finalize() method. Last, using Finalize() is a two-step process. During the
first pass through the generation, the GC will mark the object to be freed,
and then on the next pass it will be collected and destroyed. So keep in
mind that with Finalize(), your resources will be kept alive for at least
two passes of the GC, and depending on what generation your object lives in,
collection could come later rather than sooner.
Does this mean you should always implement Dispose() over Finalize()?
Or never implement a Finalize() method at all? No, it means that you again need
to balance the art of performance tuning with the science. Deciding whether
or not to use one of these methods is really something you will need to
evaluate on a case-by-case basis. Also, keep in mind that if you do
implement Dispose(), consumers of your object must be aware that they should
be calling the method, or your efforts will be for naught.
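
One common way to get the best of both is to implement Dispose() and keep a
finalizer only as a safety net. The sketch below assumes a hypothetical
NativeBuffer class that owns unmanaged memory; GC.SuppressFinalize() tells
the GC that the finalizer no longer needs to run once Dispose() has done the
cleanup, which avoids the extra pass described above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical owner of an unmanaged memory block.
public class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    public NativeBuffer(int bytes)
    {
        handle = Marshal.AllocHGlobal(bytes);
    }

    // Deterministic cleanup, called by the consumer.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // cleanup is done; skip finalization
    }

    // Safety net: runs only if the consumer never called Dispose().
    ~NativeBuffer()
    {
        Dispose(false);
    }

    // 'disposing' is true when called from Dispose(), false from the
    // finalizer. This class holds no managed resources, so both paths
    // do the same work.
    protected virtual void Dispose(bool disposing)
    {
        if (handle == IntPtr.Zero) return; // already released
        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;
    }
}
```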
Exceptional Performance
Throwing exceptions is another area that can lead to performance
degradation. It is important that you consider the use of exceptions and how
they affect your application. Please understand that I am not advocating
that you do not use "Try...Catch" blocks in your code, but rather that you
understand when exceptions are thrown and how they impact your code.
Exceptions, by sheer nature, are expensive operations. Developers will often
use them to control the flow of their program or as a poor man's
communication mechanism, for instance, branching code based on an exception
or throwing an exception to communicate an event instead of raising an
event. If you are doing this, all I can say is don't.
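
To make the point concrete, here is a hedged sketch of the same decision
written both ways. ParseDemo and its two methods are hypothetical names; the
second version relies on int.TryParse, which arrived in a later version of
the framework than the one current when this was written.

```csharp
using System;

public static class ParseDemo
{
    // Exception as control flow: every malformed input pays for a throw
    // and a catch just to reach the fallback value.
    public static int ParseOrDefaultSlow(string text, int fallback)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException)
        {
            return fallback;
        }
    }

    // Same outcome for malformed input, with no exception on that path.
    public static int ParseOrDefaultFast(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }
}
```

Both methods return the parsed number for "42" and the fallback for "abc";
only the first one throws along the way.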
If you aren't sure if your application is "exception heavy," try using
the .NET performance counters to see how many exceptions your application is
throwing. You should be aware that sometimes you can't avoid throwing an
exception. For instance, invoking a redirection using "Response.Redirect()"
throws an exception (a ThreadAbortException), and there are other .NET
operations that do the same thing. Chances are you will not be able to work
around every instance, but
knowing is always half the battle when it comes to performance tuning. Also,
be wary of integration with unmanaged code in which things like COM objects
and System API calls can throw exceptions that can impact performance.
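
The counters in question live in the ".NET CLR Exceptions" category (for
example "# of Exceps Thrown"). If you would rather count from inside the
process, one alternative, swapped in here for the performance counters and
available only in later versions of the framework, is the
AppDomain.FirstChanceException event. ExceptionCounter is a hypothetical
helper name.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Counts every exception thrown in the current AppDomain, including
// those that are caught and handled.
public static class ExceptionCounter
{
    private static int count;

    public static int Count
    {
        get { return Volatile.Read(ref count); }
    }

    // Raised when an exception is thrown, before any catch block runs.
    private static void OnFirstChance(object sender,
                                      FirstChanceExceptionEventArgs e)
    {
        Interlocked.Increment(ref count);
    }

    public static void Start()
    {
        AppDomain.CurrentDomain.FirstChanceException += OnFirstChance;
    }

    public static void Stop()
    {
        AppDomain.CurrentDomain.FirstChanceException -= OnFirstChance;
    }
}
```

Call Start() early, exercise the application, and compare Count before and
after a scenario to see how exception heavy that scenario is.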
Good Network Neighbors
Another area in which performance can become critical is in distributed applications where you are using either .NET Remoting or another network programming approach. Here the ability to achieve better performance is often a result of how you balance
the work to be performed by the client and the server. You will need to
decide if it is better to process something at the client or to package and
send it across the wire. When communicating across the network be sure to
minimize the number of calls and be very careful about methods that end up
blocking. Each area we have already examined will be magnified when working
in a network scenario and needs to be considered very closely.
Get Chunky
When making method calls on remote objects or services, consider using
as few method calls as possible and avoid the use of properties as much as
possible. You will dramatically increase performance if you can package the
data you need to send across the wire while minimizing the number of method
calls. But beware that often if you send more data across the wire you will
also increase the processing time on the recipient of the data, which will
need to manage the incoming data.
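
The difference between a chatty and a chunky interface can be sketched
without any real network. In the example below, FakeRemoteCustomer is a
hypothetical in-process stand-in that simply counts how many calls would
have crossed the wire; the interfaces and the CustomerInfo type are likewise
illustrative.

```csharp
using System;

// Chatty: each property read would be one round trip on a remote object.
public interface IChattyCustomer
{
    string Name { get; }
    string City { get; }
    int Orders { get; }
}

// Chunky: one call returns everything in a serializable snapshot.
[Serializable]
public class CustomerInfo
{
    public string Name;
    public string City;
    public int Orders;
}

public interface IChunkyCustomer
{
    CustomerInfo GetInfo();
}

// In-process stand-in that counts simulated round trips.
public class FakeRemoteCustomer : IChattyCustomer, IChunkyCustomer
{
    public int RoundTrips;

    public string Name { get { RoundTrips++; return "Ada"; } }
    public string City { get { RoundTrips++; return "London"; } }
    public int Orders { get { RoundTrips++; return 3; } }

    public CustomerInfo GetInfo()
    {
        RoundTrips++; // one trip carries all three values
        CustomerInfo info = new CustomerInfo();
        info.Name = "Ada";
        info.City = "London";
        info.Orders = 3;
        return info;
    }
}
```

Reading the three properties costs three simulated trips; GetInfo() fetches
the same data in one, at the price of a larger single payload.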
Go Native
The use of native data types will help minimize the expense associated
with marshaling. You incur expense, and thereby degrade performance, when
you create situations where data translation is required. For instance,
moving data from ASCII to Unicode or in some cases from XML to another
format can create expensive marshaling scenarios. While planning your
application you can drastically improve performance if your development team
agrees on how you will manage data between your client and server objects.
Use the Right GC
The .NET CLR has two garbage collectors, the workstation version
(mscorwks.dll) and the server version (mscorsvr.dll). The server version is
optimized for throughput and is also more aggressive about collections, so
it minimizes memory fragmentation, takes advantage of multiple processors,
and can also support multiple heaps. The workstation version minimizes
latency and can also recognize multiple processors, but is best used in a
single-processor workstation scenario. In some cases you may be running your
client objects on a multiprocessor workstation communicating with objects on
a server. In this case you can force the workstation to use the server
version of the GC so that you can reap the benefits of the server version
even though you are running on a workstation.
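
How you select the collector depends on the framework version; in later
versions it can be requested with a gcServer element under runtime in the
application configuration file. Whichever route you take, it is worth
checking which collector you actually got. The sketch below uses the
GCSettings class from later framework versions; GcModeDemo is a
hypothetical name.

```csharp
using System;
using System.Runtime;

public static class GcModeDemo
{
    // Reports which flavor of the GC this process is running under.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "Server" : "Workstation";
        return flavor + " GC, latency mode " + GCSettings.LatencyMode;
    }
}
```

Logging GcModeDemo.Describe() at startup is a cheap way to confirm that a
configuration change took effect.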
Security
The last area we will examine is the impact security has on performance.
I left it for last because I honestly feel that security is an area where
you need to be very careful that your quest for speed doesn't leave you
exposed. So as you consider the optimizations presented here, test and
retest to be certain you have done nothing to compromise your application's
security.
Security affects performance, but with that said, .NET security has been
developed to meet the performance needs of most developers. If you still
need to get that last bit of performance out of your application, here are
some things to consider.
.NET security is optimized and has several techniques it uses to
minimize impact on performance, but there are situations where a security
check on a method will cause a walk of the stack. One thing to do is
minimize the stack walk. Yes, .NET does some of this on its own, but you
can help by using declarative security instead of imperative. When
PermitOnly, Deny, or Assert is
declared, you can avoid the stack walk. Another option is to do as many of
your security checks at link time instead of runtime, using "LinkDemand" to
do code checks, as opposed to identity checks.
Just Getting Started
The reality is that this is just the beginning. There is so much more to
review and learn, such as the effect of boxing on performance (hint: examine
your IL and try to find a better design when you see box/unbox instructions), the use of ValueTypes, and looping considerations. I plan to cover these issues and also dive much deeper into threading, the GC, security, and the CLR in future articles. I hope the information presented here will provide a strong foundation for further exploration.
About The Author
John Gomez is CEO and chief scientist at Group Espada, a company specializing in advanced .NET development, training, cyber-security, and counter-hacking. When John isn't
running Group Espada, he enjoys spending time with the hobbits and elves of the .NET netherworld.
jgomez@groupespada.com
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.