
The Science of Threading
Part 1: An overview of a little-understood area

If I had to bet on what is the least-understood - yet most often viewed as a cure-all - area of software engineering, I would have to place my money on threads. The topic of threading, in my opinion, causes a tremendous amount of confusion and is typically implemented in situations where there is no need for the use of threads. Having said that, if you can gain a healthy understanding of threads, specifically of how they are implemented in .NET, you can reduce their complexity and apply them in a manner that is sensible, practical, and most of all, efficient.

Threading Rules
Let's establish some ground rules regarding the use of threads in any application. First, you should realize that by utilizing threads you drastically increase your debug and QA cycle. Threads are difficult to debug, and testing them to see if they break your design requires a tremendous amount of trial and error. You should make sure that your test cases go far beyond your standard testing. In fact you should design test cases that specifically challenge the implementation of your threads.

Second, don't think that implementing threads will automatically increase performance or scalability. Threads can cause a variety of side effects that can actually degrade your application's performance and scalability. For instance, it's possible to institute a thread design that causes requests to block or creates a situation in which resources are consumed but not released.

Finally, make sure you understand that threads can behave differently on different hardware configurations, such as single- versus multiprocessor systems. Suffice it to say that it is important that you clearly think through your use of threads. I would highly recommend that you exhaust other technical options before you implement threads in your system.

Threading Basics
Although threading can be a difficult topic to grasp, .NET does make its implementation rather straightforward. This is especially true if you compare the threading model in .NET to previous incarnations such as Windows threads or COM+ threading. In .NET a thread is an object, and as such, the way in which you interact with, create, and deal with threads is very similar to how you would interact with any other object in .NET. This is true at least for managed threads - basically those threads native to .NET - as opposed to unmanaged threads. Unmanaged threads are threads that are created outside of the .NET world - in other words they don't come under the control of the CLR. In this article we will focus on managed threads specifically.

A good and general definition of a thread is "a lightweight process" that runs inside of what is known as the AppDomain. Now before it gets too confusing, an AppDomain also can be defined as a "lightweight process," the difference being that an AppDomain establishes a boundary and provides some degree of isolation from other lightweight processes for such things as security, errors, and faults. A thread lives within the AppDomain but can also execute across AppDomain boundaries. Just as the AppDomain has several properties that you can set or interact with, so does a thread.

A thread has several properties that you can set once it is created. Some, such as the thread name, are optional, but others, like a thread's identity or hash code, are set automatically and cannot be changed. Other properties, such as a thread's priority, have a default value but can also be changed by your program or the CLR. Similar to other objects, a thread has methods that allow you to interact with it; for instance, you can start a thread or put a thread to sleep. This combination of properties and methods allows you to have a high degree of control over threads. However, this control comes at a high cost, which is that you must be certain that you have a solid design and that you implement threads correctly.
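As a minimal sketch of the properties and methods just described, the following program creates a thread, names it, lowers its priority, starts it, and waits for it to finish. The class name ThreadBasicsDemo, the DoWork method, and the thread name "Worker-1" are assumptions made for illustration only.

```csharp
using System;
using System.Threading;

public static class ThreadBasicsDemo
{
    // Hypothetical unit of work: sleeps briefly, then reports which thread ran it.
    private static void DoWork()
    {
        Thread.Sleep(100); // put the current thread to sleep for 100 ms
        Console.WriteLine("Running on: " + Thread.CurrentThread.Name);
    }

    // Creates, configures, and runs a thread; returns it so the caller can inspect it.
    public static Thread RunWorker()
    {
        Thread worker = new Thread(new ThreadStart(DoWork));
        worker.Name = "Worker-1";                     // optional; can be set only once
        worker.Priority = ThreadPriority.BelowNormal; // the default is ThreadPriority.Normal
        worker.Start();
        worker.Join(); // block the caller until DoWork returns
        return worker;
    }

    public static void Main()
    {
        Thread worker = RunWorker();
        Console.WriteLine(worker.Name + " still alive: " + worker.IsAlive);
    }
}
```

The call to Join is there only so the demo can inspect the thread after it has finished; a real design would normally let the caller keep working while the thread runs.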

One way .NET makes your life easier is by providing you with an alternative means by which to interact with and utilize threads. This alternative, known as the "thread pool," allows you to quickly access the services of a thread, but you lose some of the fine-grained control of implementing and managing your own threads from scratch.

Pooled Threads
Under the covers, pooled threads are no different from nonpooled threads. The overall idea behind the thread pool is that a number of threads are created and managed for you and made available to your application, so you don't incur the overhead of creating them for each task. This can help with performance and reduce the complexity of managing the threads yourself. But since the pool has only so many threads, if you queue work when no thread is available, that work waits in the pool's queue until a thread frees up to service it. Nor can you "pin" a thread in the pool. This means that if you must perform a series of operations on a single thread, there is no way to guarantee that subsequent operations will be executed on the same thread.

Remember that the reason any pool is efficient is because it can quickly utilize a block of resources to address a request and then move on to the next request. To do this the pool must be able to "round-robin" the threads in the pool. This also means that you cannot do things like set the priority of a thread, name a thread, or perform other thread-specific operations on a thread that belongs to the thread pool.

With all these perceived limitations, why would anyone use a thread pool? The services of a thread pool are more than adequate for most applications. The idea that you can obtain a thread from a pool in much the same way that you obtain a database connection from a pool is rather powerful. But use of thread pooling doesn't mean that you are free from the design considerations or testing challenges posed by any multithreaded program.

Now that you have a basic idea of why the thread pool exists in .NET, it might be a good time to start digging into some code. But before we do we need to take a very brief detour and gain a basic understanding of ".NET delegates." To be fair, delegates are an entire topic in and of themselves, and if you are going to truly understand the nuances of threading, you really need a thorough understanding of delegates. But for the purposes of this article we'll keep things at a high level.

A delegate is basically a prototype of a function; in some circles a delegate is known as a contract. The delegate establishes a contract or prototype that details the semantics of a function, for instance:

public delegate void SayHello(string message);

The "delegate" keyword defines the base signature or contract for the delegate. You then implement a candidate for the delegate, such as:

private static void SayHelloCandidate(string message)
{
    Console.WriteLine(message);
}

You will notice that the candidate has the same function signature as the delegate. Now all you have to do is create an instance of your delegate that points to the candidate:

SayHello sayHi = new SayHello(SayHelloCandidate);

Then execute the delegate:

sayHi("Hello My Friend");

When you call the sayHi delegate it will take the string and pass it to the SayHelloCandidate() method, which then writes "Hello My Friend" to the console. As I said, the use of delegates and the design principles associated with them is a rather involved topic. My hope here is to give you some basics so we can discuss threading and make sure we are all on the same page. Now that you have a basic understanding of delegates, you should be ready to move on to the next step in understanding the use of the thread pool in .NET.
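Put together, the delegate pieces above can be sketched as one small program. The wrapper class DelegateDemo, the CreateGreeter helper, and the LastMessage field are assumptions added here so the effect of the call can be observed; they are not part of the delegate mechanism itself.

```csharp
using System;

public static class DelegateDemo
{
    // The contract: any method that takes a string and returns void.
    public delegate void SayHello(string message);

    // Records the last message received (added for this sketch only).
    public static string LastMessage;

    // A candidate whose signature matches the SayHello contract.
    private static void SayHelloCandidate(string message)
    {
        LastMessage = message;
        Console.WriteLine(message);
    }

    // Builds a delegate instance bound to SayHelloCandidate.
    public static SayHello CreateGreeter()
    {
        return new SayHello(SayHelloCandidate);
    }

    public static void Main()
    {
        SayHello sayHi = CreateGreeter();
        sayHi("Hello My Friend"); // invokes SayHelloCandidate through the delegate
    }
}
```

The caller only ever sees the SayHello type, so any other method with the same signature could be substituted without changing the calling code.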

To use a thread in the thread pool you use the ThreadPool class, which can support "worker" and "completion port" threads. Worker threads are simply threads that don't have anything to do with I/O. Completion port threads are a special class of threads that are used for I/O operations. The ThreadPool contains a set of Thread objects that are managed by the CLR. In order to pass some work to a thread in the thread pool, you establish a delegate and then pass the delegate to the ThreadPool. This is something worth repeating: you don't pass the delegate to a thread, but rather to the pool of threads. The CLR will determine which Thread object will run your request.

Let's assume we have a delegate that takes a comma-delimited string. Further assume we have a method that also takes a comma-delimited string and then walks the string to see if each port in the string is either open or closed. The code to construct the delegate and make the call to check the ports might look something like:

PortRequest netCheck = new PortRequest(PortCheck);
netCheck("80,110,443"); // Are ports 80, 110, and 443 open?

What if we want to assign this work to a thread from the thread pool so that we could do something else while the ports are being scanned? Well, it's actually pretty straightforward. To access the threading services, you need to include the "System.Threading" namespace in your application. Once you do that you can then invoke ThreadPool by calling the QueueUserWorkItem method:

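// PortCheckWork is a hypothetical wrapper that matches WaitCallback: void PortCheckWork(object state)
ThreadPool.QueueUserWorkItem(new WaitCallback(PortCheckWork), "80,110,443");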
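WaitCallback expects a method that takes a single object argument, so one way to hand the port list to the pool is to pass it as the state argument and unpack it inside a small wrapper. The sketch below assumes a PortRequest delegate like the one described earlier; the PortCheck body is a stand-in that only walks the list (it does not open any sockets), and the ManualResetEvent and PortsSeen counter are additions so the demo can tell when the work item has run.

```csharp
using System;
using System.Threading;

public static class PortQueueDemo
{
    public delegate void PortRequest(string ports);

    // Signaled when the queued work item has finished (an assumption of this sketch).
    private static readonly ManualResetEvent done = new ManualResetEvent(false);

    // How many ports the work item walked (added for this sketch only).
    public static int PortsSeen;

    // Hypothetical stand-in for PortCheck: walks the list without probing anything.
    private static void PortCheck(string ports)
    {
        string[] list = ports.Split(',');
        foreach (string port in list)
        {
            Console.WriteLine("Checking port " + port);
        }
        PortsSeen = list.Length;
    }

    // Matches WaitCallback: unpacks the state object and runs the delegate.
    private static void PortCheckWork(object state)
    {
        PortRequest netCheck = new PortRequest(PortCheck);
        netCheck((string)state);
        done.Set();
    }

    // Queues the check on the pool, then waits for it; returns false on timeout.
    public static bool QueueAndWait(string ports, int timeoutMs)
    {
        done.Reset();
        ThreadPool.QueueUserWorkItem(new WaitCallback(PortCheckWork), ports);
        // The calling thread is free to do other work here.
        return done.WaitOne(timeoutMs);
    }

    public static void Main()
    {
        bool finished = QueueAndWait("80,110,443", 5000);
        Console.WriteLine("Finished: " + finished);
    }
}
```

Waiting on an event is only one possible design; it is used here mainly so the program does not exit before the pooled thread has done its work.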

QueueUserWorkItem takes a WaitCallback delegate, whose target method must accept a single object parameter, and optionally a state object that is handed to that method when it runs. Under the covers the CLR will automatically assign the delegate to an available thread and execute the work. But how do you know when your work is completed? Actually, you don't; you have to either poll to see if the work was completed or else develop a design that doesn't require knowing when the unit of work is completed. The basic assumption is that tasks handled by the ThreadPool class are asynchronous. Typically, the thread pool is capped at about 25 worker threads per processor (I say "typically" because this number could change with future releases of .NET). To find out how many threads can be placed in the thread pool you can use the following code:

int totalWorkerThreads;
int totalPortThreads;

ThreadPool.GetMaxThreads(out totalWorkerThreads,out totalPortThreads);

This code will return the maximum number of worker and completion port threads that the thread pool can hold. This is not the same as how many threads are available to do work at any given time. To find out how many threads are available to do work you can use:

int availableWorkerThreads, availablePortThreads;
ThreadPool.GetAvailableThreads(out availableWorkerThreads, out availablePortThreads);

For clarity, the number of "available" threads will always be less than or equal to the "maximum" number of threads in the pool. One clever trick - that I don't recommend - is to query for the number of available threads and, if it is below a certain threshold, to add logic to your system that falls back to threads you create yourself outside the pool. Why don't I recommend this approach? One of the key tenets of a successful threading architecture is to keep it simple and safe. This approach is very difficult to debug, and just begs for problems when you need to figure out whether an anomaly is due to the thread pool or your own threads. So you can't say I didn't warn you.
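The two queries can be combined to estimate how many pool threads are busy at a given moment. This is a rough sketch; the class name PoolStatsDemo and its method are made up for illustration, and the numbers it prints will vary by runtime version and machine.

```csharp
using System;
using System.Threading;

public static class PoolStatsDemo
{
    // Prints a snapshot of the pool and reports whether available <= maximum holds.
    public static bool AvailableWithinMax()
    {
        int maxWorker, maxPort;
        int freeWorker, freePort;

        ThreadPool.GetMaxThreads(out maxWorker, out maxPort);
        ThreadPool.GetAvailableThreads(out freeWorker, out freePort);

        // Threads in use = maximum minus available (a snapshot, not a guarantee).
        Console.WriteLine("Worker threads: {0} max, {1} available, {2} in use",
            maxWorker, freeWorker, maxWorker - freeWorker);
        Console.WriteLine("Completion port threads: {0} max, {1} available, {2} in use",
            maxPort, freePort, maxPort - freePort);

        return freeWorker <= maxWorker && freePort <= maxPort;
    }

    public static void Main()
    {
        Console.WriteLine("Available within max: " + AvailableWithinMax());
    }
}
```

Because other work can be queued between the two calls, the "in use" figure is best treated as an approximation.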

When is it helpful to know the maximum number of threads and the number of available threads? These are wonderful methods for debugging and optimization tasks. If some part of your application is having intermittent performance or scalability issues and you are using the thread pool, it could mean that your application is suffering from the dreaded "thread starvation" syndrome. Thread starvation occurs when there aren't enough threads to service your application in a timely manner and requests are queued up awaiting service.

If your application is suffering from thread starvation you have several options. First, you should figure out if the units of work you are asking to be completed by the thread pool are taking too long and causing threads to not return quickly. The thread pool should be used to perform "short-running" operations. The threads in the thread pool are background threads - meaning they run at the default priority and will not keep your process alive once all foreground threads have finished - and are designed to service "short-lived" units of work. If you feel your units of work are such that they are efficient and not "long-lived," then make sure they aren't blocking. Second, you can instrument your application so that in a debug mode it can write out the number of maximum and available threads before your delegate is passed to the thread pool for attention. You may be able to determine if a pattern exists that correlates to your performance problems.
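One way to approach the instrumentation suggested above is to route every request through a small helper that logs the pool state before queuing the work. The sketch below is only one possible shape: the InstrumentedQueue class, its Queue and RunDemo methods, and the label parameter are hypothetical, and the logging relies on the Conditional attribute so that it runs only in builds where the DEBUG symbol is defined.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class InstrumentedQueue
{
    // Calls to this method are compiled in only when the DEBUG symbol is defined.
    [Conditional("DEBUG")]
    private static void LogPoolState(string label)
    {
        int maxWorker, maxPort, freeWorker, freePort;
        ThreadPool.GetMaxThreads(out maxWorker, out maxPort);
        ThreadPool.GetAvailableThreads(out freeWorker, out freePort);
        Console.WriteLine("{0}: worker {1}/{2} available, completion port {3}/{4} available",
            label, freeWorker, maxWorker, freePort, maxPort);
    }

    // Logs the pool state (debug builds only), then hands the work to the pool.
    public static bool Queue(string label, WaitCallback work, object state)
    {
        LogPoolState(label);
        return ThreadPool.QueueUserWorkItem(work, state);
    }

    // Trivial work item for the demo: signals the event passed in as state.
    private static void Signal(object state)
    {
        ((ManualResetEvent)state).Set();
    }

    // Queues one work item and waits for it; returns false if it times out.
    public static bool RunDemo(int timeoutMs)
    {
        ManualResetEvent done = new ManualResetEvent(false);
        Queue("demo request", new WaitCallback(Signal), done);
        return done.WaitOne(timeoutMs);
    }

    public static void Main()
    {
        Console.WriteLine("Work item completed: " + RunDemo(5000));
    }
}
```

Comparing the logged figures against the times at which the slowdowns occur may show whether the pool is running short of available threads when requests arrive.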

Until Next Time
At this point you should have a general sense of threading, an understanding of the difference between worker and completion port threads, a basic understanding of delegates, and at least the key concepts required to interact with the thread pool. You should also know what the thread pool is and how it can help you implement multithreaded programs without the need for you to manage your own threads. I have also tried to give you some guidance on what to look for if you suspect that your thread design is causing your performance or scalability problems. Remember that simply implementing threads does not mean your application will automatically scale or perform much better.

Next month we will continue to explore the science of threading. Specifically, we will concentrate on the utilization of threads we create without the help of the thread pool, examine what happens under the covers in relation to threading, and deal with such neat things as thread synchronization and thread local storage. We will also spend a lot more time working through code and examining thread safety.

Author Bio
John Gomez, open source editor for .NET Developer's Journal, has over 25 years of software development and architectural experience, and is considered a leader in the design of highly distributed transaction systems. His interests include chaos- and fuzzy-based systems, self-healing and self-reliant systems, and offensive security technologies, as well as artificial intelligence. John started developing software at age 9 and is currently the CTO of Eclipsys Corporation, a worldwide leader in hospital and physician information systems. john.gomez@eclipsys.com

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
