Are you aware that you might be shipping your source code with your .NET
dll or exe? A new tool included in Microsoft's Visual Studio .NET 2003 can
help you make sure that does not happen.
The .NET platform realizes Microsoft's vision for the next paradigm in
Windows computing: multiple programming languages interacting harmoniously,
sharing an enriched object-based framework, and executed by a Common
Language Runtime (CLR). This architecture provides an unprecedented degree
of power and flexibility. Unfortunately, that flexible design inherently
produces a problem for those wishing to hide their program's intellectual
property. Programs in the .NET Framework are easy to reverse engineer. This
is not in any way a fault in the design of .NET; it is simply a reality of
modern, intermediate-compiled languages (Java suffers from this problem
too). Both Java and .NET use expressive file syntax for delivery of
executable code: bytecode in the case of Java, MSIL (Microsoft Intermediate
Language) for .NET. Being much higher-level than binary machine code, the
intermediate files are laden with identifiers and algorithms that are
immediately observable and ultimately understandable. After all, it is
obviously difficult to make something easy to understand, flexible, and
extensible while simultaneously hiding its crucial details.
So anyone with a copy of ILDASM or, better yet, one of the commercial
.NET decompilers can look at your assemblies and reverse engineer your
source code. Suddenly, your software licensing code, copy protection
mechanisms, and proprietary business logic are available for all to see,
whether it's legal or not. Anyone can peruse the details of your software
for whatever reason: to search for security flaws to exploit, steal unique
ideas, or crack programs. This should be enough to give you pause.
All of that said, this need not be a showstopper.
Organizations concerned about putting their intellectual property on the
.NET Platform need to understand that there is a solution to help thwart
reverse engineering. Obfuscation is a technique that provides for seamless
renaming of symbols in assemblies, as well as other tricks to foil
decompilers. Properly applied, obfuscation can increase the protection
against decompilation by many orders of magnitude, while leaving the
application intact. Obfuscation is commonly used in Java environments and
for years has helped companies feel safe about protecting their intellectual
property when they release their Java-based products.
As you'd expect, Microsoft isn't passively watching as this issue
develops. As of Visual Studio .NET 2003, they're including a "lite" version
of PreEmptive Solutions' Dotfuscator, accessible from the Tools menu. Microsoft
is known for treating developers like important customers (which they are),
and they're not missing the boat on this either. They are providing a
solution right out of the box. This article delves into the world of .NET
obfuscation. Along the way, you will develop an understanding of how
obfuscation is successfully applied.
Background
Obfuscation is the technology of shrouding the facts. It's not
encryption, but in the context of .NET (or Java) code, it might be better.
Early in Java's life, several companies produced encrypting class loaders to
fully encrypt Java classes. Decryption was done just in time, prior to
execution. Although it made classes completely unreadable, this methodology
suffered from a classic encryption flaw: it needed to keep the decryption
key with the encrypted data.
Therefore, an automated utility could be created to decrypt the code and put
it out to disk. Once that happens, the fully unencrypted, unobfuscated code
is in plain view.
As another illustration, you could compare encryption to locking a
six-item meal into a lockbox. Only the intended diner (i.e., the Common
Language Runtime) has the key, and we don't want anyone else to know what he
or she is going to eat. Unfortunately, if someone can pick the lock (or find
the key hidden on the bottom of the box), the food is in plain view.
Obfuscation works more like putting the six-item meal into a blender and
sending it to the diner in a baggie. Sure, everyone can see the food in transit, but
besides a lucky pea or some beef-colored goop, they don't know what the
original meal is. The diner still gets the intended delivery and the meal
still provides the same nutritional value it did before (luckily, CLRs
aren't picky about taste). The trick of an obfuscator is to confuse
observers, while still giving CLRs the same delivery.
Without argument, obfuscation (or even encryption) is not 100%
protection. Even compiled C++ can be disassembled. If a hacker is persistent
enough, he or she can find the meaning of your code. Also, humans write and
employ decompilers to automate decompilation algorithms that are too
challenging for the mind to follow. It is safe to say that any obfuscator
that confuses a decompiler will pose even more of a deterrent to a
less-capable human attempting the same undertaking. The goal of obfuscation
is to form a barrier that knocks out as many would-be reverse engineers as
possible by creating confusion.
As confusion builds, the ability of the human mind to comprehend
multifaceted intellectual concepts deteriorates. Note that this precept says
nothing about altering the forward (executable) logic, only about
representing it incomprehensibly. When an obfuscator goes to work on readable program
instructions, a possible side effect is that the output will not only
confuse a human interpreter, it will stop a decompiler. While the forward
logic has been preserved, the reverse semantics have been rendered
nondeterministic. As a result, any attempt to reverse engineer the
instructions into a "programming dialect" like C# or VB will likely fail
because the translation is ambiguous. Deep obfuscation creates a myriad of
decompilation possibilities, some of which might produce incorrect logic if
recompiled. The decompiler, as a computing machine, has no way of knowing
which of the possibilities could be recompiled with valid semantics.
Issues
The obvious concern getting the most buzz in .NET developer circles is
the threat of intellectual property theft. We hear this discussed at
conferences and see it as a forum topic in online newsgroups. The developer
community is concerned for good reason. They intend to produce commercial
Windows software with .NET and this is a very competitive industry. The
barriers to entry are low. Anyone with skill, hardware, and some basic tools
can begin to create programs that have the potential to enter the
competitive arena. For reasons just explained, .NET introduces the
possibility that competitors can inspect your code. Even if they don't copy
it outright, they can certainly glean algorithms and constructs useful to
their own endeavors, leaving you holding the bag.
A less obvious effect of MSIL readability is the exhibition of
confidential constructs such as your software licensing, copy protection, or
encryption code. The problem here is more subtle, but equally perilous. By
exposing your security logic to the public, you are giving them a roadmap to
cracking your algorithms.
The third issue is that of code bloat. .NET is fully object oriented.
The world has come to a place that accepts this as the programming paradigm
of choice; no argument there. One of the benefits of OOP is the ability to
use class libraries to quickly bypass the development of tedious "plumbing"
code. Instead, developers inherit from a coordinated set of classes that
have been tested and offer a rich palette of functionality. In fact, this
set might be richer than we need for a given application. Where does all
that extra functionality go when you compile? It goes right into your
application code. As post-compilation tools, obfuscators are in the perfect
position to help us with this bloat. High-end obfuscators are available that
remove unused code as a by-product of their multipass analysis. This expands
the role of the obfuscator to include that of code size reducer.
The Basic Solution
Today, some commercial obfuscators employ a renaming technique that
applies trivial identifiers. Typically, these can be as short as a single
character. As the obfuscator processes the code, it selects the next
available trivial identifier for substitution. This seemingly simple
renaming scheme has a huge advantage over hashing or character-set offset:
it cannot be reversed. While the program logic is preserved, the names
become nonsense. At this point, it has hampered human understanding to a
large degree. Faced with identifiers like a, t.bb(), ct, and 2s(e4), it is a
stretch to infer that they stand for concepts like invoiceID,
address.print(), userName, and deposit(amount). Nevertheless, the program
logic can be reverse engineered.
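The renaming scheme described above can be sketched in a few lines. This is only an illustration written in Python, not the code of any commercial obfuscator; the function names trivial_names and rename are made up for this example. It hands out the next unused short name to each identifier it encounters, so the new name carries no trace of the old one.

```python
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... (shortest names first)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def rename(identifiers):
    """Map each distinct identifier to the next unused trivial name."""
    fresh = trivial_names()
    mapping = {}
    for name in identifiers:
        if name not in mapping:
            mapping[name] = next(fresh)
    return mapping


# Hypothetical identifiers, borrowed from the examples in the text.
mapping = rename(["invoiceID", "userName", "deposit", "invoiceID"])
print(mapping)  # {'invoiceID': 'a', 'userName': 'b', 'deposit': 'c'}
```

Because each new name depends only on the order of assignment, nothing in "a" or "b" can be computed back into invoiceID or userName. A hash or character offset, by contrast, is derived from the original name.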
A deeper form of obfuscation uses Overload Induction, a patented
algorithm devised by PreEmptive Solutions, Inc. (this scheme is included in
the Visual Studio version). Trivial renaming is used; however, a crafty
twist is added. Method identifiers are maximally overloaded after an
exhaustive scope analysis. Instead of substituting one new name for each old
name, Overload Induction will rename as many methods as possible to the same
name. After this deep obfuscation, the logic, while not destroyed, is beyond
comprehension. See for yourself: the simple example in Listings 1 and 2 (at
the end of this article) gives you some idea of the power of the Overload
Induction technique.
One of the things you probably noticed about the example is that the
obfuscated code is more compact. A positive side effect of renaming is size
reduction. For example, if you have a name that is 20 characters long,
renaming it to a() saves a lot of space (specifically 19 characters). This
also saves space by conserving string heap entries. Renaming everything to
"a" means that "a" is stored only once, and each method or field renamed to
"a" can point to it. Overload Induction enhances this effect because the
shortest identifiers are continually reused. Typically, an Overload Induced
project will have up to 35% of the methods renamed to a().
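To see why so many methods can end up sharing one name, consider the following simplified sketch. It is not PreEmptive's patented algorithm, which performs an exhaustive scope analysis; it only illustrates the underlying idea that methods with different parameter lists can legally share a name. It ignores inheritance, overrides, and interfaces. The class name Payroll and the method VoidCheck are assumptions added for the example.

```python
from collections import defaultdict
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... (shortest names first)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def overload_rename(methods):
    """Give each (class, parameter list) pair its own name sequence.

    Methods only need distinct names when they live in the same class
    AND take the same parameters, so most of them can be called "a".
    """
    per_scope = defaultdict(trivial_names)
    renamed = {}
    for owner, name, params in methods:
        renamed[(owner, name, params)] = next(per_scope[(owner, params)])
    return renamed


methods = [
    ("Payroll", "CalcPayroll", ("SpecialList",)),
    ("Payroll", "DistributeCheck", ("Employee",)),
    ("Payroll", "VoidCheck", ("Employee",)),  # hypothetical extra method
    ("SpecialList", "HasMore", ()),
    ("SpecialList", "GetNext", ("bool",)),
    ("Employee", "UpdateSalary", ()),
]
renamed = overload_rename(methods)
for (owner, name, params), new_name in renamed.items():
    print(f"{owner}.{name}({', '.join(params)}) -> {new_name}")
```

In this sketch, every method except VoidCheck becomes a; VoidCheck becomes b only because it collides with DistributeCheck on both class and parameter list.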
Obfuscators remove debug information and nonessential metadata from an
MSIL file as they process it. Aside from enhancing protection and security,
this also contributes to the size reduction of MSIL files.
It is important to understand that obfuscation is a process that is
applied to compiled MSIL code, not source code. Your development environment
and tools will not change to accommodate renaming. Source code is never
altered, or even read, in any way. Obfuscated MSIL code is functionally
equivalent to traditional MSIL code and will execute on the CLR with
identical results. (The reverse, however, is not true. Even if it were
possible to decompile strongly obfuscated MSIL, it would have significant
semantic disparities when compared to the original source code.) Figure 1
shows the flow of the classic obfuscation process.
Solution Enhancements
One of the more advanced obfuscation techniques available today is
Control-Flow obfuscation. This process synthesizes branching, conditional,
and iterative constructs that produce valid forward logic, but yield
nondeterministic semantic results when decompilation is attempted. All of
the admonishments you ever heard about maintaining spaghetti code are
working in your favor when you try to protect your intellectual property
using Control-Flow obfuscation. Consider trying to understand the code in
Listings 3 and 4, which show the same method before and after Control-Flow
obfuscation. After Control-Flow obfuscation, the reverse-engineered code is
very ugly at worst and incorrect (not recompilable) at best.
Another technique, string encryption, applies a simple encryption
algorithm to any strings in your application that you desire. As mentioned
before, any encryption (or specifically decryption) done at runtime is
inherently insecure. That is, a smart hacker can eventually break it, but
for strings present in customer code, it is worthwhile. Let's face it; if
hackers want to get into your code, they don't blindly start searching
renamed types. They probably do a search for "Invalid License Key", which
points right to the code where license handling is performed. Searching on
strings is incredibly easy. String encryption raises the bar for the casual
hacker and deters that many more nonserious hackers. The algorithm typically
incurs a tiny performance penalty at runtime, so make sure the option is
fully configurable.
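The idea behind string encryption can be sketched as follows. This is a deliberately simple stand-in (XOR with a repeating key, then Base64), not the algorithm any particular obfuscator uses. The key value and the function names protect and reveal are assumptions made for the example.

```python
import base64

# Hypothetical key. Note that it has to ship inside the program,
# which is exactly why this only raises the bar rather than
# providing real secrecy.
KEY = b"\x5a\x13\xc7"


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def protect(text: str) -> str:
    """Build-time step: replace a string literal with an opaque token."""
    scrambled = xor_bytes(text.encode("utf-8"), KEY)
    return base64.b64encode(scrambled).decode("ascii")


def reveal(token: str) -> str:
    """Run-time step: recover the literal just before it is used."""
    return xor_bytes(base64.b64decode(token), KEY).decode("utf-8")


token = protect("Invalid License Key")
print(token)          # what a text search of the shipped file would see
print(reveal(token))  # Invalid License Key
```

The call to reveal is the small runtime cost mentioned above, which is why you would want to apply the option only to the strings that matter.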
An advanced feature called incremental obfuscation is of particular
interest to enterprise development teams maintaining an integrated
application environment. By generating name-mapping records during an
obfuscation run, obfuscated API names can be reapplied and preserved in
successive runs. A partial build can be done with the full expectation that
its access points will be renamed the same as a prior build. As a result,
the distributed patch files integrate into the previously deployed system
without a hitch.
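A minimal sketch of the name-mapping idea, assuming the map from a previous run is available as a simple dictionary (a real tool would persist it in a map file whose format is not described here). The function name incremental_rename and the API names are hypothetical.

```python
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... (shortest names first)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def incremental_rename(identifiers, previous_map=None):
    """Reuse names from an earlier run; only new identifiers get new names."""
    mapping = dict(previous_map or {})
    used = set(mapping.values())
    fresh = (name for name in trivial_names() if name not in used)
    for identifier in identifiers:
        if identifier not in mapping:
            mapping[identifier] = next(fresh)
    return mapping


# First release: three public entry points (hypothetical names).
release_1 = incremental_rename(["Connect", "Send", "Close"])

# Patch build: Send and Connect are rebuilt, Flush is new.
release_2 = incremental_rename(["Send", "Flush", "Connect"], release_1)

print(release_1)  # {'Connect': 'a', 'Send': 'b', 'Close': 'c'}
print(release_2)  # {'Connect': 'a', 'Send': 'b', 'Close': 'c', 'Flush': 'd'}
```

Because Send is still b in the second run, code deployed with the first release can keep calling it after the patch is installed.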
Last, obfuscators can accomplish size reduction by analyzing your
application and removing code your program is not using. It seems odd that
unused-code removal can actually do anything; who writes code they don't
use? Well, the answer is all of us. What's more, we all use libraries and
types written by other people that were written to be reusable. Reusable
code implies there is contingent code that handles many cases; however, in
any given application you typically use only one or two of those many cases.
An advanced obfuscator can figure that out and rip out all the unused code
(from compiled MSIL, not the source). The result is that the output contains
precisely the types and methods your application needs, nothing more.
Amazing space reduction can be achieved, conserving computing resources and
reducing instantiation times. This can be especially important for .NET
Compact Framework or remotely deployed applications.
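One plausible way to find unused code is a reachability walk over the call graph, sketched below. The call graph is hand-written and the type and method names are invented for the example; a real tool would derive the graph from the compiled MSIL. It would also need to be told about methods that are reached only through reflection, which this sketch does not model.

```python
def reachable(call_graph, entry_points):
    """Return every method that can be reached from the entry points."""
    seen = set()
    stack = list(entry_points)
    while stack:
        method = stack.pop()
        if method in seen:
            continue
        seen.add(method)
        stack.extend(call_graph.get(method, ()))
    return seen


# Hypothetical application: it links a parser library that supports
# both CSV and XML but only ever calls the CSV path.
call_graph = {
    "App.Main": ["Parser.ParseCsv", "Report.Print"],
    "Parser.ParseCsv": ["Parser.ReadLine"],
    "Parser.ParseXml": ["Parser.ReadLine", "Xml.Validate"],
    "Report.Print": [],
    "Parser.ReadLine": [],
    "Xml.Validate": [],
}

kept = reachable(call_graph, ["App.Main"])
removed = sorted(set(call_graph) - kept)
print(removed)  # ['Parser.ParseXml', 'Xml.Validate']
```

Parser.ReadLine survives because the CSV path still needs it, while the XML path and its helper are dropped.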
Conclusion
Microsoft's .NET Framework provides one of the best software development
platforms available today. Expect all Windows developers (and even some
non-Windows developers) to eventually make the switch to .NET. Given this
reality, the next step is to address any concerns you might have about
protecting your code from reverse engineering. As shown above, this need not
be a showstopper; obfuscation addresses the problem. To get started
using an obfuscator, consider downloading a free copy of Dotfuscator
Community Edition at www.preemptive.com/dotfuscator or use it right from the
Tools menu of Microsoft's Visual Studio .NET 2003 (see Figure 2). Should you
want more powerful obfuscation and size reduction, you can upgrade to
PreEmptive's Dotfuscator Professional Edition. You may never know what an
obfuscator is worth unless you do not use one!
Author Bio
Gabriel Torok is a founding principal at PreEmptive Solutions, Inc. He is a
book author and active national conference speaker. He is directly involved
in most aspects of the business, focusing primarily on product development,
and sales and marketing. In addition to company management, Gabriel remains
active in teaching Java, .NET, and related technologies.
gtorok@preemptive.com
Listing 1
private void CalcPayroll(SpecialList employeeGroup) {
    while (employeeGroup.HasMore()) {
        employee = employeeGroup.GetNext(true);
        employee.UpdateSalary();
        DistributeCheck(employee);
    }
}
Listing 2
private void a(a b) {
    while (b.a()) {
        a = b.a(true);
        a.a();
        a(a);
    }
}
Listing 3
public int CompareTo( object o ) {
    Frequency f = o as Frequency;
    if ( f == null )
        return -1;
    if ( m_Comparer == null )
        return m_Letter.CompareTo(f.Letter);
    return m_Comparer.Compare(this,o);
}
Listing 4
public virtual int a(object A_0) {
    g local0;
    int local1;
    local0 = A_0 as g;
    if (local0 != null)
        goto i1;
    goto i2;
    while (true) {
        local1 = this.a.CompareTo(local0.c());
        goto i3;
        i1: if (g.c != null)
            goto i4;
    }
    i2: local1 = -1;
    goto i3;
    i4: local1 = g.c.Compare(this, A_0);
    i3: return local1;
}
Copyright © 2004 SYS-CON Media, Inc. All Rights Reserved.
E-mail: info@sys-con.com