Enter the Data Definition Language
A developer perspective
Most developers have faced
situations in which they
need to read data from a
file of some given format
but can find no real library routines
for this purpose, other than the
standard file-handling libraries. In
such cases, they have to write
code to calculate the addresses of
the values they want to retrieve
from the file, plus file-handling
code to retrieve the data. While this is
probably feasible for simple file types,
the need to support a complex file
type, multiple file types, or a changing
file type can easily become a development
bottleneck.
Our new language for the .NET
platform, the Data Definition
Language (DDL), is a unique language
that aims to solve this common problem.
We hope that by the time you
have finished reading this article, you
will want to download the DDL and
try it for yourself (and drop us a note
about it).
At one time we worked with a
development team that was creating
an application to analyze data collected
in black boxes from aircraft. Data
dumps from black boxes have very
complex formats. The team needed to
read values of various aircraft parameters
such as left aileron angle, right
engine exhaust temperature, etc.,
which were scattered across various
bits with no respect for
word boundaries. To
add to the confusion, there was no
apparent standard shared between
black box formats, even for the same
manufacturer or for versions of the
same aircraft.
The team needed their product to
support multiple black box formats
without a huge development cycle
between them. A little investigation
made it apparent that there was no
ready solution for supporting data
retrieval from arbitrary file formats.
This is the problem that the DDL was
designed to solve.
How does the DDL understand the
format of the file you want to read
data from? The DDL is a language
designed to be simple and intuitive for
expressing data formats. To develop a
solution with the DDL, you
first write a DDL script that
represents the file format. Once the
script is written, you run the
DDL engine/interpreter on the script,
provide the engine with your data file,
and you are ready to start reading
information from the file.
So how can you use the DDL in
developing an application? The DDL
engine is designed to be an interpreter
that can be hosted by any .NET application
– which means that your
VB.NET or C# programs can act as
hosts for the DDL. A typical DDL solution
(see Figure 1) would consist of a
parent application that contains the
user interface and the business logic
and that hosts the DDL engine, with
which you can programmatically
interact. The DDL engine simply acts
as a substitute for your complex
file-handling code; it does not dictate
what you do with the data that you
have retrieved.
Figure 1
Once you have hosted the DDL
engine, you can give it instructions such
as "load this DDL script file" to make
it understand your file format. Then
you can tell it to "use this data file" so
that it can apply your DDL script
(which defines the data format) to the
actual data file. Finally, you tell it "give
me the value of left aileron angle" and
the DDL engine looks at the script,
works out where in the data file "left
aileron angle" is stored, and reads
out the value for your host application.
What Are DDL Script Files Like?
The DDL provides you
with primitive types that represent
bit fields; these range from single-bit
definitions to 32-bit definitions and are
named i1, i2, i3, and so on up to i32.
Variables can be declared with any of
these types:
i16 width
This represents a 16-bit value
called width. Declarations such as this
should be grouped into DDL structures
similar to the following:
struct Window
{
    i16 width
    i16 height
}
A DDL script file can contain multiple
structures like the one above.
Just as it can declare members of
primitive bit types, a structure can also
contain a member of another structure type:
struct Bitmap
{
    i8 BitsPerPixel
    Window w
}
The DDL script requires that you
mark one structure in the script file as
the "init" structure. The DDL uses the
init structure as the first structure of
your data file; the init structure is
mapped to offset zero of your data file.
The init structure and its members
(which may be instances of some of
the other structures declared in the
script file) are expected to represent
the entire data file.
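To make the bit types concrete, the
following C# sketch reads the two i16
members of the Window structure shown
earlier from a byte buffer, treating
Window as the init structure at offset
zero. It is not the DDL engine's code.
The ReadBits helper, the sample values,
and the bit order (little-endian bytes,
with bit 0 as the least significant bit
of byte 0) are assumptions made for
illustration.

```csharp
using System;

static class BitFieldSketch
{
    // Reads bitCount bits (1 to 32) starting bitOffset bits into data.
    // Assumption: bytes are little-endian and bit 0 is the least
    // significant bit of byte 0. The real DDL bit order may differ.
    public static uint ReadBits(byte[] data, int bitOffset, int bitCount)
    {
        if (bitCount < 1 || bitCount > 32)
            throw new ArgumentOutOfRangeException(nameof(bitCount));

        uint value = 0;
        for (int i = 0; i < bitCount; i++)
        {
            int pos = bitOffset + i;
            uint bit = (uint)(data[pos / 8] >> (pos % 8)) & 1u;
            value |= bit << i;
        }
        return value;
    }

    static void Main()
    {
        // Hypothetical Window data: width = 640 (0x0280), height = 480 (0x01E0).
        byte[] window = { 0x80, 0x02, 0xE0, 0x01 };

        Console.WriteLine(ReadBits(window, 0, 16));   // i16 width  -> 640
        Console.WriteLine(ReadBits(window, 16, 16));  // i16 height -> 480
    }
}
```

Because the helper works on bit offsets
rather than byte offsets, the same code
serves for an i1 flag or an i7 filler
field that does not fall on a byte
boundary.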
Figure 2
The DDL structures are different
in concept from structures in languages
such as C. The first difference
is that a DDL structure can
contain members that are present only
when a condition holds. Second, a member in a
DDL structure simply represents a
region in the data file, e.g., an i8-
type member would represent a
byte-sized region. Members in a
structure can have their offset
addresses from the base address of
the structure automatically calculated
or can have explicit addresses
provided. The following script
demonstrates both of these.
struct EmployeeData
{
    i32 empNumber
    i1 fPhoneNumberProvided
    i7 fUndefined
    when( fPhoneNumberProvided == 1 )
    {
        i8[10] phoneNumberString
    };
    @ 0,0 i24 empSerialNumber
    i8 empDesigCode
}
This simple script demonstrates
some interesting things. An instance
of the EmployeeData structure will
contain a member called
phoneNumberString of 10 bytes only if
the fPhoneNumberProvided flag bit
is set. The notation "@ 0,0"
is an explicit address specification
that causes the declaration
immediately following it to
fall at an offset of 0 bits from the
start of the structure, so
empSerialNumber overlays the start of
empNumber and empDesigCode follows
it. Thus, on the little-endian Intel
architecture, the lower 3 bytes of
empNumber are the empSerialNumber and the most significant
byte holds the employee's designation code.
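The overlay described above can be
checked with a few lines of ordinary
C#. This sketch does not use the DDL
engine; the class name, helper names,
and sample bytes are illustrative
assumptions. It assembles a
little-endian empNumber from four bytes
and splits it into the two overlaid
members.

```csharp
using System;

static class OverlaySketch
{
    // Lower 24 bits of empNumber: the i24 placed at "@ 0,0".
    public static uint SerialNumber(uint empNumber)
    {
        return empNumber & 0x00FFFFFFu;
    }

    // Most significant byte of empNumber: the i8 that follows the i24.
    public static uint DesigCode(uint empNumber)
    {
        return empNumber >> 24;
    }

    static void Main()
    {
        // Four bytes as they would sit in a little-endian data file
        // (hypothetical values: serial number 12345, designation code 7).
        byte[] raw = { 0x39, 0x30, 0x00, 0x07 };
        uint empNumber =
            (uint)(raw[0] | raw[1] << 8 | raw[2] << 16 | raw[3] << 24);

        Console.WriteLine(SerialNumber(empNumber)); // 12345
        Console.WriteLine(DesigCode(empNumber));    // 7
    }
}
```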
Figure 3
The EmployeeData script also
demonstrates a simple array of 10
elements of the i8 (or byte) type.
Unlike an array in a C structure, the
size of a DDL array can be specified
via an expression. To illustrate the
power of arrays, consider the following
snippet, which uses the
EmployeeData structure shown earlier.
struct EmployeeHeader : init
{
    i16 empCount
    EmployeeData [empCount] emps
}
The first member gives the
number of employees whose data is
provided, and "emps" represents an
array of EmployeeData instances. You
can now ask the DDL engine for the
designation code of the 5th employee,
and the DDL engine sets about determining
the location of the 5th
employee and retrieving its
"empDesigCode" value for you.
Remember that EmployeeData
has a member that occurs conditionally;
this means that the size of
each EmployeeData instance can be
different. The DDL engine internally
determines that there is a dependency,
and checks the flag in each of the
preceding EmployeeData instances
to determine the actual location of
the 5th instance.
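The kind of walk the engine has to
perform can be sketched in plain C#.
This is not the engine's actual
algorithm, only an illustration under
stated assumptions: each record is 5
bytes (empNumber plus the flag byte),
followed by 10 more bytes when the flag
is set; the flag is the least
significant bit of the fifth byte; and
the records start right after the
2-byte empCount. All names are
hypothetical.

```csharp
using System;

static class RecordWalkSketch
{
    const int HeaderSize = 2;  // i16 empCount
    const int FixedSize = 5;   // i32 empNumber + i1 flag + i7 filler
    const int PhoneSize = 10;  // i8[10] phoneNumberString

    // Returns the byte offset of emps[index] by walking every earlier
    // record and checking its flag.
    // Assumption: the flag is the least significant bit of byte 4.
    public static int OffsetOfEmployee(byte[] data, int index)
    {
        int count = data[0] | data[1] << 8;
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException(nameof(index));

        int offset = HeaderSize;
        for (int i = 0; i < index; i++)
        {
            bool hasPhone = (data[offset + 4] & 1) == 1;
            offset += FixedSize + (hasPhone ? PhoneSize : 0);
        }
        return offset;
    }

    static void Main()
    {
        // Three records: the first has a phone number, the others do not.
        byte[] data = new byte[2 + 15 + 5 + 5];
        data[0] = 3;      // empCount = 3
        data[2 + 4] = 1;  // flag byte of emps[0]

        Console.WriteLine(OffsetOfEmployee(data, 0)); // 2
        Console.WriteLine(OffsetOfEmployee(data, 1)); // 17
        Console.WriteLine(OffsetOfEmployee(data, 2)); // 22
    }
}
```

Hand-written code like this has to
change whenever the record layout
changes, which is the maintenance cost
a DDL script is meant to remove.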
The DDL script provides only a
minimal set of programming constructs,
whose purpose is centered
on being able to define data formats.
The current version of the
DDL is rich enough to support a
wide variety of common file formats.
There may, however, be some
formats that are difficult
or impossible to express in the DDL.
The DDL has constructs for
representing address specs, size
specs, conditional dependencies,
different kinds of array constructs,
etc. This is just a brief description
of the language; the complete language
description document is
available from the home page.
Hosting the DDL Interpreter
This is probably the simplest
part. Imagine that EmployeeHeader
and EmployeeData together formed
a DDL file called employee.ddl and
that we had a data file in this format
called empinfo.bin. The C# snippet
shown in Listing 1 is all you will
need to start using the DDL.
Listing 1 loads the DDL engine with the
script file and a data file and reads
values from the data file, but does not
show any of the error-handling code that
would be required in a production-quality
application. One concept
you need to be familiar with to use
the DDL is that of a path.
The init structure is represented
as "." (dot). Any member under it is
represented using its member
name. Any child of that member is
separated from it by a dot, and so on. If
there is an array in the path, then
the array index is separated from the
array name by a ":" (colon). For example,
init.emps[0] is represented as ".emps:0".
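As an illustration of the notation only,
the sketch below converts an
"init.emps[0]" style expression into the
path form. ToDdlPath is a hypothetical
helper, not part of the DDL API, and it
covers just the cases described above:
the init structure itself and array
indexes written in square brackets.

```csharp
using System;
using System.Text.RegularExpressions;

static class PathSketch
{
    // Converts "init.emps[0]" style notation to the path form ".emps:0".
    // Hypothetical helper for illustration; not part of System.DDL.
    public static string ToDdlPath(string expr)
    {
        if (!expr.StartsWith("init", StringComparison.Ordinal))
            throw new ArgumentException("expected a path rooted at init", nameof(expr));

        string rest = expr.Substring(4);   // drop the leading "init"
        if (rest.Length == 0)
            return ".";                    // the init structure itself

        // Rewrite each "[n]" array index as ":n".
        return Regex.Replace(rest, @"\[(\d+)\]", ":$1");
    }

    static void Main()
    {
        Console.WriteLine(ToDdlPath("init"));          // .
        Console.WriteLine(ToDdlPath("init.emps[0]"));  // .emps:0
        Console.WriteLine(ToDdlPath("init.emps[5]"));  // .emps:5
    }
}
```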
At any point, the GetValue() call
returns the value of any
variable under the current path. The
Seek() method sets the current
path to another location; subsequent
GetValue() calls read
values from that location. Bit values
that are read are treated as unsigned
integers, and all values are returned
as "double" types.
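Since the largest DDL type is 32 bits
wide and a double represents every
unsigned 32-bit integer exactly, a host
can convert returned values back to
integers without loss. The helper below
is a hypothetical sketch of such a
conversion and does not call the DDL
engine; the range and whole-number
checks are a defensive assumption, not
something the DDL API is known to
require.

```csharp
using System;

static class ValueSketch
{
    // Converts a value delivered as a double back to an unsigned integer,
    // rejecting anything that is not a whole number in the 32-bit range.
    public static uint ToUInt32(double value)
    {
        if (value < 0 || value > uint.MaxValue || value != Math.Floor(value))
            throw new ArgumentOutOfRangeException(nameof(value));
        return (uint)value;
    }

    static void Main()
    {
        // The largest i32 value survives the round trip through double.
        double largest = uint.MaxValue;
        Console.WriteLine(ToUInt32(largest) == uint.MaxValue); // True
    }
}
```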
A document describing the API
exposed by the DDL is available on
the home page.
The DDL engine is currently
available for download as
System.DDL.dll. This is a mixed-mode
.NET assembly developed in
Managed C++ and can be hosted in
any .NET application. The entire
source code of the DDL is also available
for download. The DDL is
offered for use free of cost and is
currently not under any licensing or
royalty restrictions. In return, we
hope to receive
feedback that can help us improve
the DDL. If the DDL is used for
commercial purposes, we hope that
the authors will drop us a note and
possibly give credit where credit is
due; this would help in popularizing
the DDL. You are, however, not
required to do any of this and are
free to use the DDL without any
acknowledgment at all.
A console program is also available
that can be used to test run
DDL scripts you are developing. It is
also available in source form as an
example of hosting the DDL. The
site also offers tutorial material
about the DDL console, as well as
documentation about the language,
API, internal algorithms, and such.
Present and Future
The DDL in its .NET avatar is
currently in Beta 1 status and is the
work of two people. We believe that
the idea of a generally useful DDL
system has substance and are hoping
to work toward it.
For future development we are
hoping to strengthen the DDL language
so that it can be used to
express data formats that are currently
difficult or impossible to
express. Plans are also under way
for a DDL compiler. The compiler
will take a .ddl file as input and generate
a .NET assembly as output,
containing code streamlined for
your DDL script rather than a
general-purpose DDL interpreter.
We are hoping to build a community
around the DDL and would
like to invite you to join the DDL
development project. Input for
future design aspects, known issues,
etc., would be appreciated.
Resource
The DDL project home page:
http://ddl.sscli.net
Author Bios
Pooja Malpani is one of the youngest Microsoft MVPs in India for .NET. She has been working on .NET for three years and gives talks/seminars on .NET and Web services at user
groups/forums and universities. Her interests include algorithm design, language theory, and programming in general. Pooja is currently working with the .NET team of Cognizant
Technology Solutions, Bangalore, as a programmer analyst.
sdolly@sscli.net
Roshan James graduated from Model Engineering College in 2002 and is one of the youngest Microsoft MVPs in India. He works for Cognizant Technology Solutions' Microsoft Tech. Group at Bangalore. A math and physics buff who turned quasi-geek with a DOS box 7 years
back, he likes exploring operating systems, languages, compilers, and runtimes.
spark@sscli.net
Listing 1
using System;
using System.DDL;

class DDLTestClass
{
    static void Main()
    {
        // Initialize the DDL engine
        ManagedDDLEngine ddl = new ManagedDDLEngine();
        ddl.LoadSourceFile("employee.ddl");
        ddl.OpenDataFile("empinfo.bin");
        ddl.InterpretData(); // this call is needed to map
                             // the script onto the data

        // GetValue() reads the value of a variable
        Console.WriteLine("No of employees = {0}",
            ddl.GetValue("empCount"));

        // Contents of emps[0]
        ddl.Seek(".emps:0"); // Seek() changes the current path
                             // to a member
        Console.WriteLine("Data of 0th Employee \n\t " +
            "empNumber={0}",
            ddl.GetValue("empNumber"));

        // Contents of emps[5]
        ddl.Seek(".emps:5");
        Console.WriteLine("Data of 5th Employee \n\t " +
            "empNumber={0} \n\t " +
            "empDesigCode={1}, \n\t " +
            "empSerialNumber={2}",
            ddl.GetValue("empNumber"),
            ddl.GetValue("empDesigCode"),
            ddl.GetValue("empSerialNumber"));

        ddl.Dispose(); // clean up
    }
}
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.
E-mail:
info@sys-con.com