HomeDigital EditionSearch Dotnet Cd
ASP.NET C# Certification Exams The CLI Data Access Editorials Extending .NET Fundamentals Interoperability Interviews Migrate Mobile .NET Mono .NET Interface Object-Oriented Programming Open Source Optimization Product/Book Reviews Security Source Code UML Visual Studio .NET

Microsoft's development tools are justly famous for their ease of use, and many developers have had the pleasure of impressing their boss with an exciting application that they knocked together in an afternoon. Many developers have also faced the disappointment of finding that the techniques they've read about and used for the single-user prototype don't easily translate to a multitiered, multiuser application.

ADO.NET is Microsoft's data access architecture that attempts to bridge the gap between the rapid application development techniques used on the afternoon project and the more structured approach required for a scalable production system. Does ADO.NET succeed in its goal? This article will show that it is does - but not straight out of the box. There are a number of problems to be overcome before ADO.NET can be used effectively as part of the middle tier of scalable, multiuser applications.

The problems are reconciling auto-numbered fields in a disconnected DataSet with identity fields on a database and ensuring a DataSet's changes are applied to the database in the correct sequence in order to maintain referential integrity.

ADO.NET
Before I describe the problems in more detail, I'll do a quick recap of what ADO.NET is and how it is structured.

The intention behind ADO.NET was to cleanly factor data access from data manipulation. This led Microsoft to the concept of a disconnected data set that can be sourced from either relational data or XML, or that can manage data local to the application. This means that the data set can be used in any tier of the application irrespective of the original data source.

The core of ADO.NET is the DataSet - an object model of relational data. The DataSet can be thought of as an abstraction of a relational database management system's (RDBMS) data model, but it contains a lot more than object representations of tables, rows, columns, and constraints. For one thing, it maintains detailed state information about updates made to the data set and provides mechanisms for committing or rolling back changes.

One of the reasons that RDBMSs have remained the mainstay of commercial software development is the sophisticated mechanisms they provide for maintaining data integrity. The ADO.NET designers have wisely built the DataSet on rapidly maturing XML technology and have leveraged the power of XSD to provide the constraint and type model underlying its implementation. Not only has this provided a well thought-out model for described relationships of strongly typed data - it has allowed the DataSet to have a relational and XML equivalence.

The maintaining of state and the XML/XSD equivalence give the DataSet extraordinary power and flexibility for a number of application scenarios. For example, the XML serialization of changes to a DataSet (known as a DiffGram) can provide the interface between layers of a distributed application communicating via text-based protocols (Web services being the obvious candidate). Changes made in either layer can be communicated across the interface and the changes can be merged into a local copy of the DataSet.

The DataSet is a model of data and is independent of a data source. The ADO.NET architecture uses the concept of data adapters to provide the interface between the tables of a DataSet and a data source. The data adapter likely to be used most often is the SqlDataAdapter. This, as its name suggests, is the interface between an RDBMS table and a DataSet table.

SQL data adapters contain up to four SQL command objects: a select command, update and insert commands, and a delete command. These commands hold details of the SQL required to populate a DataSet table or to update the RDBMS with changes to the table. Figure 1 illustrates the design of the DataSet and its associated DataAdapters.

Figure 1

The Middle Tier
So, how can the wonderful DataSet best be used in a layered architecture? To answer this question we need to examine the point of layered architectures.

Layered architectures have become increasingly popular with the growth of thin-client presentation channels and the increasing demands for multichannel-capable applications. The layered architecture is simply good software engineering practice in trying to assign major functional responsibilities to different layers, increase the cohesion within layers, and reduce the coupling between layers.

For the purposes of this discussion the typical three-layered application (shown in Figure 2) consisting of a presentation layer, a business logic layer, and data layer will be sufficient to illustrate the place of the ADO.NET DataSet in the architecture.

Figure 2

The data layer will most likely use an RDBMS to provide a normalized view of the application data. The presentation layer needs to display and update this data, but because of the layered approach is discouraged from accessing the data layer directly. Typically, the middle tier has a set of application objects that contain the business logic for the application and provide the presentation layer with an object interface to the data. In terms of good software engineering practice, this is a fine approach. In terms of delivering systems within tight deadlines, the approach has its problems.

Although it is always best to build an application on top of a stable data model, we all know this is not always possible. Like all the other components of a software system, the data model inevitably evolves as the application is developed, and as this happens, the layered model's middle tier becomes a bottleneck for change. Any changes made to the data schema need to be reflected in the object interface offered by the middle tier before the presentation layer components can react to the change.

The ADO.NET DataSet, or more precisely the typed DataSet, provides a way to square this circle: rapid application development (RAD), along with a structured, layered, and strongly typed system.

ADO.NET has another remarkable tool in its armory to assist the developer under pressure: the typed DataSet. The typed DataSet has two aspects: an XSD Schema defining the types and relationships between types that make up the DataSet, and an automatically generated class that provides a strongly typed interface to the tables, rows, and columns of the DataSet. The VS.NET development environment provides a tool for creating a typed DataSet. The tool allows users to drag tables from their database onto a canvas and create foreign key constraints between the tables, as well as to specify uniqueness and primary constraints of the data. The tool creates an XSD schema describing the XML representation of the table data that the user can manually edit, if desired. In short, the tool provides a user-friendly way of creating an XSD model of relational data. In addition, the tool analyzes the XSD schema and produces a typed DataSet class that is derived from the ADO.NET DataSet. Figure 3 shows how a typed DataSet for managing the Northwind product catalog can be created using the VS.NET DataSet Designer.

Figure 3

So ADO.NET provides a rapid application development technique for building a strongly typed object representation of relational data. This automatically generated set of business objects is an ideal candidate for the interface between the middle tier and the presentation layer. The responsibilities of the middle tier can be reduced to filling the appropriate typed DataSet and for persisting changes to typed DataSets back to the data layer. Figure 4 illustrates how the typed DataSet is populated by the middle tier and passed to the presentation layer.

Figure 4

The presentation layer can render the data to the user making use of data binding, if appropriate. The updated DataSet (or DiffGram of changes) would then be passed back to the middle tier business objects, which can update the database within a single transaction. The business objects have the opportunity to filter or amend data passed to the presentation layer and to validate the updated data according to their business rules.

An important point to grasp is that the typed DataSet is a view of the database. The tables and constraints that make it up do not have to correspond exactly to those of the database. The relational database's data model will have been normalized for efficiency and data integrity. This won't necessarily make it the best view of the application data for the presentation layer. The power of relational databases is twofold: their relational data model and the powerful SQL data access language. The ADO.NET DataSet, on the other hand, is not an RDBMS; it is a model of relational data, and as such it only offers simple mechanisms for iterating and accessing the data contained within it. When designing a typed DataSet, the developer should maintain the normalized view when appropriate and should produce joined views of multiple tables when this is more appropriate.

The ADO.NET DataSet offers excellent opportunity to use RAD techniques for building data models that provide strongly typed views of relational data to the different layers of a system. Moreover, the DataSet provides the facility for efficiently communicating complex changes between distributed layers, and its disconnected nature provides the ability for a system to support long-lived transactions. You can construct such a system without the need for a complex distributed-transaction infrastructure or compensatory transactional schemes. Complex changes made to a DataSet are persisted or cached in its serialized form before all of its changes are applied, in a single transaction, to the database.

Devil in the Details
So what's the problem? Like most things, the devil is in the details.

The first problem likely to be encountered is with SQL identity fields (or auto-number fields, as they are known in Microsoft Access databases). Good database design practice recommends having unique identifiers that are easily indexed. For example, a supplier's product may be identifiable by its product name, but a Product table in a relational database would usually have a primary key column implemented as an integer field. In multiuser applications the RDBMS would be assigned the responsibility of creating unique values for the primary identifier whenever a new product record is created. In other words, the primary identifier would be declared as an identity field.

If we are going to successfully use disconnected, typed DataSets as the data model for our layered architecture, we need to be able to work with identity fields.

ADO.NET DataSets provide a mechanism for creating unique identifier values. Columns in a typed DataSet can be declared as "auto-numbered" and each new record inserted into the DataSet will be automatically assigned a unique number for its auto-number fields. Clearly, in a multiuser system the auto-number fields in the disconnected DataSet are not going to be in step with the identity fields on the database.

The second problem that will be encountered with ADO.NET is the difficulty of maintaining database referential integrity while updating a database with the changes made to a DataSet. A DataSet can have updates, insertions, and deletions made to multiple tables, and, clearly, these changes need to be applied to the database in the correct order for foreign key constraints to be valid.

So much for the problems; what about the solution? The idea is this: for each typed DataSet used by an application, create a class responsible for filling the dataset and updating changes back to the database. Call this class a DataSet adapter to be reminiscent of the ADO.NET SqlDataAdapter responsible for filling a table within a DataSet and updating changes in the table back to the database. The typed DataSet adapter will manage a set of SqlDataAdapters, one for each table in the typed DataSet.

How does this help? Well apart from creating a structured data access layer for the middle tier to use, this approach will enable us to create a base class that implements routines for overcoming both the identity field problem and the referential integrity problem.

Figure 5 shows a UML representation of the design. A typed DataSet, MyTypedDataSet on the diagram, is created using VS.NET RAD tools. Next, a typed DataSet adapter is written (MyTypedData-SetAdapter) that sets up the SqlDataAdapters required to fill the DataSet and update its changes to the database. The typed DataSet adapter derives from the base helper class shown as DataSetAdapter on the diagram.With this structure in place, there only remains the small problem of writing the DataSetAdapter class that will implement the routines for solving the identity and referential integrity problems.

Figure 5

A Case of Mistaken Identity
The first problem to address is how to reconcile auto-number fields in a disconnected DataSet and the identity values for those fields on the database.

Consider an example from the Northwind example database. Imagine one facet of your application is allowing a user to configure a product catalog. In the Northwind database, Suppliers supply Products, and each product is assigned a Category. The primary key fields for each of the Products, Suppliers, and Categories tables are identity integer fields. Figure 6 shows the tables of the product catalog.

Figure 6

To build the catalog management application using the design outlined above we would create a typed DataSet for the product catalog (see Figure 3) and would create a typed DataSet adapter to fill and update the DataSet.

Now consider what interface the middle tier might offer to the presentation layer. Perhaps it would allow the adding, editing, and deleting of Products, Categories, and Suppliers. After just a few user interactions, the DataSet might contain new product rows with auto-numbered identifiers and foreign key relationships to new category rows and supplier rows, both with auto-numbered identifiers. How can all of these updates been made to the database?

The answer lies in the typed DataSet. The typed DataSet has a model of the foreign key constraints between the tables. If, after an insert to the database, the DataSet can be notified of the new identity value, then an algorithm can navigate the table relationships, propagating the changes to the auto-numbered value to references in other tables. This sounds more difficult than it actually is. Luckily, the ADO.NET DataSet provides facilities to make this a simple operation. The ADO.NET DataSet maintains the status of each cell of data (i.e., modified or not) and keeps the value of each field that is changed until the changes are accepted. Moreover, the DataSet allows event delegates to be attached that are called when each row is updated. Listing 1 shows how a routine can be written to make use of these facilities using the OnRowUpdated event handler of a SqlDataAdapter.

So the Update method of the proposed DataSetAdapter class can set up the event handlers to fix up the identity values, but there is still the problem of maintaining the referential integrity of the database during this process.

Referential Integrity
In order to understand the referential integrity problem, consider the Northwind database again. In the previous section, we imagined an application for configuring a product catalog. The application allowed a user to add, edit, and delete suppliers, product categories, and products. A SqlDataAdapter provides the interface between each of these tables in the DataSet and the corresponding table on the database. A SqlDataAdapter has an overloaded method called Update that will simply run through a supplied set of data rows invoking the insert, update, or delete SQL command appropriate for the state of the row.

If, in our typed DataSetAdapter design, we naively iterate over the tables in the DataSet invoking the Update method of the appropriate SqlDataAdapter, then all the updates will be executed a row at a time for each table in turn. Clearly, this will lead to referential integrity errors. For example, if a supplier and all its products have been deleted from the DataSet, then we had better make sure the Products SqlDataAdapter runs before the Suppliers SqlDataAdapter. On the other hand, if a new supplier has been added with a new set of products, then the Suppliers adapter needs to run before the Products adapter.

At first this seems to be problematic, but again the ADO.NET DataSet provides the building blocks of a solution.

As with the identity field problem, it is a simple matter to write a routine that uses the foreign key relationships between tables to work out the sequence of updates, deletes, and inserts that each adapter should apply to the database. This routine would be too long to reproduce in this article, but an example of how to write it is in the downloadable code, which can be found below.

Once the update sequence routine has been written, all the pieces of the typed DataSet data access architecture are in place. The DataSetAdapter base class (see Figure 5) implements the update sequence routine in its Update method. Moreover, this routine attaches event handlers to the appropriate SqlDataAdapter in order to propagate identity field changes (see Listing 1). This ensures that whenever an update or insert is made to the database, any references to auto-number fields in the updated row reflect the identity fields in the database.

Conclusion
ADO.NET provides a very well thought-out set of tools for managing typed relational data. It provides the building blocks for well-structured, scalable applications, along with rapid application development facilities.

However, without a solution to the two problems outlined in this article - the identity field problem and the referential integrity problem - the true potential of ADO.NET will not be realized. This article has shown that ADO.NET can be used as the basis of a layered data access architecture and has outlined how this can be implemented to solve both problems besetting ADO.NET.

So does ADO.NET meet its goals? Well, not straight out the box, but with a little help it works admirably.

About The Author
David Regan is a freelance systems programmer specializing in data- and rule-driven multitier systems. david.regan@criticalsystems.co.uk

	



Listing 1

private void PropagateRowChanges(object sender, SqlRowUpdatedEventArgs e)
{
  DataRow row = e.Row;
  if (row.RowState == DataRowState.Modified)
  {
    DataTable parentTable = row.Table;
    foreach (DataRelation relation in parentTable.ChildRelations)
    {
      DataTable childTable = dataSet.Tables[relation.ChildTable.TableName];
      for (int i=0; i< relation.ParentColumns.Length; i++)
      {
        DataColumn parentColumn = relation.ParentColumns[i];

        if (row[parentColumn].ToString() !=
		row[parentColumn,DataRowVersion.Original]
		.ToString())
        {
          DataColumn childColumn = relation.ChildColumns[i];
          string filter;
          if (childColumn.DataType == typeof(string) ||
              childColumn.DataType == typeof(DateTime))
          {
            filter = String.Format("{0}='{1}'",
                 childColumn.ColumnName,
		row[parentColumn,DataRowVersion.Original]);
          }
          else
          {
            filter = String.Format("{0}={1}",
                 childColumn.ColumnName,
		row[parentColumn,DataRowVersion.Original]);
          }
          DataRow[] propagateRows = childTable.Select(filter);
          foreach (DataRow childRow in propagateRows)
          {
            childRow[childColumn.ColumnName] = row[parentColumn];
          }
        }
      }
    }
  }
}

Additional Code: ~zip file 2.08 KB

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

  E-mail: info@sys-con.com