Atomic transactions are a well-known technique for guaranteeing
consistency in the presence of failures. The ACID properties of
atomic transactions ensure that, even in complex business
applications, consistency of state is preserved.
Transactions are best viewed as "short-lived" entities operating in a
closely-coupled environment, performing stable state changes to the
system; they are less well suited for structuring "long-lived"
application functions (e.g., running for hours, days, etc.) and
running in a loosely coupled environment like the Web. Long-lived
atomic transactions (as typically occur in business-to-business
interactions) may reduce the concurrency in the system to an
unacceptable level by holding on to resources (e.g., locks) for a
long time; further, if such an atomic transaction rolls back, much
valuable work already performed could be undone. As a result, various
extended transaction models have been proposed in which strict ACID
properties can be relaxed in a controlled manner. Until recently,
translating these models into the world of Web services had not been
attempted. However, the OASIS Business Transactions Protocol,
specified by a collaboration of several companies, has tried to
address this issue.
Introduction
With the advent of Web services, the Web is being populated by
service providers who wish to take advantage of this large B2B space.
However, there are still important security and fault-tolerance
considerations that must be addressed. One of these is the fact that
the Web frequently suffers from failures that can affect both the
performance and consistency of applications that run over it.
Atomic transactions are a well-known technique for guaranteeing
consistency in the presence of failures. (Note: I will not use the
term transaction in place of atomic transaction since in the B2B
space this has different connotations.) The ACID properties of atomic
transactions (Atomicity, Consistency, Isolation, Durability) ensure
that even in complex business applications consistency of state is
preserved, despite concurrent accesses and failures. This is an
extremely useful fault-tolerance technique, especially when multiple,
possibly remote, resources are involved.
The structuring mechanisms available within traditional atomic
transaction systems are sequential and concurrent composition of
transactions. These mechanisms are sufficient if an application
function can be represented as a single atomic transaction. As Web
services evolved as a means to integrate processes and applications
at an inter-enterprise level, traditional transaction semantics and
protocols have proven inappropriate. Web services-based transactions
differ from traditional transactions in that they execute over long
periods, they require commitments to the transaction to be
"negotiated" at runtime, and isolation levels have to be relaxed.
As a result, various extended transaction models have been proposed, in
which strict ACID properties can be relaxed in a controlled manner.
Until recently, translating these models into the world of Web
services had not been attempted. However, the OASIS Business
Transactions Protocol (BTP), specified by a collaboration of several
companies, has tried to address this issue. In this article we'll
first consider why traditional atomic transactions are insufficient
for long-running B2B activities, and then describe how BTP has
attempted to solve these problems.
Why ACID Transactions Are Too Strong
ACID transactions by themselves are inadequate for structuring
long-lived applications. To ensure ACID-ity between multiple
participants, a multiphase (typically two) consensus mechanism is
required (see Figure 1). During the first (preparation) phase, an
individual participant must make durable any state changes that
occurred during the scope of the atomic transaction, such that these
changes can either be rolled back (undone) or committed later once
consensus on the transaction outcome has been reached among all
participants. Any original state must not be lost at this point, as
the atomic transaction could still roll back. If a failure occurs
during the first phase, all participants are forced to undo their
changes. Otherwise, in the second (commitment) phase, participants
may "overwrite" the original state with the state made durable during
the first phase.
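The two phases described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of any particular transaction service: the `Participant` class, its `can_prepare` flag, and the `two_phase_commit` function are hypothetical names introduced here, and "durable" state is simply held in memory.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


class Participant:
    """Hypothetical participant guarding a single value."""

    def __init__(self, name, value, can_prepare=True):
        self.name = name
        self.value = value              # original (committed) state
        self.pending = None             # state changed within the transaction
        self.can_prepare = can_prepare  # assumption: simulates a phase 1 failure

    def update(self, new_value):
        # Work done inside the transaction is kept apart from the original.
        self.pending = new_value

    def prepare(self):
        # Phase 1: a real participant would write `pending` to stable
        # storage here while still keeping the original state.
        return Vote.COMMIT if self.can_prepare else Vote.ROLLBACK

    def commit(self):
        # Phase 2: overwrite the original state with the prepared state.
        if self.pending is not None:
            self.value = self.pending
        self.pending = None

    def rollback(self):
        # Phase 2: discard the prepared state; the original is untouched.
        self.pending = None


def two_phase_commit(participants):
    """Return True if all participants committed, False if all rolled back."""
    votes = [p.prepare() for p in participants]
    if all(v is Vote.COMMIT for v in votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Note that every participant sees the same decision: a single rollback vote in phase 1 forces all of them to discard their pending state.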
In order to guarantee consensus, a two-phase commit is necessarily a
blocking protocol. After returning the phase 1 response, each
participant that returned a commit response must remain blocked until
it has received the coordinator's phase 2 message telling it what to
do. Until it receives this message, any resources used by the
participant are unavailable for use by other atomic transactions,
since releasing them could result in non-ACID behavior. If the coordinator
fails before delivery of the second phase message these resources
remain blocked until it recovers. In addition, if a participant fails
after phase 1, but before the coordinator can deliver its final
commit decision, the atomic transaction cannot be completed until the
participant recovers: all participants must see both phases of the
commit protocol in order to guarantee ACID semantics. The protocol
implies no time limit between the coordinator sending the first-phase
message and sending the second-phase (commit) message; seconds or
hours could elapse between them.
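To make the blocking behavior concrete, here is a small sketch of a participant that holds a lock from the moment it does its work until the coordinator's phase 2 message arrives. The `Resource`, `Participant`, and `ResourceBusy` names are assumptions made for this illustration only; a real system would use a lock manager and stable storage.

```python
class ResourceBusy(Exception):
    """Raised when another transaction still holds the resource."""


class Resource:
    def __init__(self):
        self.owner = None  # id of the transaction holding the lock, if any

    def acquire(self, tx_id):
        if self.owner is not None and self.owner != tx_id:
            raise ResourceBusy(f"held by {self.owner}")
        self.owner = tx_id

    def release(self, tx_id):
        if self.owner == tx_id:
            self.owner = None


class Participant:
    """Hypothetical participant working on one resource."""

    def __init__(self, tx_id, resource):
        self.tx_id = tx_id
        self.resource = resource
        self.state = "active"
        resource.acquire(tx_id)  # lock taken while doing the work

    def prepare(self):
        # After voting to commit, the participant is "in doubt": it can
        # neither commit nor roll back on its own, so the lock stays held.
        self.state = "prepared"
        return "commit"

    def phase_two(self, decision):
        # `decision` is "commit" or "rollback", chosen by the coordinator.
        self.state = "committed" if decision == "commit" else "rolled back"
        self.resource.release(self.tx_id)
```

Between `prepare()` and `phase_two()`, any other transaction calling `acquire()` on the same resource gets `ResourceBusy`, however long the coordinator takes.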
Therefore, structuring certain activities from long-running atomic
transactions can reduce the amount of concurrency within an
application or (in the event of failures) require work to be
performed again. For example, there are certain classes of
application where it is known that resources acquired within an
atomic transaction can be released "early," rather than having to
wait until the atomic transaction terminates; in the event of the
atomic transaction rolling back, however, certain compensation
activities may be necessary to restore the system to a consistent
state. Such compensation activities (which may perform forward or
backward recovery) will typically be application specific, may not be
necessary at all, or may be more efficiently dealt with by the
application.
For example, long-running activities can be structured as many
independent, short-duration atomic transactions, to form a "logical"
long-running transaction. This structure allows an activity to
acquire and use resources for only the required duration of this
long-running activity. In Figure 2 an application activity (shown by
the dotted ellipse) has been split into many different, coordinated,
short-duration atomic transactions. Assume that the application
activity is concerned with booking a taxi (t1), reserving a table at
a restaurant (t2), reserving a seat at the theater (t3), booking a
room at a hotel (t4), and so on. If all of these operations were
performed as a single atomic transaction, then resources acquired
during t1 would not be released until the atomic transaction has
terminated. If subsequent activities t2, t3, etc., do not require
those resources, then they will be needlessly unavailable to other
clients.
However, if failures and concurrent access occur during the lifetime
of these individual transactional activities, then the behavior of
the entire "logical long-running transaction" may not possess ACID
properties. Therefore, some form of (application-specific)
compensation may be required to attempt to return the state of the
system to consistency. For example, let's assume that t4 aborts.
Further assume that the application can continue to make forward
progress, but in order to do so must now undo some state changes made
prior to the start of t4 (by t1, t2, or t3). New activities are
started: tc1 is a compensation activity that attempts to undo the
state changes performed by, say, t2 and t3, and the application
continues once tc1 has completed. tc5' and tc6' are new activities
that continue after compensation, e.g., since it was not possible to
reserve the theater, restaurant, and hotel, it is decided to book
tickets at the cinema. Obviously, other forms of composition are
possible.
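The booking scenario above can be sketched as a sequence of short steps, each paired with an optional compensation. This is only one possible arrangement, under the assumption that each step is its own short atomic transaction and that compensations run in reverse order; `run_activity` and `StepFailed` are hypothetical names, not part of any specification.

```python
class StepFailed(Exception):
    """Raised by a step whose short transaction aborts."""


def run_activity(steps):
    """Run (name, action, compensation) steps in order.

    Each action stands in for one short atomic transaction. If an action
    raises StepFailed, the compensations of the steps that already
    completed are run in reverse order; a compensation of None means that
    the step needs no compensation. Returns (completed, compensated).
    """
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except StepFailed:
            compensated = []
            for done_name, done_compensation in reversed(completed):
                if done_compensation is not None:
                    done_compensation()
                    compensated.append(done_name)
            return [n for n, _ in completed], compensated
        completed.append((name, compensation))
    return [n for n, _ in completed], []


bookings = set()


def book_hotel():
    raise StepFailed("no rooms available")


steps = [
    ("taxi", lambda: bookings.add("taxi"), None),
    ("restaurant", lambda: bookings.add("restaurant"),
     lambda: bookings.discard("restaurant")),
    ("theater", lambda: bookings.add("theater"),
     lambda: bookings.discard("theater")),
    ("hotel", book_hotel, None),
]
completed, compensated = run_activity(steps)
```

Here the hotel booking aborts, the theater and restaurant bookings are compensated, and the taxi booking is left in place, after which the application is free to start new activities such as booking the cinema.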
Properties of a Web Service-Based Transaction
The fundamental question addressed here is this: what properties must
a transaction model possess in order to support business-to-business
interactions? To begin to answer that, we need to understand what we
mean by a business transaction.
A business relationship is any distributed state that is maintained by
two or more parties and is subject to some contractual constraints
previously agreed to by those parties. A business transaction can
therefore be considered as a consistent change in the state of a
business relationship between parties. Each party in a business
transaction holds its own application state corresponding to the
business relationship with other parties in that transaction. During
the course of a business transaction, this state may change.
In the Web services domain, information about business transactions
is communicated in XML documents. However, how those documents are
exchanged by the different parties involved (e.g., e-mail or HTTP)
may be a function of the environment, type of business relationship,
or other business or logistical factors. Therefore, mandating a
specific XML carrier protocol may be too restrictive.
Since business relationships imply a level of value to the parties
they associate, achieving some level of consensus among these parties
is important. However, not all participants within a particular
business transaction have to see the same outcome; a specific
transaction may possess multiple consensus groups.
In addition to understanding the outcomes, a participant within a
business transaction may need to support provisional or tentative
state changes during the course of the transaction. Such parties must
also support the completion of a business transaction, either through
confirmation (final effect) or cancellation (counter-effect). In
general, what it means to confirm or cancel work done within a
business transaction will be for the participant to determine.
For example, an application may choose to perform changes as
provisional effects and make them visible to other business
transactions. It may store necessary information to undo these
changes at the same time. On confirmation, it may simply discard
these "undo" changes; on cancellation, it may apply them. An
application can employ such a compensation-based approach
or take a conventional "rollback" approach. It is with these
properties in mind that we can discuss the Business Transaction
Protocol.
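As a rough sketch of the compensation-based approach just described, the following hypothetical `Inventory` class applies provisional effects immediately, records the information needed to undo them, and then either discards or applies that information. The class and method names are assumptions made for illustration; what confirm and cancel actually mean is for each participant to decide.

```python
class Inventory:
    """Hypothetical participant that applies provisional effects at once."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self._undo = {}  # tx_id -> list of (item, quantity) to restore

    def reserve(self, tx_id, item, quantity):
        if self.stock.get(item, 0) < quantity:
            raise ValueError(f"insufficient stock for {item}")
        # Provisional effect: visible to other business transactions now.
        self.stock[item] -= quantity
        # Store what is needed to undo the change later.
        self._undo.setdefault(tx_id, []).append((item, quantity))

    def confirm(self, tx_id):
        # Final effect: the change stands, so discard the undo information.
        self._undo.pop(tx_id, None)

    def cancel(self, tx_id):
        # Counter-effect: apply the undo information in reverse order.
        for item, quantity in reversed(self._undo.pop(tx_id, [])):
            self.stock[item] += quantity
```

Because the undo information is removed on confirmation, a late `cancel` for an already confirmed transaction has no effect in this sketch.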
The Business Transaction Protocol
B2B interactions may be complex, involving many parties, spanning
many different organisations, and potentially lasting for hours or
days. For example, the process of ordering and delivering parts for a
computer may involve different suppliers, and may only be considered
to have completed once the parts are delivered to their final
destination. Most business-to-business collaborative applications
require transactional support in order to guarantee consistent
outcome and correct execution. These applications often involve
long-running computations, loosely coupled systems, and components
that do not share data, location, or administration; it is then
difficult to incorporate ACID transactions within such architectures.
Furthermore, most collaborative business process management systems
support complex, long-running processes in which undoing tasks that
have already completed may be necessary in order to effect recovery
or to choose another acceptable execution path.
For example, an online bookshop may well reserve books for an
individual for a specific period of time, but if the individual does
not purchase the books within that time period they will be "put back
onto the shelf" for others to purchase; to do otherwise could result
in the shop never selling a single book. Furthermore, because no shop
has an infinite supply of stock, some online shops may appear to
reserve items for users but in fact allow other users to purchase
them first (i.e., the same book may be "reserved" for multiple users
concurrently); a user may subsequently find that the item is no
longer available, or has to be ordered especially for them. If
these examples were modelled using atomic transactions, then the
reservation process would require the book to be locked for the
duration of the atomic transaction - it would have to be available,
and could not be acquired by (sold to) another user. When the atomic
transaction commits, the book will be removed from stock and mailed
to the user. However, if a failure occurs during the commitment
protocol, the book may remain locked for an indeterminate amount of
time (or until manual intervention occurs).
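The first bookshop policy, a reservation that lapses after a fixed period, can be sketched without any atomic transaction or long-held lock. The `Bookshop` class below is a hypothetical illustration for a single title; time is passed in explicitly as `now` (in seconds) to keep the example deterministic, which is an assumption of this sketch rather than anything the protocol requires.

```python
class Bookshop:
    """Hypothetical shop selling copies of one title with timed holds."""

    def __init__(self, stock, hold_seconds):
        self.stock = stock
        self.hold_seconds = hold_seconds
        self.holds = {}  # user -> time at which the hold expires

    def _expire(self, now):
        # Put lapsed reservations "back onto the shelf".
        for user in [u for u, expiry in self.holds.items() if expiry <= now]:
            del self.holds[user]
            self.stock += 1

    def reserve(self, user, now):
        self._expire(now)
        if user in self.holds:
            return True   # already holding a copy
        if self.stock == 0:
            return False  # nothing left to reserve
        self.stock -= 1
        self.holds[user] = now + self.hold_seconds
        return True

    def purchase(self, user, now):
        # A purchase succeeds only while the user's hold is still valid.
        self._expire(now)
        return self.holds.pop(user, None) is not None
```

If Alice reserves the last copy and does not buy it within the hold period, Bob's later reservation succeeds and Alice's purchase attempt fails; no resource stays locked indefinitely.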
As a result, the use of traditional atomic transactions with strict
ACID properties (e.g., systems that implement the JTS specification
[SUN99]) is considered too restrictive for many types of applications.
The Organization for the Advancement of Structured Information
Standards (OASIS) Business Transaction Protocol (BTP) is a
transaction protocol that meets the requirement for Web-based,
long-running collaborative business applications. BTP is designed to
support applications that are disparate in time, location, and
administration and thus require transactional support beyond
classical ACID transactions. In short, BTP is a protocol for ensuring
consistent outcomes from participating parties in a business
transaction.
Note: It is important to realize that the term "transaction" in this
sense does not mean atomic transaction, although ACID semantics can
be obtained if required.
Consensus of Opinion
In general, a business transaction requires the capability for
certain participants to be structured into a consensus group such
that all of the members in a grouping have the same result. Different
participants within the same business transaction may belong to
different consensus groups. The business logic then controls how each
group completes. In this way, a business transaction may cause a
subset of the groups it naturally creates to perform the work it
asks, while asking the other groups to undo the work.
Consider the situation shown in Figure 4, in which a user is booking
a holiday, has provisionally reserved a flight ticket and taxi to the
airport, and is now looking for travel insurance. The first consensus
group holds Flights and Taxi, since neither of these can occur
independently. The user may then decide to visit multiple insurance
sites (called A and B in this example), and as he goes may reserve
the quotes he likes. So, A may quote $50, which is just within
budget, but the user may want to try B just in case he can find a
cheaper price, without losing the initial quote. If the quote from B
is less than that from A, the user may cancel A while confirming both
the flights and the insurance from B. Each insurance site may
therefore occur within its own consensus group. This is not possible
when using ACID transactions.
BTP uses a two-phase completion protocol to guarantee atomicity of
decisions but does not imply specific implementations. To reinforce
this distinction, rather than calling the second-phase operations of
the termination protocol "commit" and "rollback" as is the case in an
ACID transaction environment, they are called "confirm" and "cancel"
respectively, with the intention of decoupling the phases from any
preconceptions of specific backward-compensation implementations.
It's important to stress that although BTP uses a two-phase protocol,
it does not imply ACID transactions. How implementations of prepare,
confirm, and cancel are provided is a back-end implementation
decision. Issues to do with consistency and isolation of data are
also back-end choices and not imposed or assumed by BTP. A BTP
implementation is primarily concerned with two-phase coordination of
abstract entities (participants).
Open-Top Coordination
In a traditional transaction system, the application or user has very
few verbs with which to control the transaction. Typically, these are
"begin," "commit," and "roll back," corresponding to starting a
transaction, committing a transaction, and rolling back a transaction
respectively. When an application asks for a transaction to commit,
the coordinator will execute the entire two-phase commit protocol, as
described earlier, before returning an outcome to the application
(what BTP terms a closed-top commit protocol). The elapsed time
between the execution of the first phase and the second phase is
typically milliseconds to seconds, but is entirely under the control
of the coordinator.
However, the actual two-phase protocol does not impose any
restrictions on the time between executing the first and second
phases. Obviously, the longer this period takes, the more chance
there is for a failure to occur and the longer (critical) resources
remain locked or isolated from other users. This is the reason why
most ACID transaction systems attempt to keep this time frame to a
minimum and why they do not work well in environments like the Web.
BTP, on the other hand, took the approach of allowing the time
between these phases to be set by the application by expanding the
verbs available to include explicit control over both phases of the
termination protocol, i.e., "prepare," "confirm," and "cancel" (what
BTP terms an open-top commit protocol). The application has complete
control over
when it can tell a transaction to prepare and, using whatever
business logic is required, it can later determine which
transaction(s) to confirm or cancel. This ability is a powerful tool
for applications.
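The difference between the two styles can be sketched with a tiny state machine. The `Atom` class and `closed_top_commit` function here are hypothetical stand-ins written for this article, not the BTP API or message set; they only show which party decides when the second phase runs.

```python
class Atom:
    """Hypothetical transaction exposing the open-top verbs."""

    def __init__(self, name):
        self.name = name
        self.state = "active"

    def prepare(self):
        if self.state != "active":
            raise RuntimeError(f"cannot prepare from state {self.state}")
        self.state = "prepared"

    def confirm(self):
        if self.state != "prepared":
            raise RuntimeError(f"cannot confirm from state {self.state}")
        self.state = "confirmed"

    def cancel(self):
        if self.state in ("confirmed", "cancelled"):
            raise RuntimeError(f"cannot cancel from state {self.state}")
        self.state = "cancelled"


def closed_top_commit(atom):
    # Closed-top: the coordinator runs both phases back to back and the
    # application only sees the final outcome.
    atom.prepare()
    atom.confirm()
    return atom.state


# Open-top: the application drives each phase itself.
quote = Atom("quote")
quote.prepare()
# ... arbitrary time passes while business logic decides what to do ...
quote.cancel()
```

In the open-top case the application could equally have called `quote.confirm()`; the point is that the choice, and its timing, belong to the application rather than the coordinator.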
Atoms and Cohesions
To address the specific requirements of business transactions, BTP
introduced two types of extended transactions, both using the
open-top completion protocol:
Atom: An atom is the typical way in which "transactional" work
performed on Web services is scoped. The outcome of an atom is
guaranteed to be consistent such that all enlisted participants will
see the same outcome, which will be either to accept (confirm) the
work or reject (cancel) it.
Cohesion: This type of transaction was introduced in order to relax
atomicity and allow for the selection of work to be confirmed
or cancelled based on higher-level business rules. Atoms are the
typical participants within a cohesion, but unlike an atom, a
cohesion may give different outcomes to its participants such that
some of them may confirm while the remainder cancel. In essence, the
two-phase protocol for a cohesion is parameterized to allow a user to
specify precisely which participants to prepare and which to cancel.
The strategy underpinning cohesions is that they better model
long-running business activities. Services enroll in atoms that
represent specific units of work and, as the business activity
progresses, it may encounter conditions that allow it to cancel or
prepare these units. It may be many hours or days before the cohesion
arrives at its confirm-set: the set of participants that it requires
to confirm in order to successfully terminate the business activity.
Once the confirm-set has been determined, the cohesion collapses down
to being an atom: all members of the confirm-set see the same outcome.
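The holiday-booking example from earlier can be used to sketch how a cohesion arrives at its confirm-set. The `Atom` and `Cohesion` classes below are hypothetical and much simpler than the real protocol; in particular, this sketch assumes that confirming the cohesion cancels every atom outside the confirm-set, and that the confirm-set is all-or-nothing.

```python
class Atom:
    """Hypothetical atom: all of its participants share one outcome."""

    def __init__(self, name, participants):
        self.name = name
        self.participants = list(participants)
        self.outcome = None  # "prepared", "confirmed", or "cancelled"

    def prepare(self):
        self.outcome = "prepared"

    def confirm(self):
        self.outcome = "confirmed"

    def cancel(self):
        self.outcome = "cancelled"


class Cohesion:
    """Hypothetical cohesion whose members may see different outcomes."""

    def __init__(self):
        self.atoms = {}

    def enroll(self, atom):
        self.atoms[atom.name] = atom

    def prepare(self, *names):
        for name in names:
            self.atoms[name].prepare()

    def cancel(self, *names):
        for name in names:
            self.atoms[name].cancel()

    def confirm(self, confirm_set):
        # The confirm-set behaves like an atom: either every member is
        # confirmed or none is. Atoms outside the set are cancelled
        # (an assumption of this sketch).
        confirm_set = set(confirm_set)
        ready = all(self.atoms[n].outcome == "prepared" for n in confirm_set)
        for name, atom in self.atoms.items():
            if ready and name in confirm_set:
                atom.confirm()
            else:
                atom.cancel()
        return ready


holiday = Cohesion()
holiday.enroll(Atom("travel", ["flight", "taxi"]))
holiday.enroll(Atom("insurance_a", ["insurer A"]))
holiday.enroll(Atom("insurance_b", ["insurer B"]))
holiday.prepare("travel", "insurance_a", "insurance_b")
holiday.cancel("insurance_a")  # B turned out to be cheaper
confirmed = holiday.confirm({"travel", "insurance_b"})
```

The business logic, not the protocol, decides that insurer A is dropped; once the confirm-set of travel plus insurer B is fixed, its members are confirmed together.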
Looking Ahead
In my next article, I'll take a closer look at the architecture of
BTP and how XML is involved in it. I'll also look at the Web services
stack and how BTP is used.
References
BTP: www.oasis-open.org/committees/business-transactions
OMG (1995) "CORBAservices: Common Object Services
Specification." OMG Document Number 95-3-31. March.
Sun Microsystems Inc. (1999) "Java Transaction API 1.0.1 (JTA)," April.
Sun Microsystems Inc. (2002) "XML Transactioning API for Java
(JAXTX)." www.jcp.org/jsr/detail/156.jsp
Author Bio
Mark Little is a distinguished engineer/architect within HP Arjuna
Labs, where he leads the HP-TS and HP-WST teams. He is one of the
primary authors of the OMG Activity Service specification, is on the
expert group for JSR 95, and leads the JSR 156 activity on an XML API
for Java Transactions. He is HP's representative on the OTS Revision
Task Force, and the OASIS Business Transactions Protocol
specification.
mark_little@hp.com
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.
E-mail: info@sys-con.com