Failure Handling and Synchronization of Workflows in
Central, Parallel and Distributed Workflow Control Environments:

The CREW (Correct and Reliable Execution of Workflows) Project


Business processes often consist of applications that access shared resources such as databases, so dependencies arise from the interaction between steps of concurrent workflows. Events in one workflow can cause data inconsistencies in other concurrent workflows. In addition, because business processes are complicated, several threads can execute concurrently within a single workflow, and a failure in one or more of these threads can create race conditions that lead to data inconsistencies. Since workflows handle crucial information about the daily business activities of enterprises, it is essential to prevent such inconsistencies. There is therefore a great need to handle failures and synchronization of workflows adequately.

Since centralized workflow engines can become performance bottlenecks, parallel and distributed workflow environments are needed to attain scalability. In a parallel environment, several engines operate in parallel, communicating with the same set of agents. In a distributed environment, by contrast, workflows are scheduled in a distributed manner by the agents themselves, which eliminates the bottleneck formed by centralized engines. In both environments, the state information about workflows in progress is distributed across different sites, so handling failures and performing synchronization becomes even more complicated.

To address these problems, we have developed the necessary concepts and infrastructure as part of the CREW Project at UMass. First, we have developed a workflow specification language called LAWS, which provides high-level building blocks for expressing mutual-exclusion and complex ordering requirements across workflow steps, and for achieving event notification across workflow instances. Through these building blocks, LAWS allows the specification of failure handling and synchronization requirements in addition to traditional requirements such as data flow and control flow. Second, because we use a rule-driven approach to execute the steps in a workflow, we have developed a compiler that translates all the high-level requirements into a uniform set of rules. These rules can be dynamically modified as steps execute and failures occur. Third, to support such modification, we have identified a small set of primitives, PostRule(), PostEvent() and PostPrecondition(), that is sufficient to realize the high-level synchronization and failure handling requirements. These primitives are platform-independent in that they provide a unified mechanism to achieve the specified correctness properties in centralized, parallel and distributed workflow control environments. Finally, to demonstrate the usefulness and practicality of our approach, we have implemented a CREW run-time that realizes the requirements specified using LAWS in all three workflow environments.
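The rule-driven idea can be illustrated with a toy example. The sketch below is not the CREW run-time: this page does not define the semantics of PostRule(), PostEvent() and PostPrecondition(), so the Python methods `post_rule`, `post_event` and `post_precondition`, the `RuleEngine` class, the `<step>.done` event naming and the step names are all assumptions made for illustration. It shows one plausible reading, in which a rule runs a step once its triggering event and all of its preconditions have occurred, and a precondition added at run time imposes an ordering constraint across two workflows.

```python
from collections import defaultdict


class RuleEngine:
    """Toy event/rule engine (hypothetical; not the actual CREW implementation)."""

    def __init__(self):
        self.rules = defaultdict(list)  # triggering event -> list of rules
        self.seen = set()               # events that have occurred so far
        self.log = []                   # steps executed, in order

    def post_rule(self, event, step, preconditions=()):
        """Assumed meaning: when `event` occurs and all preconditions hold, run `step`."""
        rule = {"step": step, "pre": set(preconditions), "fired": False}
        self.rules[event].append(rule)
        return rule

    def post_precondition(self, rule, event):
        """Assumed meaning: make an existing rule wait for one more event."""
        rule["pre"].add(event)

    def post_event(self, event):
        """Assumed meaning: record an event, then fire every rule that is now enabled."""
        self.seen.add(event)
        self._fire_enabled()

    def _fire_enabled(self):
        # Repeat until no rule fires, since a completed step may enable others.
        progress = True
        while progress:
            progress = False
            for trigger, rules in list(self.rules.items()):
                if trigger not in self.seen:
                    continue
                for rule in rules:
                    if not rule["fired"] and rule["pre"] <= self.seen:
                        rule["fired"] = True
                        self.log.append(rule["step"])
                        # Completing a step is itself modeled as an event.
                        self.seen.add(rule["step"] + ".done")
                        progress = True


engine = RuleEngine()
engine.post_rule("A.start", "A.reserve_stock")
bill = engine.post_rule("B.start", "B.bill")
# Ordering across workflows: B.bill must wait for A.reserve_stock to finish.
engine.post_precondition(bill, "A.reserve_stock.done")

engine.post_event("B.start")  # B.bill is triggered but still blocked
engine.post_event("A.start")  # A.reserve_stock runs, which then unblocks B.bill
print(engine.log)
```

In this reading, the same three operations could be invoked by a central engine, by one of several parallel engines, or by an agent in a distributed setting; how CREW actually routes them between sites is not described here.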


Mohan U. Kamath and Krithi Ramamritham, "Failure Handling and Synchronization of Workflows in Parallel and Distributed Workflow Control Environments" (submitted for publication).

Mohan U. Kamath and Krithi Ramamritham, "Correctness Issues in Workflow Management", Distributed Systems Engineering (DSE) Journal: Special Issue on Workflow Management Systems, Volume 3, Number 4, December 1996.

Also check out the ongoing WorldFlow Project and other work we have done in Workflows.

Last Update: 4 March 1997