Bridging the gap between
Transaction Management and Workflow Management
Mohan Kamath and Krithi Ramamritham
Department of Computer Science
University of Massachusetts
Amherst MA 01003
In the last decade, there has been growing interest in new database applications like office automation, CAD/CAM and software engineering. To handle these applications, database researchers have concentrated on developing advanced transaction models [Elm92]. However, these models have not gained popularity for the following reasons: the data-centric approach they use is still too strict for several applications, they have overlooked several practical issues, and until recently they have had little implementation support [Moh94], [Alo96].
Adopting a process-centric approach, workflow management is emerging as a technique for modeling, executing and monitoring such applications. Although workflow management systems (WFMS) are being deployed to handle some real-world tasks, they still have several limitations [Geo95] that must be addressed before WFMS can expand their scope of applicability and become more useful. WFMS use a process-centric approach since they primarily evolved from the early office-information and document-management systems. Hence they do not have adequate support to satisfy the modeling and correctness requirements [Ram96] of many intended applications. Some of the deficiencies include lack of support to keep track of data dependencies between different workflows, lack of support to control concurrent accesses to objects managed by non-transactional activities, lack of support for cooperative activities, and insufficient support for recovery. Emerging technologies like mobile computing also demand additional support from WFMS. Hence, to improve the functionality of current WFMS in terms of flexibility and robustness, additional modeling primitives and enhanced system/execution support are needed. This is the main focus of our work.
Although the transaction management approach is too data-centric, it provides elegant schemes for handling concurrency control and recovery. Hence our approach to improving the modeling and execution support of WFMS makes use of features from traditional/advanced transaction management. This approach recognizes the complementary strengths and weaknesses of the data-centric and process-centric approaches and appreciates the need to close the gap between them. Such an approach is also used in [She93, Bre93]. The ultimate goal of our work is to provide concrete guidelines for extending the features provided by current WFMS to handle the requirements of emerging applications and technologies. The rest of the paper is organized as follows: salient features offered by current workflow management systems are enumerated in section 2. Section 3 highlights the limitations of current WFMS and briefly outlines approaches to overcome these limitations. Section 4 concludes the paper.
Modeling support determines the extent of flexibility and precision available for designing workflows according to specified requirements and the facilities available for specifying how deviations from these requirements are to be handled. Current WFMS provide modeling primitives for defining the following: activities, entity responsible for executing an activity, data and control flow between activities, start and stop conditions for executing activities and the nesting of workflows. The entity responsible for executing an activity could be a program or a human role defined in a staff-definition database. Workflow definitions are stored in a task-definition database. Both these databases are usually managed by an underlying workflow database system and can be updated at any time. Activities and workflows can be defined to execute manually or automatically based on the occurrence of specific events.
Execution support determines how the system coordinates (schedules) workflows/activities at run-time based on their definitions and how system failures and logical failures are handled. Most WFMS use workflow schedulers based on finite-state automata or event-condition-action (ECA) interpreters. The workflow database maintains the state of activities and workflows in progress. When an activity is eligible for execution, if it is to be executed manually then the activity is added to the worklist of the role responsible for that activity. Later when the role player selects the activity from the worklist, it is started. Scheduling is usually performed by a workflow server which refers to the workflow database to determine the state of the various workflow instances in progress. The actual execution of an activity is supervised by application agents (workflow clients) running on the respective machines. The application agent interacts with the workflow server to fetch the workflow data required by the activity and to communicate back the outcome of the activity. As far as system and logical failures are concerned, only forward-recovery is provided by systems whenever possible. Any form of complex failure is typically handled by human intervention. Rudimentary concurrency control is provided by some systems in the form of check-in and check-out.
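The routing of eligible activities described above can be illustrated with a minimal sketch. The class name `WorkflowServer`, its methods and the state labels are hypothetical and are not taken from any particular WFMS; the sketch only assumes that manual activities wait on a role's worklist while automatic ones start immediately.

```python
from collections import defaultdict


class WorkflowServer:
    """Hypothetical scheduler that routes eligible activities by execution mode."""

    def __init__(self):
        self.worklists = defaultdict(list)  # role -> pending manual activities
        self.state = {}                     # activity -> "ready" | "running" | "done"

    def make_eligible(self, activity, role, manual):
        """Called when the start condition of an activity becomes true."""
        if manual:
            # Manual activities wait on the worklist of the responsible role.
            self.worklists[role].append(activity)
            self.state[activity] = "ready"
        else:
            # Automatic activities are started right away.
            self.state[activity] = "running"

    def select(self, role, activity):
        """The role player picks an activity from the worklist, which starts it."""
        self.worklists[role].remove(activity)
        self.state[activity] = "running"

    def report_outcome(self, activity):
        """The application agent communicates back that the activity finished."""
        self.state[activity] = "done"
```

In a real system the `state` dictionary would live in the workflow database, so that the server can recover the state of workflow instances in progress.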
This section highlights the limitations of current WFMS, which we have classified into six categories. For each category, we describe the limitations and discuss possible approaches to handle them.
Office-automation tasks are typical examples of coordinated workflows. The task definition here is primarily static in the sense that the order of activity execution is fixed. Hence the modeling primitives provided by current WFMS are sufficient for this purpose. Although individual activities or steps access the data in a consistent manner, interactions between concurrent workflows can cause inconsistencies, and this is an important problem. Hence some activities cannot be executed while certain other activities are being executed within other workflows, and one or more activities of a workflow might have to be executed as atomic units. Current WFMS do not provide primitives to specify or handle such restrictions.
Treating each workflow as an atomic unit of execution is very restrictive in terms of concurrency. Hence modeling primitives are needed to specify the execution atomicity of a workflow at definition time. Besides execution atomicity, it is also necessary to specify the isolation requirements. The isolation requirements of a workflow essentially specify, for each of its activities, which activities from other workflows cannot be executed concurrently. This can be achieved, for example, by specifying a compatibility matrix which is stored in the workflow database. At execution time, before scheduling an activity, the scheduler can refer to the workflow database to check for atomic units of execution and also verify that only compatible activities are executed concurrently. ConTracts [Wac92] uses an invariant-based approach for performing concurrency control using criteria such as values of objects rather than read-write conflicts.
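A compatibility check of this kind can be sketched as follows. The activity names and the representation of the matrix as a set of unordered pairs are assumptions made for illustration only; in a WFMS the matrix would be stored in the workflow database.

```python
# Hypothetical compatibility matrix, stored as the set of unordered pairs of
# activities that must NOT run concurrently. A one-element set means that an
# activity conflicts with another instance of itself.
INCOMPATIBLE = {
    frozenset({"update_price", "bill_customer"}),
    frozenset({"update_price"}),
}


def compatible(a, b):
    """True if activities a and b may execute at the same time."""
    return frozenset({a, b}) not in INCOMPATIBLE


def can_schedule(candidate, running):
    """The scheduler starts `candidate` only if it is compatible with
    every activity currently running in any workflow."""
    return all(compatible(candidate, r) for r in running)
```

For instance, `can_schedule("bill_customer", ["update_price"])` is false, so the scheduler would delay the billing activity until the price update completes.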
CAD/CAM, CASE and collaborative authoring are good examples of cooperative workflows. WFMS support for cooperative activities is inadequate. The check-in and check-out form of concurrency control is not sufficient for cooperative applications. Though cooperative workflows also have a statically defined structure at the top level, unlike coordinated workflows, the order of execution of some of the activities associated with the workflow is determined dynamically at execution time. Consider for example the design of a complex hierarchical object. Because of design and size integrity constraints, a certain ordering is required between the design of some of the objects. Another example is one where changes made to one of the objects can affect other objects. When additions are made to an object or recent changes to the object are undone (intentional rollback), the designers of all the affected objects are to be notified (through worklists) to re-examine the design of such objects. Current WFMS do not have adequate modeling and execution support to satisfy these requirements.
To ensure that certain operations are done in a specific order, ordering constraints (dependencies) are to be specified for these operations and stored in the workflow database. At execution time, the operations performed are logged. Before an operation is performed on an object, the state of the objects named in its ordering constraints is first checked (with the assistance of the log) to see if the required predecessor operations have already been performed. If they have not been performed, then the requested operation has to wait. To ensure that changes made to an object, or undone on it, are correctly propagated to other dependent objects, a dependent operations list is specified for each object and stored in the workflow database. At execution time, after each operation, the dependent operations are invoked. If the operations are to be executed by a role, the operations are added to the worklist of the appropriate role. Using these techniques, concurrency control schemes in current WFMS can be enhanced to handle cooperative workflows.
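The two mechanisms above, ordering constraints checked against a log and dependent operations lists, can be sketched together. The class and all operation and role names are hypothetical, and for brevity the dependent operations are keyed by operation rather than by object.

```python
from collections import defaultdict


class CooperativeCoordinator:
    """Illustrative coordinator for ordering constraints and change propagation."""

    def __init__(self, predecessors, dependents):
        self.predecessors = predecessors    # op -> set of ops that must be logged first
        self.dependents = dependents        # op -> list of (role, follow-up op)
        self.log = []                       # operations performed so far, in order
        self.worklists = defaultdict(list)  # role -> follow-up operations to perform

    def may_perform(self, op):
        """Check the log to see whether all predecessor operations are done."""
        return self.predecessors.get(op, set()) <= set(self.log)

    def perform(self, op):
        """Log the operation and invoke its dependents.
        Returns False if the operation has to wait."""
        if not self.may_perform(op):
            return False
        self.log.append(op)
        for role, follow_up in self.dependents.get(op, []):
            # Dependent operations executed by a role go to that role's worklist.
            self.worklists[role].append(follow_up)
        return True
```

With `predecessors={"design_wing": {"design_fuselage"}}`, a request to perform `design_wing` is refused until `design_fuselage` appears in the log.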
There are several instances where non-transactional applications have to be integrated into workflows. In these applications, data is not managed by a transactional resource manager and hence concurrent access to the same data can cause inconsistencies. Examples include word-processing documents and spreadsheets. Concurrent invocations of such applications on the same data are to be avoided to prevent inconsistencies. Current WFMS do not have provisions to handle this situation. This problem is different from the problem of interactions between concurrent workflows that we discussed earlier in the context of coordinated workflows.
The following approaches can be used as solutions to the above problem. In the first approach, all object accesses are made explicitly through the application agent, i.e., through suitable interfaces, the application makes its object accesses visible to the application agent. The application agent in turn can then relay this information to the workflow server, which can coordinate accesses to objects by receiving similar information from other agents as well. In an alternative approach, the IDs of the objects that are to be accessed by the applications are obtained from the workflow data. At task-definition time, the activities and the workflow data they read are marked as non-transactional. At execution time, the workflow server can then coordinate accesses to objects using this information. If nothing can be inferred about the objects accessed by the applications, a compatibility-matrix-based approach can be used, which results in the serialization of non-transactional activities at execution time. Techniques developed in the DOM project [Man92] to handle conventional programs and data are also applicable here.
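Server-side coordination of object accesses, as used by the first two approaches, might look like the sketch below. The class name, the exclusive all-or-nothing acquisition policy and the object IDs are assumptions made for illustration; the text above does not prescribe a particular policy.

```python
class ObjectAccessCoordinator:
    """Hypothetical table kept by the workflow server that records which
    non-transactional activity currently holds each object."""

    def __init__(self):
        self.holders = {}  # object id -> activity holding it

    def acquire(self, activity, object_ids):
        """Grant all requested objects or none, so that an activity never
        starts while holding only part of what it needs."""
        if any(self.holders.get(o, activity) != activity for o in object_ids):
            return False
        for o in object_ids:
            self.holders[o] = activity
        return True

    def release(self, activity):
        """Free every object held by the activity once it has completed."""
        self.holders = {o: a for o, a in self.holders.items() if a != activity}
```

The object IDs passed to `acquire` would come either from the application agents or from workflow data marked as non-transactional at task-definition time.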
Mobile computer users are growing in number and integrating them into a WFMS demands additional support from the WFMS. Specific issues that arise here include how the WFMS coordinates tasks that are performed when mobile users work in a disconnected mode and when they cross wireless cell boundaries. Also location sensitive task/activity scheduling might have to be performed to use an organization's resources effectively. Current WFMS do not seem to have any provision to handle these special requirements.
A viable solution to coordinating the activities of mobile users in a disconnected mode has recently been proposed in [Alo95], where the required activities (applications and data) are downloaded onto the mobile computer before performing a planned disconnection. The activities are performed in a disconnected mode and the results are then uploaded after reconnection, at which time exceptions that occurred during the disconnected mode of operation are also handled. When mobile users cross the borders of wireless cells, service handoff (the mobile user's session migrating to a new information server) is to be performed from the old workflow server to a new workflow server if necessary. This decision will depend upon whether the mobile user is responsible for a small or large part of a workflow. In the latter case, it is preferable to migrate the workflow instance (workflow state data and other relevant information) to the new workflow server. If it is a small part (i.e., most of the remaining activities are to be performed by others who are in proximity to the old server), then the workflow instance need not be migrated. Consistency issues that arise when workflow instances migrate are to be handled carefully. For location-sensitive routing (scheduling) of a mobile user's request, the modeling primitives should have provision to specify geographic information in the workflow definition.
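The migration decision at handoff time can be made concrete with a small sketch. The function name and the 0.5 threshold are assumptions: the discussion above distinguishes only a "small" from a "large" part of the workflow and does not fix a cut-off.

```python
def should_migrate(remaining, mobile_user, threshold=0.5):
    """Decide whether to migrate a workflow instance to the new workflow server.

    remaining   -- list of (activity, performer) pairs still to be executed
    mobile_user -- the user who has crossed into a new wireless cell
    threshold   -- assumed cut-off between a "small" and a "large" share
    """
    if not remaining:
        return False  # nothing left to execute, so migration gains nothing
    mine = sum(1 for _, performer in remaining if performer == mobile_user)
    return mine / len(remaining) > threshold
```

If the function returns false, the instance stays on the old server, close to the other performers of the remaining activities.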
Current WFMS do not have adequate provisions to handle failures either. While some WFMS provide forward recovery with the aid of the underlying workflow database, all other types of failures are typically handled manually. Note that the workflow database is also susceptible to failures, and techniques to handle such failures are discussed in [Kam96]. In the case of system failures, unless proper care is taken, inconsistencies can arise between the state of the workflow data as reflected in the workflow database and the actual state of the data in the remote database that the application accessed as part of the activity. This is especially true for legacy applications, which do not expose the number of actions they invoked and the commit/abort status of each. All these can result in improper workflow executions. To ensure that the state/value of a data item as recorded in the workflow database is the same as the actual value of the data item in the remote resource manager from which the application accessed the data, the following are essential: (i) a two-phase commit protocol is to be used between the workflow database and the remote database, and (ii) logging is to be performed at the application agent to prevent loss of data due to WFMS failures. Commit calls made by the applications have to be trapped using suitable routines. This is not an easy task in the case of legacy applications and needs further investigation. Also, there are no modeling primitives to specify what is to be done in the event of logical failures. To handle logical failures, additional primitives are to be provided to specify the failure atomicity of the workflow (i.e., whether all or some of the activities that have been executed earlier have to be compensated), and at execution time the workflow server must schedule the required compensating activities in the specified order.
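Scheduling of compensating activities after a logical failure can be sketched as follows. The function and the activity names are hypothetical, and compensating in reverse order of execution is an assumption: the text above requires only that compensation follow the order given in the workflow definition.

```python
def compensation_schedule(executed, compensations, scope=None):
    """Return the compensating activities to run after a logical failure.

    executed      -- activities already completed, in completion order
    compensations -- activity -> its compensating activity (from the definition)
    scope         -- activities covered by the failure-atomicity specification;
                     None means every executed activity is to be compensated
    """
    return [
        compensations[a]
        for a in reversed(executed)  # assumed order: most recent first
        if (scope is None or a in scope) and a in compensations
    ]
```

Activities without an entry in `compensations` are skipped, which models steps that need no undo action.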
Although current WFMS allow online modification of the task-definition and staff-definition databases, the semantics they offer seem to be insufficient. Specifically, if the definitions are changed while workflows are in progress, inconsistencies can arise. At task-definition modification time, options should be provided to specify whether the definition is to take effect immediately, and if so, execution support is needed to notify the appropriate role of any activity to be newly executed or compensated as part of the new definition. During staff-definition modification, the system should check that activities which were already routed to the worklists of roles whose definitions have been modified are re-routed to some other role responsible for performing those activities. If this is not done, those activities may be delayed or never executed.
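The re-routing check for staff-definition changes can be sketched as below. The function, the data layout and the policy of choosing the first alternative role are all assumptions made for illustration.

```python
def reroute(worklists, responsible, modified_role):
    """Move pending activities off the worklist of a modified role.

    worklists     -- role -> list of pending activities (updated in place)
    responsible   -- activity -> roles allowed to perform it
    modified_role -- the role whose staff definition was changed
    Returns the activities for which no alternative role exists.
    """
    unrouted = []
    for activity in worklists.pop(modified_role, []):
        targets = [r for r in responsible.get(activity, []) if r != modified_role]
        if targets:
            # Assumed policy: hand the activity to the first alternative role.
            worklists.setdefault(targets[0], []).append(activity)
        else:
            unrouted.append(activity)
    return unrouted
```

Activities returned as unrouted are the ones that would otherwise be delayed or never executed, so they would need to be escalated.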
Due to the process-centric approach adopted by WFMS developers, current WFMS have several limitations. In this paper, we identified limitations with respect to supporting coordinated workflows, cooperative workflows, integrating non-transactional applications into workflows, workflows in mobile environments, handling failures, and dynamic definition modifications. Using concurrency control, logging and recovery concepts from transaction management we discussed approaches to overcome these limitations. Due to space restrictions, we only presented some of the salient aspects of our work. Additional information can be obtained from our workflow research web page.
This research has been supported by the National Science Foundation under grant IRI-9314376 and a grant from Sun Microsystems Lab.