Our goal is to develop a framework that explains and formalizes the essentials of recovery in a variety of scenarios. The framework formalizes the notions of failure and of persistence in the face of failures, and deals uniformly with the various means of achieving failure atomicity, both at the level of protocols (algorithms) and at the level of mechanisms provided by the infrastructure (e.g., disks and remote nodes). A contribution of our formalism lies in intermediate abstractions that allow us to discuss recovery at an interesting middle level, more detailed than just failure atomicity and durability but more general than the specific policies of, say, log flushing. It is at this level that it becomes meaningful to distinguish between different implementations of the same ideas or protocols; for example, we distinguish between an ARIES implementation for a centralized database and one for a client-server architecture. We have also applied our ideas to mobile recovery and to the modeling of e-commerce activities, showing the breadth of the approach. By formalizing recovery we gain insight into the essentials that recur across solutions, allowing us to compare existing alternatives and design new ones. Finally, we gain the ability to consider alternative semantics of recovery, which take different application needs and different infrastructure services into account. We are currently studying how to apply our insights into recovery to non-traditional database applications, integrating them with other work done in the context of workflows. This recovery problem offers some challenges we set out to address with our research: composability of recovery properties and mechanisms, and alternative semantics for transactions and their recovery.
This year we expanded our study of recovery both in depth and breadth, while continuing with our three-pronged approach, studying: 1) recovery per se -- from first principles, 2) recovery for emerging transaction processing platforms, and 3) recovery for advanced applications. We refined our framework to better capture both the essentials of recovery and the broader class of properties that recovery underlies, namely the ability of systems to eventually complete their work in spite of failures, possibly via compensation and other actions which fall outside the strict framework of failure atomicity. Specifically, we codified assumptions of liveness in a system (i.e., that the system will keep trying to make forward progress) in a hypothesis on protocols, which states that certain sequences of actions will be attempted until they are completed (provided the appropriate preconditions hold). Combining this hypothesis with the guarantees and protocols (see previous report), we are now able to prove both that a system does not reach incorrect states (safety) and that it eventually accomplishes its goal (liveness).
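The liveness hypothesis described above can be illustrated with a minimal Python sketch. It is not part of our formalism: the names `attempt_until_complete` and `FlakyStep`, and the bound `max_attempts`, are hypothetical and introduced only for illustration. The sketch shows an action being attempted repeatedly, as long as its precondition holds, until it completes.

```python
from typing import Callable, Tuple


def attempt_until_complete(
    action: Callable[[], bool],
    precondition: Callable[[], bool],
    max_attempts: int = 100,  # illustrative bound so the sketch always terminates
) -> Tuple[bool, int]:
    """Retry `action` until it reports completion.

    Returns (completed, attempts). The action is only attempted while
    `precondition()` holds, mirroring the proviso in the hypothesis.
    """
    attempts = 0
    while attempts < max_attempts:
        if not precondition():
            return (False, attempts)  # hypothesis does not apply
        attempts += 1
        if action():
            return (True, attempts)  # goal accomplished (liveness)
    return (False, attempts)


class FlakyStep:
    """Hypothetical protocol step that fails a fixed number of times."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining = failures_before_success
        self.completed = False

    def __call__(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1  # simulated failure; no state is changed
            return False
        self.completed = True
        return True


step = FlakyStep(failures_before_success=3)
done, attempts = attempt_until_complete(step, precondition=lambda: True)
blocked, blocked_attempts = attempt_until_complete(
    FlakyStep(0), precondition=lambda: False
)
```

In this sketch a failed attempt leaves no partial effect, which stands in for the safety side of the argument; the retry loop stands in for the liveness side.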
Recently we applied our framework to electronic commerce scenarios (a sample web auction scenario, and a web retail scenario), yielding intuitive yet detailed formalizations of the properties of electronic commerce, with emphasis on their ability to complete (business) transactions correctly and in spite of failures. Specifically, through the application of our methodology (in the web retail example) we were able to prove the high-level, system-wide property of goods-money atomicity, i.e., that (the right amount of) money gets (ultimately, eventually) exchanged for (the correct) goods, without involving the internals of the parties to the transaction in the proof. This is particularly valuable in this context, as electronic commerce systems are in the increasingly important class of large systems composed of distributed and autonomous parts, which nonetheless are expected to exhibit failure resilience as a whole. Our methodology enables us to specify how each part contributes to recovery-related system-wide properties without requiring any subsystem to be aware of the (recovery-related) internals of other subsystems.
This is possible because (for example) we model the behaviors of Merchant, Bank, Customer, etc., in terms of the guarantees they offer and the protocols they follow. The guarantees in particular are abstract embodiments of their promises of future behavior, which we assume hold when analyzing the properties of the top level. In turn, guarantees become requirements on each component, which we prove hold by analysis of each component's internals; for example, a Bank's guarantee to a Merchant that it will later pay an authorized charge is supported by the Bank's protocols and by guarantees that arise out of using database transactions. The transactions' guarantees are themselves supported by the protocols and guarantees of the underlying recovery infrastructure. Because our framework aptly describes the different levels with the same ingredients, it not only allows hiding of the details (as outlined above) but also shows how the higher-level properties are obtained by composition from lower-level ones, and conversely shows the burden on the lower-level recovery imposed by the properties of the larger system. Moreover, we benefit from comparing patterns across levels, from low-level recovery to database transactions to business transactions, and applying insights gained in one level to other levels.
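The layering of guarantees described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not our formal model: the table `SUPPORTS`, the function `base_assumptions`, and every guarantee name other than the Bank's promise to pay an authorized charge are hypothetical. Each guarantee lists the lower-level guarantees it relies on, and a guarantee with no dependencies is taken as an assumption about the infrastructure.

```python
from typing import Dict, List, Set

# Hypothetical table: guarantee -> lower-level guarantees that support it.
# An empty list marks an assumption about the underlying infrastructure.
SUPPORTS: Dict[str, List[str]] = {
    "goods_money_atomicity": [
        "bank_pays_authorized_charge",
        "merchant_ships_paid_order",  # hypothetical Merchant guarantee
    ],
    "bank_pays_authorized_charge": ["db_transaction_atomicity"],
    "merchant_ships_paid_order": ["db_transaction_atomicity"],
    "db_transaction_atomicity": ["log_survives_crash"],
    "log_survives_crash": [],  # hypothetical infrastructure assumption
}


def base_assumptions(goal: str, supports: Dict[str, List[str]]) -> Set[str]:
    """Return the infrastructure-level assumptions that `goal` rests on."""
    leaves: Set[str] = set()
    visited: Set[str] = set()
    stack = [goal]
    while stack:
        guarantee = stack.pop()
        if guarantee in visited:
            continue  # already examined; also guards against cycles
        visited.add(guarantee)
        if guarantee not in supports:
            raise KeyError(f"no component offers guarantee {guarantee!r}")
        dependencies = supports[guarantee]
        if not dependencies:
            leaves.add(guarantee)
        stack.extend(dependencies)
    return leaves


# The top level sees only the guarantees of the parties, not their internals.
top_level_view = SUPPORTS["goods_money_atomicity"]
# Composition exposes what the lowest level must provide.
burden_on_infrastructure = base_assumptions("goods_money_atomicity", SUPPORTS)
```

Reading the table top-down shows the hiding of details at each level; following it to the leaves shows the burden that the system-wide property places on low-level recovery.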
In addition to the above, we continued work on our new commit protocol, which is more efficient than (and yet complements) the standard two-phase commit protocol. This has obvious implications for recovery. We have also studied the concurrency control and correctness implications of transactions accessing broadcast data as well as data in multidatabase and real-time applications.
Our efforts have begun to provide a better understanding of (a) the ingredients of recovery, (b) high-level recovery requirements, (c) ways to achieve them, and (d) the tradeoffs in the choice of recovery protocols. With a good model of recovery, we are trying to better assess the efficacy of recovery methods. This should move recovery from a difficult art to a better-understood science, where crafting recovery is still a complex but fairly well understood and predictable activity. One indication of success will be the applicability of the building blocks identified so far to different transaction processing platforms on the one hand and to different applications on the other. This attempt to demonstrate applicability will also help us fine-tune the protocols, policies and mechanisms.
Our framework, which incorporates the novel formalization of the intuitive concepts of recovery guarantees, protocols, and forced progress (liveness), is a good answer to the need (which we identified in our proposal for this project) for an abstraction to better model, formalize, and reason about recovery. Although still in progress, our framework bridges the gap between low-level implementation details (concerning, e.g., creation and movement of log records) and the high-level requirements (e.g., failure atomicity, e-commerce goods-money atomicity) of recovery-based systems.
Four Ph.D. students, Lory Molesky, Mohan Kamath, Ming Xiong, and Cris Pedregal-Martin, have been involved in the current work or in work leading to it. Part of their research assistantship support came from NSF. Lory Molesky (currently at Oracle Corp.) defended his Ph.D. thesis in Summer '96; Mohan Kamath defended his Ph.D. thesis in Spring '98 (and joined Oracle Corp.); Ming Xiong defended his Ph.D. thesis in Spring '98 (currently at Lucent Bell Labs); Cris Pedregal-Martin expects to finish his dissertation this Summer.
Based on the perspective gained from our work on advanced (semantics-based) transaction processing for a number of years, we have developed a comprehensive text on the topic that was published by IEEE Press. Feedback from various institutions indicates that this book is a very useful tool for both graduate students and researchers.
The ACTA transaction framework and its linguistic counterpart ASSET (developed in collaboration with Bell Laboratories) were products of NSF support for our work. They provide building blocks and language primitives to construct advanced transaction models and have been serving as yardsticks for the providers of transaction modeling support. A substantial part of our work in the context of this project is a framework (outlined above) which extends the ACTA and ASSET frameworks into the realm of recovery and the recovery support for advanced transaction semantics in diverse contexts.