Significant performance advantages can be gained by implementing a database system on a cache-coherent shared memory multiprocessor. However, problems arise when failures occur. A single node (where a node refers to a processor/memory pair) crash may require a reboot of the entire shared memory system. Fortunately, shared memory multiprocessors that isolate individual node failures are currently being developed. Even with these, because of the side effects of the cache coherency protocol, a transaction executing strictly on a single node may become dependent on the validity of the memory of many nodes thereby inducing unnecessary transaction aborts. This happens when database objects, such as records, and database support structures, such as index structures and shared lock tables, are stored in shared memory.
In this work, we propose crash recovery protocols for shared memory database systems which avoid the unnecessary transaction aborts induced by the effects of using shared physical memory. Our recovery protocols guarantee that if one or more nodes crash, all the effects of active transactions running on the crashed nodes will be undone, and no effects of transactions running on nodes which did not crash will be undone. In order to show the practicality of our protocols, we discuss how existing features of cache-coherent multiprocessors can be utilized to implement these recovery protocols. Specifically, we demonstrate that (1) for many types of database objects and support structures, volatile (in-memory) logging is sufficient to avoid unnecessary transaction aborts, and (2) a very low overhead implementation of this strategy can be achieved with existing multiprocessor features.
Back to the Database Systems Home Page