Reliable distributed database systems (abstract only)

Sang Hyuk Son

doi:10.1145/322917.323032

Abstract

We are investigating the problem of ensuring global consistency in distributed database systems. Our current research concentrates on the theoretical study of reliability mechanisms, including algorithm design and performance characterization. In addition, we are building a testbed for evaluating different reliability mechanisms through detailed simulation and actual experimentation.

Replication is the key factor in improving the availability of distributed database systems. A major restriction on replication is that the replicated copies must behave like a single copy. We have developed token-based algorithms for replication control [2, 5]. The next step in this direction is to evaluate different partial operation policies, which are critical both for maintaining correctness and for achieving high availability in distributed database systems. Two alternatives are possible when a network partition occurs: pessimistic and optimistic. Neither alternative is superior to the other. The higher availability achieved by an optimistic approach may be paid for during recovery from the partition, when committed transactions that violate consistency constraints must be backed out.

Even if the replication and concurrency control mechanisms are correct and maintain the consistency of the database, hardware or software failures at processing sites or in the communication network may destroy that consistency. To cope with such failures, distributed database systems must provide recovery mechanisms. The goal of checkpointing is to save database states on a separate, secure device so that the database can be recovered when errors and failures occur. A checkpointing mechanism that does not interfere with transaction processing in a distributed environment is highly desirable for many applications in which restricting transaction activity during checkpointing is not feasible.

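What makes checkpointing hard in a distributed setting is global consistency: the local checkpoints taken at different sites must together describe a state the database could actually have been in. The Python sketch below checks one common formulation of this, transaction consistency, in which every distributed transaction is reflected either in the local checkpoints of all the sites it updated or in none of them. It only inspects a finished global checkpoint after the fact and does not show how the author's algorithms produce one. The data model (a set of transaction IDs per site checkpoint, a set of participant sites per transaction) and the function names are hypothetical.

```python
from typing import Dict, List, Set


def inconsistent_transactions(
    participants: Dict[str, Set[str]],
    checkpoints: Dict[str, Set[str]],
) -> List[str]:
    """Return transactions whose effects are saved at some, but not all, of their sites.

    participants: hypothetical map from transaction ID to the sites it updated.
    checkpoints:  hypothetical map from site to the transaction IDs whose
                  updates are reflected in that site's local checkpoint.
    """
    partial = []
    for txn, sites in participants.items():
        # Sites whose local checkpoint already contains this transaction's updates.
        reflected = {s for s in sites if txn in checkpoints.get(s, set())}
        if reflected and reflected != sites:
            partial.append(txn)  # saved at some sites only: not transaction-consistent
    return sorted(partial)


def is_transaction_consistent(
    participants: Dict[str, Set[str]],
    checkpoints: Dict[str, Set[str]],
) -> bool:
    """A global checkpoint is transaction-consistent if no transaction is partial."""
    return not inconsistent_transactions(participants, checkpoints)


if __name__ == "__main__":
    participants = {"T1": {"A", "B"}, "T2": {"B", "C"}}
    # T1 is saved at both of its sites and T2 at neither: consistent.
    print(is_transaction_consistent(participants, {"A": {"T1"}, "B": {"T1"}, "C": set()}))  # prints True
    # T2 is saved at B but not at C: inconsistent.
    print(inconsistent_transactions(participants, {"A": {"T1"}, "B": {"T1", "T2"}, "C": set()}))  # prints ['T2']
```

A checkpoint that fails this check cannot be restored as is: recovery would resurrect half of a distributed transaction. Producing a checkpoint that passes it without pausing transaction processing is the difficult part.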
Our earlier research has resulted in the development of a non-intrusive checkpointing algorithm, along with associated recovery mechanisms [1, 3]. The desirable properties of non-interference and global consistency not only make checkpointing and recovery more complicated in distributed database systems but also increase the system's workload. Currently, we are investigating the practicality of non-interfering checkpointing and fully decentralized checkpointing in distributed database systems [4, 6].
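To make the optimistic partition trade-off concrete, the Python sketch below models a single replicated value, an account balance with the hypothetical consistency constraint balance >= 0. While the network is split, each partition commits transactions against its own copy. On merge, the committed transactions are replayed in (hypothetical) timestamp order from the pre-partition state, and any transaction whose effect would now violate the constraint is backed out. The names `Txn` and `merge_partitions` and the replay-in-timestamp-order rule are illustrative assumptions, not the author's partition policy. The sketch also ignores cascading back-outs of transactions that read a backed-out transaction's results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Txn:
    txn_id: str
    timestamp: int  # hypothetical commit timestamp used to order the merge
    delta: int      # change this transaction applied to the replicated balance


def merge_partitions(
    initial_balance: int, *partitions: List[Txn]
) -> Tuple[int, List[str], List[str]]:
    """Replay transactions committed independently in each partition.

    Transactions are applied in (timestamp, txn_id) order starting from the
    pre-partition balance; any transaction that would drive the balance below
    zero is backed out even though its own partition already committed it.
    Returns (final balance, kept transaction IDs, backed-out transaction IDs).
    """
    balance = initial_balance
    kept: List[str] = []
    backed_out: List[str] = []
    merged = sorted((t for p in partitions for t in p), key=lambda t: (t.timestamp, t.txn_id))
    for txn in merged:
        if balance + txn.delta >= 0:  # hypothetical constraint: balance >= 0
            balance += txn.delta
            kept.append(txn.txn_id)
        else:
            backed_out.append(txn.txn_id)  # committed locally, undone globally
    return balance, kept, backed_out


if __name__ == "__main__":
    left = [Txn("T1", 1, -70)]                     # valid locally: 100 -> 30
    right = [Txn("T2", 2, -50), Txn("T3", 3, 20)]  # valid locally: 100 -> 50 -> 70
    print(merge_partitions(100, left, right))      # prints (50, ['T1', 'T3'], ['T2'])
```

Here each partition's transactions are valid against its own copy, yet after the merge the withdrawal T2 must be backed out. A pessimistic policy, such as allowing only one designated partition to commit updates, would avoid this back-out at the cost of blocking updates in the other partition for the duration of the failure.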
