Tolerating Concurrency Bugs Using Transactions as Lifeguards

Jie Yu,Satish Narayanasamy

doi:10.1109/micro.2010.56

Abstract

Parallel programming is hard, because it is impractical to test all possible thread interleavings. One promising approach to improve a multi-threaded program’s reliability is to constrain a production run’s thread interleavings in such a way that untested interleavings are avoided as much as possible. Such an approach would avoid hard-to-test rare thread interleavings in production runs, and thereby improve correctness. However, a key challenge in realizing this goal is in determining thread interleaving constraints from the tested correct interleavings, and enforcing them efficiently in production runs. In this paper, we propose a new method to determine thread interleaving constraints from the tested interleavings in the form of lifeguard transactions (LifeTxes). An untested code region initially is contained in a single LifeTx. As the code region is tested over more thread interleavings, its original LifeTx is automatically split into multiple smaller LifeTxes so that the newly tested interleavings are permitted in production runs. To efficiently enforce LifeTx constraints in production runs, we propose a hardware design similar to the eager conflict detection capability that exist in a conventional hardware transactional memory (TM) systems, but without the need for versioning, rollback and unbounded TM support.We show that 11 out of 14 real concurrency bugs in programs like Apache, MySQL and Mozilla could be avoided using the proposed approach for a negligible performance overhead.

Full Text