Runtime checking of serializability in software transactional memory

Arnab Sinha, Sharad Malik

doi:10.1109/ipdps.2010.5470389

Abstract

Ensuring the correctness of complex implementations of software transactional memory (STM) is a daunting task. Attempts have been made to formally verify STMs, but these are limited in the scale of systems they can handle and generally verify only a model of the system, not the actual system. In this paper we present an alternative approach to checking the correctness of an STM implementation: verifying the execution runs of an STM with a checker that runs in parallel with the transactional memory system. With future many-core systems predicted to have hundreds and even thousands of cores, it is reasonable to use some of these cores to ensure the correctness of the rest of the system. This will be needed anyway, given the increasing likelihood of dynamic errors due to particle hits (soft errors) and the increasing fragility of nanoscale devices; such errors can only be detected at runtime. An important correctness criterion that is the subject of verification is the serializability of transactions. While checking transaction serializability is NP-complete, practically useful subclasses such as interchange-serializability (DSR) are efficiently computable. Checking DSR reduces to checking for cycles in a transaction ordering graph, which captures the order in which transaction instances access shared objects. Doing this concurrently with the main transaction execution requires minimizing the overhead of capturing object accesses and managing the size of the graph, which can be as large as the total number of dynamic transactions and object accesses. We discuss techniques for minimizing the overhead of access logging, including time-stamping, and present techniques for on-the-fly graph compaction that drastically reduce the size of the graph that needs to be maintained, to no larger than the number of threads. We have implemented concurrent serializability checking in the Rochester Software Transactional Memory (RSTM) system.
We present our practical experiences with this implementation, including results for the RSTM, STAMP, and synthetic benchmarks. The overhead of concurrent checking is a strong function of transaction length. For long transactions it is negligible, so the proposed method is acceptable for continuous runtime checking. For very short transactions the overhead can be significant; in that case we see the method as applicable to debugging.
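To make the cycle check concrete, here is a minimal offline sketch of a transaction ordering graph built from a timestamped access log. It assumes the standard read/write conflict rule (two accesses to the same object by different transactions conflict if at least one is a write) and orders conflicting accesses by timestamp; whether this matches the paper's exact DSR definition is an assumption. The record type `Access` and the functions `build_ordering_graph` and `has_cycle` are hypothetical names for illustration, only committed transactions are assumed to be logged, and the sketch keeps the whole graph in memory, i.e. it does not implement the on-the-fly compaction described above.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One logged access by a committed transaction (hypothetical record format)."""
    timestamp: int  # global access order, e.g. taken from a shared counter
    txn: str        # dynamic transaction instance id
    obj: str        # shared object id
    is_write: bool


def build_ordering_graph(log: list[Access]) -> dict[str, set[str]]:
    """Edge Ti -> Tj if Ti accessed an object before Tj and at least one access is a write."""
    graph: dict[str, set[str]] = {}
    history: dict[str, list[Access]] = defaultdict(list)  # per-object access history
    for a in sorted(log, key=lambda acc: acc.timestamp):
        graph.setdefault(a.txn, set())  # every transaction becomes a node
        for prev in history[a.obj]:
            # Conflicting earlier access by another transaction orders prev.txn first.
            if prev.txn != a.txn and (prev.is_write or a.is_write):
                graph[prev.txn].add(a.txn)
        history[a.obj].append(a)
    return graph


def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Depth-first search; reaching a node still on the DFS stack (GRAY) means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for succ in graph[node]:
            if color[succ] == GRAY:
                return True
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)
```

For instance, if T1 reads x, T2 then writes x and y, and T1 finally reads y, the graph gets the edges T1 -> T2 (on x) and T2 -> T1 (on y), and `has_cycle` reports that the run is not serializable under this rule. The per-object scan is quadratic in the number of accesses to each object, which is one reason a runtime checker has to bound the logged state.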
