Accelerate Literature Icon
Want to do a literature review? Try our new Literature Review workflow

Formalizing and Verifying Transactional Memories

  • Abstract
  • Literature Map
  • Similar Papers
Abstract
Translate article icon Translate Article Star icon

Transactional memory (TM) has shown potential to simplify the task of writing concurrent programs. TM shifts the burden of managing concurrency from the programmer to the TM algorithm. The correctness of TM algorithms is generally proved manually. The goal of this thesis is to provide the mathematical and software tools to automatically verify TM algorithms under realistic memory models. Our first contribution is to develop a mathematical framework to capture the behavior of TM algorithms and the required correctness properties. We consider the safety property of opacity and the liveness properties of obstruction freedom and livelock freedom. We build a specification language of opacity. We build a framework to express hardware relaxed memory models. We develop a new high-level language, Relaxed Memory Language (RML), for expressing concurrent algorithms with a hardware-level atomicity of instructions, whose semantics is parametrized by various relaxed memory models. We express TM algorithms like TL2, DSTM, and McRT STM in our framework. The verification of TM algorithms is difficult because of the unbounded number, length, and delay of concurrent transactions and the unbounded size of the memory. The second contribution of the thesis is to identify structural properties of TM algorithms which allow us to reduce the unbounded verification problem to a language-inclusion check between two finite state systems. We show that common TM algorithms satisfy these structural properties. The third contribution of the thesis is our tool FOIL for model checking TM algorithms. FOIL takes as input the RML description of a TM algorithm and the description of a memory model. FOIL uses the operational semantics of RML to compute the language of the TM algorithm for two threads and two variables. FOIL then checks whether the language of the TM algorithm is included in the specification language of opacity. FOIL automatically determines the locations of fences, which if inserted, ensure the correctness of the TM algorithm under the given memory model. We use FOIL to verify DSTM, TL2, and McRT STM under the memory models of sequential consistency, total store order, partial store order, and relaxed memory order.

Similar Papers
  • Book Chapter
  • Cite Count Icon 31
  • 10.1007/978-3-642-02658-4_26
Software Transactional Memory on Relaxed Memory Models
  • Jan 1, 2009
  • Rachid Guerraoui + 2 more

Pseudo-code descriptions of STMs assume sequentially consistent program execution and atomicity of high-level STM operations like read, write, and commit. These assumptions are often violated in realistic settings, as STM implementations run on relaxed memory models, with the atomicity of operations as provided by the hardware. This paper presents the first approach to verify STMs under relaxed memory models with atomicity of 32 bit loads and stores, and read-modify-write operations. We present RML, a new high-level language for expressing concurrent algorithms with a hardware-level atomicity of instructions, and whose semantics is parametrized by various relaxed memory models. We then present our tool, FOIL, which takes as input the RML description of an STM algorithm and the description of a memory model, and automatically determines the locations of fences, which if inserted, ensure the correctness of the STM algorithm under the given memory model. We use FOIL to verify DSTM, TL2, and McRT STM under the memory models of sequential consistency, total store order, partial store order, and relaxed memory order.

  • Research Article
  • 10.1007/s10703-011-0131-3
Verification of STM on relaxed memory models
  • Nov 23, 2011
  • Formal Methods in System Design
  • Rachid Guerraoui + 2 more

Software transactional memories (STM) are described in the literature with assumptions of sequentially consistent program execution and atomicity of high level operations like read, write, and abort. However, in a realistic setting, processors use relaxed memory models to optimize hardware performance. Moreover, the atomicity of operations depends on the underlying hardware. This paper presents the first approach to verify STMs under relaxed memory models with atomicity of 32 bit loads and stores, and read-modify-write operations. We describe RML, a simple language for expressing concurrent programs. We develop a semantics of RML parametrized by a relaxed memory model. We then present our tool, FOIL, which takes as input the RML description of an STM algorithm restricted to two threads and two variables, and the description of a memory model, and automatically determines the locations of fences, which if inserted, ensure the correctness of the restricted STM algorithm under the given memory model. We use FOIL to verify DSTM, TL2, and McRT STM under the memory models of sequential consistency, total store order, partial store order, and relaxed memory order for two threads and two variables. Finally, we extend the verification results for DSTM and TL2 to an arbitrary number of threads and variables by manually proving that the structural properties of STMs are satisfied at the hardware level of atomicity under the considered relaxed memory models.

  • Book Chapter
  • Cite Count Icon 3
  • 10.1007/978-3-662-53426-7_20
Opacity vs TMS2: Expectations and Reality
  • Jan 1, 2016
  • Sandeep Hans + 4 more

Most of the popular Transactional Memory (TM) algorithms are known to be safe because they satisfy opacity, the well-known correctness criterion for TM algorithms. Recently, it has been shown that they are even more conservative, and that they satisfy TMS2, a strictly stronger property than opacity. This paper investigates the theoretical and practical implications of relaxing those algorithms in order to allow histories that are not TMS2. In particular, we present four impossibility results on TM implementations that are not TMS2 and are either opaque or strictly serializable, and one practical TM implementation that extends TL2, a high-performance state-of-the-art TM algorithm, to allow non-TMS2 histories. By matching our theoretical findings with the results of our performance evaluation, we conclude that designing and implementing TM algorithms that are not TMS2, but safe, has inherent costs that limit any possible performance gain.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 2
  • 10.1007/s10703-024-00462-1
A verified durable transactional mutex lock for persistent x86-TSO
  • Jul 31, 2024
  • Formal Methods in System Design
  • Eleni Vafeiadi Bila + 1 more

The advent of non-volatile memory technologies has spurred intensive research interest in correctness and programmability. This paper addresses both by developing and verifying a durable (aka persistent) transactional memory (TM) algorithm, dTMLPx86. Correctness of dTMLPx86 is judged in terms of durable opacity, which ensures both failure atomicity (ensuring memory consistency after a crash) and opacity (ensuring thread safety). We assume a realistic execution model, Px86, which represents Intel’s persistent memory model and extends the Total Store Order memory model with instructions that control persistency. Our TM algorithm, dTMLPx86, is an adaptation of an existing software transactional mutex lock, but with additional synchronisation mechanisms to cope with Px86. Our correctness proof is operational and comprises two distinct types of proofs: (1) proofs of invariants of dTMLPx86 and (2) a proof of refinement against an operational specification that guarantees durable opacity. To achieve (1), we build on recent Owicki–Gries logics for Px86, and for (2) we use a simulation-based proof technique, which, as far as we are aware, is the first application of simulation-based proofs for Px86 programs. Our entire development has been mechanised in the Isabelle/HOL proof assistant.

  • Research Article
  • Cite Count Icon 12
  • 10.1145/3022671.2984025
Maximal causality reduction for TSO and PSO
  • Oct 19, 2016
  • ACM SIGPLAN Notices
  • Shiyou Huang + 1 more

Verifying concurrent programs is challenging due to the exponentially large thread interleaving space. The problem is exacerbated by relaxed memory models such as Total Store Order (TSO) and Partial Store Order (PSO) which further explode the interleaving space by reordering instructions. A recent advance, Maximal Causality Reduction (MCR), has shown great promise to improve verification effectiveness by maximally reducing redundant explorations. However, the original MCR only works for the Sequential Consistency (SC) memory model, but not for TSO and PSO. In this paper, we develop novel extensions to MCR by solving two key problems under TSO and PSO: 1) generating interleavings that can reach new states by encoding the operational semantics of TSO and PSO with first-order logical constraints and solving them with SMT solvers, and 2) enforcing TSO and PSO interleavings by developing novel replay algorithms that allow executions out of the program order. We show that our approach successfully enables MCR to effectively explore TSO and PSO interleavings. We have compared our approach with a recent Dynamic Partial Order Reduction (DPOR) algorithm for TSO and PSO and a SAT-based stateless model checking approach. Our results show that our approach is much more effective than the other approaches for both state-space exploration and bug finding – on average it explores 5-10X fewer executions and finds many bugs that the other tools cannot find.

  • Conference Article
  • Cite Count Icon 48
  • 10.1145/2983990.2984025
Maximal causality reduction for TSO and PSO
  • Oct 19, 2016
  • Shiyou Huang + 1 more

Verifying concurrent programs is challenging due to the exponentially large thread interleaving space. The problem is exacerbated by relaxed memory models such as Total Store Order (TSO) and Partial Store Order (PSO) which further explode the interleaving space by reordering instructions. A recent advance, Maximal Causality Reduction (MCR), has shown great promise to improve verification effectiveness by maximally reducing redundant explorations. However, the original MCR only works for the Sequential Consistency (SC) memory model, but not for TSO and PSO. In this paper, we develop novel extensions to MCR by solving two key problems under TSO and PSO: 1) generating interleavings that can reach new states by encoding the operational semantics of TSO and PSO with first-order logical constraints and solving them with SMT solvers, and 2) enforcing TSO and PSO interleavings by developing novel replay algorithms that allow executions out of the program order. We show that our approach successfully enables MCR to effectively explore TSO and PSO interleavings. We have compared our approach with a recent Dynamic Partial Order Reduction (DPOR) algorithm for TSO and PSO and a SAT-based stateless model checking approach. Our results show that our approach is much more effective than the other approaches for both state-space exploration and bug finding – on average it explores 5-10X fewer executions and finds many bugs that the other tools cannot find.

  • Book Chapter
  • Cite Count Icon 82
  • 10.1007/978-3-642-19835-9_3
Sound and Complete Monitoring of Sequential Consistency for Relaxed Memory Models
  • Jan 1, 2011
  • Jabob Burnim + 2 more

We present a technique for verifying that a program has no executions violating sequential consistency (SC) when run under the relaxed memory models Total Store Order (TSO) and Partial Store Order (PSO). The technique works by monitoring sequentially consistent executions of a program to detect if similar program executions could fail to be sequentially consistent under TSO or PSO.We propose novel monitoring algorithms that are sound and complete for TSO and PSO--if a program can exhibit an SC violation under TSO or PSO, then the corresponding monitor can detect this on some SC execution. The monitoring algorithms arise naturally from the operational definitions of these relaxed memory models, highlighting an advantage of viewing relaxed memory models operationally rather than axiomatically. We apply our technique to several concurrent data structures and synchronization primitives, detecting a number of violations of sequential consistency.

  • Research Article
  • Cite Count Icon 14
  • 10.1145/1594835.1504182
An efficient transactional memory algorithm for computing minimum spanning forest of sparse graphs
  • Feb 14, 2009
  • ACM SIGPLAN Notices
  • Seunghwa Kang + 1 more

Due to power wall, memory wall, and ILP wall, we are facing the end of ever increasing single-threaded performance. For this reason, multicore and manycore processors are arising as a new paradigm to pursue. However, to fully exploit all the cores in a chip, parallel programming is often required, and the complexity of parallel programming raises a significant concern. Data synchronization is a major source of this programming complexity, and Transactional Memory is proposed to reduce the difficulty caused by data synchronization requirements, while providing high scalability and low performance overhead. The previous literature on Transactional Memory mostly focuses on architectural designs. Its impact on algorithms and applications has not yet been studied thoroughly. In this paper, we investigate Transactional Memory from the algorithm designer's perspective. This paper presents an algorithmic model to assist in the design of efficient Transactional Memory algorithms and a novel Transactional Memory algorithm for computing a minimum spanning forest of sparse graphs. We emphasize multiple Transactional Memory related design issues in presenting our algorithm. We also provide experimental results on an existing software Transactional Memory system. Our algorithm demonstrates excellent scalability in the experiments, but at the same time, the experimental results reveal the clear limitation of software Transactional Memory due to its high performance overhead. Based on our experience, we highlight the necessity of efficient hardware support for Transactional Memory to realize the potential of the technology.

  • Conference Article
  • Cite Count Icon 33
  • 10.1145/1504176.1504182
An efficient transactional memory algorithm for computing minimum spanning forest of sparse graphs
  • Feb 14, 2009
  • Seunghwa Kang + 1 more

Due to power wall, memory wall, and ILP wall, we are facing the end of ever increasing single-threaded performance. For this reason, multicore and manycore processors are arising as a new paradigm to pursue. However, to fully exploit all the cores in a chip, parallel programming is often required, and the complexity of parallel programming raises a significant concern. Data synchronization is a major source of this programming complexity, and Transactional Memory is proposed to reduce the difficulty caused by data synchronization requirements, while providing high scalability and low performance overhead.The previous literature on Transactional Memory mostly focuses on architectural designs. Its impact on algorithms and applications has not yet been studied thoroughly. In this paper, we investigate Transactional Memory from the algorithm designer's perspective. This paper presents an algorithmic model to assist in the design of efficient Transactional Memory algorithms and a novel Transactional Memory algorithm for computing a minimum spanning forest of sparse graphs. We emphasize multiple Transactional Memory related design issues in presenting our algorithm. We also provide experimental results on an existing software Transactional Memory system. Our algorithm demonstrates excellent scalability in the experiments, but at the same time, the experimental results reveal the clear limitation of software Transactional Memory due to its high performance overhead. Based on our experience, we highlight the necessity of efficient hardware support for Transactional Memory to realize the potential of the technology.

  • Research Article
  • Cite Count Icon 83
  • 10.1016/0743-7315(92)90052-o
Programming for different memory consistency models
  • Aug 1, 1992
  • Journal of Parallel and Distributed Computing
  • Kourosh Gharachorloo + 4 more

Programming for different memory consistency models

  • Conference Article
  • Cite Count Icon 8
  • 10.1109/cgo.2013.6495000
On the platform specificity of STM instrumentation mechanisms
  • Feb 1, 2013
  • Wenjia Ruan + 3 more

Supporting atomic blocks (e.g., Transactional Memory (TM)) can have far-reaching effects on language design and implementation. While much is known about the language-level semantics of TM and the performance of algorithms for implementing TM, little is known about how platform characteristics affect the manner in which a compiler should instrument code to achieve efficient transactional behavior. We explore the interaction between compiler instrumentation and the performance of transactions. Through evaluation on ARM/Android, SPARC/Solaris, IA32/Linux and IA32/MacOS, we show that the compiler must consider the platform when determining which analyses, transformations, and optimizations to perform. Implementation issues include how TM library code is reached, how per-thread TM metadata is stored and accessed, and how a library switches between modes of operation. We also show that different platforms favor different TM algorithms, through the introduction of a new TM algorithm for the ARM processor. Our findings will affect compiler and TM library designers: to achieve peak performance for transactions, the compiler must perform platform-dependent analysis, transformation, and optimization, and the interface to the TM library must differ according to platform.

  • Conference Article
  • Cite Count Icon 8
  • 10.1109/dsd.2012.27
Architecture Support and Comparison of Three Memory Consistency Models in NoC Based Systems
  • Jan 1, 2012
  • Abdul Naeem + 2 more

We propose a novel hardware support for three relaxed memory models, Release Consistency (RC), Partial Store Ordering (PSO) and Total Store Ordering (TSO) in Network-on-Chip (NoC) based distributed shared memory multicore systems. The RC model is realized by using a Transaction Counter and an Address Stack based approach to enforce the required global orders on the shared memory operations. The PSO and TSO models are realized by using a Write Transaction Counter and a Write Address Stack based approach to enforce the required global orders on the shared memory operations. In the experiments, we use a configurable platform based on a 2D mesh NoC using deflection routing policy. The results show that under synthetic workloads, the average execution time for the RC, PSO and TSO models in 8x8 network (64 cores) is reduced by 35.8%, 22.7% and 16.5% over the sequential consistency (SC) model, respectively. The average speedup for the RC, PSO and TSO models in 8x8 network under different application workloads is increased by 34.3%, 10.6% and 8.9% over the SC model, respectively. The area cost for the TSO, PSO and RC models is increased by less than 2% over the SC model at the interface to the processor.

  • Book Chapter
  • Cite Count Icon 41
  • 10.1007/978-3-642-36742-7_24
A Verification-Based Approach to Memory Fence Insertion in PSO Memory Systems
  • Jan 1, 2013
  • Alexander Linden + 1 more

This paper addresses the problem of verifying and correcting programs when they are moved from a sequential consistency execution environment to a relaxed memory context. Specifically, it considers the PSO (Partial Store Order) memory model, which corresponds to the use of a store buffer for each shared variable and each process. We also will consider, as an intermediate step, the TSO (Total Store Order) memory model, which corresponds to the use of one store buffer per process. The proposed approach extends a previously developed verification tool that uses finite automata to symbolically represent the possible contents of the store buffers. Its starting point is a program that is correct for the usual Sequential Consistency (SC) memory model, but that might be incorrect under PSO with respect to safety properties. This program is then first analyzed and corrected for the TSO memory model, and then this TSO-safe program is analyzed and corrected under PSO, producing a PSO-safe program. To obtain a TSO-safe program, only store-load fences (TSO only allows store-load relaxations) are introduced into the program. Finaly, to produce a PSO-safe program, only store-store fences (PSO additionally allows store-store relaxations) are introduced. An advantage of our technique is that the underlying symbolic verification tool makes a full exploration of program behaviors possible even for cyclic programs, which makes our approach broadly applicable. The method has been tested with an experimental implementation and can effectively handle a series of classical examples.

  • Research Article
  • Cite Count Icon 40
  • 10.1145/2086696.2086733
A transactional memory with automatic performance tuning
  • Jan 1, 2012
  • ACM Transactions on Architecture and Code Optimization
  • Qingping Wang + 3 more

A significant obstacle to the acceptance of transactional memory (TM) in real-world parallel programs is the abundance of substantially different TM algorithms. Each TM algorithm appears well-suited to certain workload characteristics, but the best choice of algorithm is sensitive to program inputs, available cores, and program phases. Furthermore, operating system and hardware characteristics can affect which algorithm is best, with tradeoffs changing across iterations of a single ISA. This paper introduces methods for constructing policies to dynamically select the most appropriate TM algorithm based on static and dynamic information. We leverage intraprocedural static analysis to create a static profile of the application. We also introduce a low-overhead framework for dynamic profiling of a running transactional application. Armed with these complementary descriptions of a program's behavior, we present novel expert adaptivity policies as well as machine learning policies that are trained off-line using simple microbenchmarks. In our evaluation, we find that both the expert and learned policies provide better performance than any single TM algorithm across the entire STAMP benchmark suite. In addition, policies that combine expert and learned policies offer the best combination of performance, maintainability, and flexibility.

  • Conference Article
  • Cite Count Icon 6
  • 10.1145/2889160.2891042
Maximally stateless model checking for concurrent bugs under relaxed memory models
  • May 14, 2016
  • Alan Huang

Shared-memory multiprocessor architectures are now ubiq-uitous. To achieve higher performance, the constraints on the memory models become weaker. This makes it more challenging to verify concurrent programs. It is known that sequential consistency (SC) [7] is the most intuitive memory model. However, even for SC, it is challenging enough to verify the correctness of concurrent programs, because the number of the interleavings grows exponentially with the number of threads and the size of the program. For relaxed memory models, the verification problem is more tough because operations from the same thread can be re-ordered and there is even no globally consistent order among the operations of different threads. For example, for the Total Store Order (TSO) [9] and Partial Store Order (PSO) memory models, the order between a write and a following read or a write to different memory locations [2] can be re-ordered non-deterministically in the store buffer.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant