Formal Verification of Lock-Free Algorithms

  • TL;DR
  • Abstract
  • Literature Map
  • Similar Papers
TL;DR

This paper addresses the complexity of verifying lock-free algorithms that rely on atomic compare-and-set instructions by demonstrating their correctness through linearizability and lock-freeness criteria. It proposes a modular verification approach using rely-guarantee reasoning and temporal logic, verified with the theorem prover KIV, and discusses related methods for automating proofs.

Abstract
Translate article icon Translate Article Star icon

The current trend towards multi-core processors has renewed the interest in the development and correctness of concurrent algorithms. Most of these algorithms rely on locks to protect critical sections from unwanted interference. Recently a new class of nonblocking algorithms has been developed which do not rely on critical sections, but on atomic compare-and-set instructions. Such lock-free algorithms are less vulnerable to the typical problems of concurrent algorithms: deadlocks, livelocks and priority inversion. On the other hand, the lack of a uniform principle to rule out interference results in increased complexity. This makes it harder to understand these algorithms and to verify their correctness. The paper gives a simple example to demonstrate the central correctness criteria of linearizability (a safety property) and lock-freeness (a liveness property) for lock-free algorithms. It then sketches our approach to the modular verification of lock-free algorithms which uses rely-guarantee reasoning and a powerful temporal logic to derive refinement proof obligations that can be verified with the interactive theorem prover KIV. Finally an overview over related work and techniques that are relevant to automate proofs is given.

Similar Papers
  • Supplementary Content
  • Cite Count Icon 3
  • 10.26174/thesis.lboro.8138030.v1
Specification and verification of network algorithms using temporal logic
  • May 28, 2019
  • Ra'Ed Bani-Abdelrahman

Specification and verification of network algorithms using temporal logic

  • Book Chapter
  • Cite Count Icon 13
  • 10.1007/978-3-540-30474-6_54
Lock-Free Parallel Algorithms: An Experimental Study
  • Jan 1, 2004
  • Guojing Cong + 1 more

Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include increasing fault tolerance of a (possibly heterogeneous) system and alleviating the problems associated with critical sections such as priority inversion and deadlock. For parallel computers with tightly-coupled processors and shared memory, these issues are no longer major concerns. While many of the results are applicable especially when the model used is shared memory multiprocessors, no prior studies have considered improving the performance of a parallel implementation by way of lock-free programming. As a matter of fact, often times in practice lock-free data structures in a distributed setting do not perform as well as those that use locks. As the data structures and algorithms for parallel computing are often drastically different from those in distributed computing, it is possible that lock-free programs perform better. In this paper we compare the similarity and difference of lock-free programming in both distributed and parallel computing environments and explore the possibility of adapting lock-free programming to parallel computing to improve performance. Lock-free programming also provides a new way of simulating PRAM and asynchronous PRAM algorithms on current parallel machines.

  • Dissertation
  • Cite Count Icon 2
  • 10.26686/wgtn.16969549
The Design and Verification of Dynamic-sized Nonblocking Data Structures
  • Jan 1, 2010
  • Simon Doherty

<p>Modern computer systems often involve multiple processes or threads of control that communicate through shared memory. However, the implementation of correct and efficient data structures that can be shared by several processes is frequently challenging. This thesis is concerned with the design and verification of a class of shared memory algorithms known as nonblocking algorithms, which are implementations of shared data structures that provide strong progress guarantees. Nonblocking algorithms offer an appealing alternative to traditional techniques for the implementation of shared memory data structures, but they are difficult to design, and extant algorithms can often be applied in only a limited range of systems. Furthermore, because of their subtlety, it is notoriously difficult to determine whether a given nonblocking algorithm is correct. This thesis addresses these difficulties in two ways. First, we present techniques for the verification of nonblocking algorithms that dynamically allocate memory. These techniques allow the construction of formal and complete proofs of correctness, so that each proof may be checked by a mechanical proof assistant. Applying techniques first developed for the verification of distributed algorithms, we use labelled-transition systems to model algorithms and their specifications, and simulation relations to prove that an implementation meets its specification. Nonblocking algorithms often require a particular notion of simulation, called backward simulation, that is rarely necessary in other contexts. This thesis contributes to the relatively limited collective experience in the use of backward simulation. The second set of contributions addresses the limitations of many extant nonblocking algorithms. While many nonblocking algorithms allocate memory dynamically, it is difficult to determine in a nonblocking context when it is safe to free memory. We present techniques to accomplish this. Furthermore, many nonblocking algorithms depend on the availability of two powerful synchronisation primitives, known as load-linked and store-conditional, which are not normally provided by hardware. We present implementations of these primitives that work on commonly available platforms.</p>

  • Dissertation
  • 10.26686/wgtn.16969549.v1
The Design and Verification of Dynamic-sized Nonblocking Data Structures
  • Jan 1, 2010
  • Simon Doherty

<p>Modern computer systems often involve multiple processes or threads of control that communicate through shared memory. However, the implementation of correct and efficient data structures that can be shared by several processes is frequently challenging. This thesis is concerned with the design and verification of a class of shared memory algorithms known as nonblocking algorithms, which are implementations of shared data structures that provide strong progress guarantees. Nonblocking algorithms offer an appealing alternative to traditional techniques for the implementation of shared memory data structures, but they are difficult to design, and extant algorithms can often be applied in only a limited range of systems. Furthermore, because of their subtlety, it is notoriously difficult to determine whether a given nonblocking algorithm is correct. This thesis addresses these difficulties in two ways. First, we present techniques for the verification of nonblocking algorithms that dynamically allocate memory. These techniques allow the construction of formal and complete proofs of correctness, so that each proof may be checked by a mechanical proof assistant. Applying techniques first developed for the verification of distributed algorithms, we use labelled-transition systems to model algorithms and their specifications, and simulation relations to prove that an implementation meets its specification. Nonblocking algorithms often require a particular notion of simulation, called backward simulation, that is rarely necessary in other contexts. This thesis contributes to the relatively limited collective experience in the use of backward simulation. The second set of contributions addresses the limitations of many extant nonblocking algorithms. While many nonblocking algorithms allocate memory dynamically, it is difficult to determine in a nonblocking context when it is safe to free memory. We present techniques to accomplish this. Furthermore, many nonblocking algorithms depend on the availability of two powerful synchronisation primitives, known as load-linked and store-conditional, which are not normally provided by hardware. We present implementations of these primitives that work on commonly available platforms.</p>

  • Research Article
  • Cite Count Icon 2
  • 10.1109/access.2019.2920781
Accelerating Wait-Free Algorithms: Pragmatic Solutions on Cache-Coherent Multicore Architectures
  • Jan 1, 2019
  • IEEE Access
  • Junchang Wang + 4 more

Parallelizing performance-critical applications are of critical importance for harnessing the power of multicore processors, which are now ubiquitous. Even though wait-free algorithms offer the appeal of completing each operation of a parallelized application in a finite number of steps, high-performance wait-free algorithms at high levels of concurrency are still rare. In this paper, we demonstrate one primary reason for this inefficiency: existing wait-free algorithms are not optimized for processors’ caches and write buffers, two key components in modern hardware to accelerate memory accesses. As an example, a wait-free multi-producer-single-consumer queue algorithm, which faces common performance problems of wait-free algorithms, is studied in this paper. We accelerate the queue algorithm by (1) allowing producers to buffer enqueue requests in their local buffers and to write them into the shared queue in batch, to exhibit the spatial locality of the program, and (2) eliminating expensive atomic operations, by giving up some degree of internal consistency to avoid write buffers being drained frequently. The outcome is a write-buffer and cache-friendly queue algorithm which is wait-free and efficient on off-the-shelf multicore processors. The experiments show that the optimized queue algorithm outperforms prior queue algorithms on three different architectures ( $\times $ 86, Power8, and ARMv8). On $\times $ 86, it outperforms WFQueue, the state-of-the-art solution, by 2–3 $\times$ , and outperforms CCQueue, the representative of combining solution, by 4–12 $\times $ . Applying the techniques presented in this paper to other wait-free algorithms is straightforward; our queue example demonstrates that these techniques can be applied to other wait-free algorithms while maintaining the control flow of the original algorithms without dramatic changes.

  • Dissertation
  • 10.26686/wgtn.17004076
On the Use of Model Checking for the Bounded and Unbounded Verification of Nonblocking Concurrent Data Structures
  • Jan 1, 2013
  • David Friggens

<p>Concurrent data structure algorithms have traditionally been designed using locks to regulate the behaviour of interacting threads, thus restricting access to parts of the shared memory to only one thread at a time. Since locks can lead to issues of performance and scalability, there has been interest in designing so-called nonblocking algorithms that do not use locks. However, designing and reasoning about concurrent systems is difficult, and is even more so for nonblocking systems, as evidenced by the number of incorrect algorithms in the literature. This thesis explores how the technique of model checking can aid the testing and verification of nonblocking data structure algorithms. Model checking is an automated verification method for finite state systems, and is able to produce counterexamples when verification fails. For verification, concurrent data structures are considered to be infinite state systems, as there is no bound on the number of interacting threads, the number of elements in the data structure, nor the number of possible distinct data values. Thus, in order to analyse concurrent data structures with model checking, we must either place finite bounds upon them, or employ an abstraction technique that will construct a finite system with the same properties. First, we discuss how nonblocking data structures can be best represented for model checking, and how to specify the properties we are interested in verifying. These properties are the safety property linearisability, and the progress properties wait-freedom, lock-freedom and obstructionfreedom. Second, we investigate using model checking for exhaustive testing, by verifying bounded (and hence finite state) instances of nonblocking data structures, parameterised by the number of threads, the number of distinct data values, and the size of storage memory (e.g. array length, or maximum number of linked list nodes). It is widely held, based on anecdotal evidence, that most bugs occur in small instances. We investigate the smallest bounds needed to falsify a number of incorrect algorithms, which supports this hypothesis. We also investigate verifying a number of correct algorithms for a range of bounds. If an algorithm can be verified for bounds significantly higher than the minimum bounds needed for falsification, then we argue it provides a high degree of confidence in the general correctness of the algorithm. However, with the available hardware we were not able to verify any of the algorithms to high enough bounds to claim such confidence. Third, we investigate using model checking to verify nonblocking data structures by employing the technique of canonical abstraction to construct finite state representations of the unbounded algorithms. Canonical abstraction represents abstract states as 3-valued logical structures, and allows the initial coarse abstraction to be refined as necessary by adding derived predicates. We introduce several novel derived predicates and show how these allow linearisability to be verified for linked list based nonblocking stack and queue algorithms. This is achieved within the standard canonical abstraction framework, in contrast to recent approaches that have added extra abstraction techniques on top to achieve the same goal. The finite state systems we construct using canonical abstraction are still relatively large, being exponential in the number of distinct abstract thread objects. We present an alternative application of canonical abstraction, which more coarsely collapses all threads in a state to be represented by a single abstract thread object. In addition, we define further novel derived predicates, and show that these allow linearisability to be verified for the same stack and queue algorithms far more efficiently.</p>

  • Dissertation
  • 10.26686/wgtn.17004076.v1
On the Use of Model Checking for the Bounded and Unbounded Verification of Nonblocking Concurrent Data Structures
  • Jan 1, 2013
  • David Friggens

<p>Concurrent data structure algorithms have traditionally been designed using locks to regulate the behaviour of interacting threads, thus restricting access to parts of the shared memory to only one thread at a time. Since locks can lead to issues of performance and scalability, there has been interest in designing so-called nonblocking algorithms that do not use locks. However, designing and reasoning about concurrent systems is difficult, and is even more so for nonblocking systems, as evidenced by the number of incorrect algorithms in the literature. This thesis explores how the technique of model checking can aid the testing and verification of nonblocking data structure algorithms. Model checking is an automated verification method for finite state systems, and is able to produce counterexamples when verification fails. For verification, concurrent data structures are considered to be infinite state systems, as there is no bound on the number of interacting threads, the number of elements in the data structure, nor the number of possible distinct data values. Thus, in order to analyse concurrent data structures with model checking, we must either place finite bounds upon them, or employ an abstraction technique that will construct a finite system with the same properties. First, we discuss how nonblocking data structures can be best represented for model checking, and how to specify the properties we are interested in verifying. These properties are the safety property linearisability, and the progress properties wait-freedom, lock-freedom and obstructionfreedom. Second, we investigate using model checking for exhaustive testing, by verifying bounded (and hence finite state) instances of nonblocking data structures, parameterised by the number of threads, the number of distinct data values, and the size of storage memory (e.g. array length, or maximum number of linked list nodes). It is widely held, based on anecdotal evidence, that most bugs occur in small instances. We investigate the smallest bounds needed to falsify a number of incorrect algorithms, which supports this hypothesis. We also investigate verifying a number of correct algorithms for a range of bounds. If an algorithm can be verified for bounds significantly higher than the minimum bounds needed for falsification, then we argue it provides a high degree of confidence in the general correctness of the algorithm. However, with the available hardware we were not able to verify any of the algorithms to high enough bounds to claim such confidence. Third, we investigate using model checking to verify nonblocking data structures by employing the technique of canonical abstraction to construct finite state representations of the unbounded algorithms. Canonical abstraction represents abstract states as 3-valued logical structures, and allows the initial coarse abstraction to be refined as necessary by adding derived predicates. We introduce several novel derived predicates and show how these allow linearisability to be verified for linked list based nonblocking stack and queue algorithms. This is achieved within the standard canonical abstraction framework, in contrast to recent approaches that have added extra abstraction techniques on top to achieve the same goal. The finite state systems we construct using canonical abstraction are still relatively large, being exponential in the number of distinct abstract thread objects. We present an alternative application of canonical abstraction, which more coarsely collapses all threads in a state to be represented by a single abstract thread object. In addition, we define further novel derived predicates, and show that these allow linearisability to be verified for the same stack and queue algorithms far more efficiently.</p>

  • Conference Article
  • Cite Count Icon 6
  • 10.1109/cec.2016.7744316
Automatic lock-free parallel programming on multi-core processors
  • Jul 1, 2016
  • Gopinath Chennupati + 2 more

Writing correct and efficient parallel programs is an unavoidable challenge; the challenge becomes arduous with lock-free programming. This paper presents an automated approach, Automatic Lock-free Programming (ALP) that avoids the programming difficulties via locks for an average programmer. ALP synthesizes parallel lock-free recursive programs that are directly compilable on multi-core processors. ALP attains the dual objective of evolving parallel lock-free programs and optimizing their performance. These programs perform (in terms of execution time) significantly better than that of the parallel programs with locks, while they are competitive with that of the human developed programs.

  • Conference Article
  • Cite Count Icon 20
  • 10.1145/3361525.3361549
Boosting concurrency in Parallel State Machine Replication
  • Dec 9, 2019
  • Ian Aragon Escobar + 3 more

State machine replication (SMR) is a well-known approach to implementing fault-tolerant services, providing high availability and strong consistency. To boost the performance of SMR, some proposals execute independent commands concurrently, while dependent commands execute sequentially in the total delivery order. The most general approach to handling command dependencies resorts to a directed acyclic graph (DAG), where nodes represent commands and edges represent dependencies. In this paper we show that due to the command arrival and multithreaded execution rates of SMR, a highly concurrent implementation of a DAG is needed. We show that a typical coarse-grained DAG implementation, where the whole graph is a critical section, results in a bottleneck in the replica. We propose two improvements to the coarse-grained DAG approach: fine-grained algorithms, using lock-coupling, and lock-free algorithms. Our fine-grain algorithms lock individual vertices in the DAG. The lock-free algorithms use nonblocking synchronization, with atomic operations, and lazy synchronization to postpone physical removal of nodes. All algorithms were integrated in a parallel SMR prototype. Experimental evaluation revealed that the fine-grained algorithms are also subject to a bottleneck. The lock-free implementation, however, sports linear speedup with the number of working threads, in some cases scaling up to 64 threads.

  • Conference Article
  • Cite Count Icon 4
  • 10.1145/3558481.3591089
Releasing Memory with Optimistic Access: A Hybrid Approach to Memory Reclamation and Allocation in Lock-Free Programs
  • Jun 17, 2023
  • Pedro Moreno + 1 more

Lock-free data structures are an important tool for the development of concurrent programs as they provide scalability, low latency and avoid deadlocks, livelocks and priority inversion. However, they require some sort of additional support to guarantee memory reclamation. The Optimistic Access (OA) method has most of the desired properties for memory reclamation, but since it allows memory to be accessed after being reclaimed, it is incompatible with the traditional memory management model. This renders it unable to release memory to the memory allocator/operating system, and, as such, it requires a complex memory recycling mechanism. In this paper, we extend the lock-free general purpose memory allocator LRMalloc to support the OA method. By doing so, we are able to simplify the memory reclamation method implementation and also allow memory to be reused by other parts of the same process. We further exploit the virtual memory system provided by the operating system and hardware in order to make it possible to release reclaimed memory to the operating system.

  • Research Article
  • Cite Count Icon 290
  • 10.1145/1233307.1233309
Concurrent programming without locks
  • May 1, 2007
  • ACM Transactions on Computer Systems
  • Keir Fraser + 1 more

Mutual exclusion locks remain the de facto mechanism for concurrency control on shared-memory data structures. However, their apparent simplicity is deceptive: It is hard to design scalable locking strategies because locks can harbor problems such as priority inversion, deadlock, and convoying. Furthermore, scalable lock-based systems are not readily composable when building compound operations. In looking for solutions to these problems, interest has developed in nonblocking systems which have promised scalability and robustness by eschewing mutual exclusion while still ensuring safety. However, existing techniques for building nonblocking systems are rarely suitable for practical use, imposing substantial storage overheads, serializing nonconflicting operations, or requiring instructions not readily available on today's CPUs. In this article we present three APIs which make it easier to develop nonblocking implementations of arbitrary data structures. The first API is a multiword compare-and-swap operation (MCAS) which atomically updates a set of memory locations. This can be used to advance a data structure from one consistent state to another. The second API is a word-based software transactional memory (WSTM) which can allow sequential code to be reused more directly than with MCAS and which provides better scalability when locations are being read rather than being updated. The third API is an object-based software transactional memory (OSTM). OSTM allows a simpler implementation than WSTM, but at the cost of reengineering the code to use OSTM objects. We present practical implementations of all three of these APIs, built from operations available across all of today's major CPU families. We illustrate the use of these APIs by using them to build highly concurrent skip lists and red-black trees. We compare the performance of the resulting implementations against one another and against high-performance lock-based systems. These results demonstrate that it is possible to build useful nonblocking data structures with performance comparable to, or better than, sophisticated lock-based designs.

  • Research Article
  • Cite Count Icon 2
  • 10.7936/k7dj5cnf
Mcflow: middleware for mixed-criticality distributed real-time systems
  • Jul 8, 2013
  • Open Scholarship Institutional Repository (Washington University in St. Louis)
  • Christopher Gill + 1 more

Traditional fixed-priority scheduling analysis for periodic/sporadic task sets is based on the assumption that all tasks are equally critical to the correct operation of the system. Therefore, every task has to be schedulable under the scheduling policy, and estimates of tasks' worst case execution times must be conservative in case a task runs longer than is usual. To address the significant under-utilization of a system's resources under normal operating conditions that can arise from these assumptions, several mixed-criticality scheduling approaches have been proposed. However, to date there has been no quantitative comparison of system schedulability or run-time overhead for the different approaches. In this dissertation, we present what is to our knowledge the first side-by-side implementation and evaluation of those approaches, for periodic and sporadic mixed-criticality tasks on uniprocessor or distributed systems, under a mixed-criticality scheduling model that is common to all these approaches. To make a fair evaluation of mixed-criticality scheduling, we also address some previously open issues and propose modifications to improve schedulability and correctness of particular approaches. To facilitate the development and evaluation of mixed-criticality applications, we have designed and developed a distributed real-time middleware, called MCFlow, for mixed-criticality end-to-end tasks running on multi-core platforms. The research presented in this dissertation provides the following contributions to the state of the art in real-time middleware: (1) an efficient component model through which dependent subtask graphs can be configured flexibly for execution within a single core, across cores of a common host, or spanning multiple hosts; (2) support for optimizations to inter-component communication to reduce data copying without sacrificing the ability to execute subtasks in parallel; (3) a strict separation of timing and functional concerns so that they can be configured independently; (4) an event dispatching architecture that uses lock free algorithms where possible to reduce memory contention, CPU context switching, and priority inversion; and (5) empirical evaluations of MCFlow itself and of different mixed criticality scheduling approaches both with a single host and end-to-end across multiple hosts. The results of our evaluation show that in terms of basic distributed real-time behavior MCFlow performs comparably to the state of the art TAO real-time object request broker when only one core is used and outperforms TAO when multiple cores are involved. We also identify and categorize different use cases under which different mixed criticality scheduling approaches are preferable.

  • Book Chapter
  • 10.1007/978-3-642-25379-9_19
Verification of Scalable Synchronous Queue
  • Jan 1, 2011
  • Jinjiang Lei + 1 more

Lock-free algorithms are extremely hard to be built correct due to their fine-grained concurrency natures. Formal techniques for verifying them are crucial. We present a framework for verification of CAS-based lock-free algorithms, and prove a nontrivial lock-free algorithm Scalable Synchronous Queue that is practically adopted in Java 6. The strength of our approach lies on that it relieves the dependence on auxiliary variables/commands, thus is relatively easier to conduct and comprehend, comparing to existing works.

  • Conference Article
  • Cite Count Icon 2
  • 10.1109/icise.2009.841
Practical Lock-Free Implementation of LL/SC Using Only Pointer-Size CAS
  • Jan 1, 2009
  • Hui Gao + 2 more

The significant benefit of lock (or wait)-freedom for real-time systems is that by avoiding locks the potentials for deadlock and priority inversion are avoided. The lock-free algorithms often require the use of special atomic processor instructions such as CAS (compare and swap) or LL/SC(load linked/store conditional). However, many machine architectures support either CAS or LL/SC with restricted semantics. In this paper, we present a Practical lock-free implementation of the ideal semantics of LL/SC using only pointer-size CAS. To ensure our implementation is not flawed, we used the higher-order interactive theorem prover PVS for mechanical support.

  • Conference Article
  • Cite Count Icon 3
  • 10.1109/dasc.2009.25
Verification of a Lock-Free Implementation of Multiword LL/SC Object
  • Dec 1, 2009
  • Hui Gao + 2 more

On shared memory multiprocessors, synchronization often turns out to be a performance bottleneck and the source of poor fault-tolerance. By avoiding locks, the significant benefit of lock (or wait)-freedom for real-time systems is that the potentials for deadlock and priority inversion are avoided. The lock-free algorithms often require the use of special atomic processor primitives such as CAS (Compare And Swap) or LL /SC (Load Linked/Store Conditional). However, many machine architectures support either CAS or LL /SC , but not both. In this paper, we present a lock-free implementation of the ideal semantics of LL /SC using only pointer-size CAS , and show how to use refinement mapping to prove the correctness of the algorithm.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant