On Thin Air Reads: Towards an Event Structures Model of Relaxed Memory

Abstract

To model relaxed memory, we propose confusion-free event structures over an alphabet with a justification relation. Executions are modeled by justified configurations, where every read event has a justifying write event. Justification alone is too weak a criterion, since it allows cycles of the kind that result in so-called thin-air reads. Acyclic justification forbids such cycles, but also invalidates event reorderings that result from compiler optimizations and dynamic instruction scheduling. We propose the notion of well-justification, based on a game-like model, which strikes a middle ground. We show that well-justified configurations satisfy the DRF theorem: in any data-race free program, all well-justified configurations are sequentially consistent. We also show that rely-guarantee reasoning is sound for well-justified configurations, but not for justified configurations. For example, well-justified configurations are type-safe. Well-justification allows many, but not all reorderings performed by relaxed memory. In particular, it fails to validate the commutation of independent reads. We discuss variations that may address these shortcomings.
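The gap between justification and acyclic justification described above can be illustrated with a small sketch. It is a simplified illustration, not the paper's formal event-structure construction: events are plain tuples, the only ordering considered is per-thread program order, and all helper names (`build`, `choices`, `acyclic`, `justified`, `acyclically_justified`) are hypothetical. The two-thread program shape and the value 42 are likewise assumptions chosen for the example.

```python
from itertools import product

def build(r1, r2, y_written):
    """Candidate execution of a two-thread program (assumed shape):
         T1: r1 = x; y = <y_written>      T2: r2 = y; x = r2
    Events are (thread, kind, location, value); x and y start at 0."""
    events = [
        ("init", "W", "x", 0), ("init", "W", "y", 0),
        ("T1", "R", "x", r1), ("T1", "W", "y", y_written),
        ("T2", "R", "y", r2), ("T2", "W", "x", r2),
    ]
    program_order = {(2, 3), (4, 5)}  # read before write in each thread
    return events, program_order

def choices(events):
    """All ways to pick one justifying write (same location, same value)
    for every read. The result is empty if some read has no such write."""
    options = []
    for r, (_, kind, loc, val) in enumerate(events):
        if kind != "R":
            continue
        options.append([(w, r) for w, e in enumerate(events)
                        if e[1] == "W" and e[2] == loc and e[3] == val])
    return [set(c) for c in product(*options)]

def acyclic(n, edges):
    """True if the directed graph on nodes 0..n-1 has no cycle."""
    remaining = set(range(n))
    while remaining:
        targets = {b for (a, b) in edges if a in remaining and b in remaining}
        sources = remaining - targets  # nodes with no incoming edge left
        if not sources:
            return False
        remaining -= sources
    return True

def justified(events):
    """Every read has at least one justifying write."""
    return len(choices(events)) > 0

def acyclically_justified(events, program_order):
    """Some choice of justifying writes adds no cycle to program order."""
    return any(acyclic(len(events), program_order | c)
               for c in choices(events))

if __name__ == "__main__":
    cases = {
        "thin air   (y = r1, r1 = r2 = 42)": build(42, 42, 42),
        "reordering (y = 1,  r1 = r2 = 1)": build(1, 1, 1),
        "sequential (y = r1, r1 = r2 = 0)": build(0, 0, 0),
    }
    for name, (events, po) in cases.items():
        print(name, justified(events), acyclically_justified(events, po))
```

Under these assumptions, the thin-air candidate (both reads return 42 in `r1 = x; y = r1 || r2 = y; x = r2`) is justified but not acyclically justified. The candidate in which both reads return 1 in `r1 = x; y = 1 || r2 = y; x = r2` has the same cycle through program order, so acyclic justification rejects it as well, even though a compiler that moves the independent write `y = 1` above the read could produce it. This is the kind of reordering the abstract says acyclic justification invalidates.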

Similar Papers
  • Conference Article
  • Cite Count: 62
  • 10.1145/2933575.2934536
On Thin Air Reads: Towards an Event Structures Model of Relaxed Memory
  • Jul 5, 2016
  • Alan Jeffrey + 1 more

This is the first paper to propose a pure event structures model of relaxed memory. We propose confusion-free event structures over an alphabet with a justification relation as a model. Executions are modeled by justified configurations, where every read event has a justifying write event. Justification alone is too weak a criterion, since it allows cycles of the kind that result in so-called thin-air reads. Acyclic justification forbids such cycles, but also invalidates event reorderings that result from compiler optimizations and dynamic instruction scheduling. We propose a notion of well-justification, based on a game-like model, which strikes a middle ground. We show that well-justified configurations satisfy the DRF theorem: in any data-race free program, all well-justified configurations are sequentially consistent. We also show that rely-guarantee reasoning is sound for well-justified configurations, but not for justified configurations. For example, well-justified configurations are type-safe. Well-justification allows many, but not all, reorderings performed by relaxed memory. In particular, it fails to validate the commutation of independent reads. We discuss variations that may address these shortcomings.

  • Research Article
  • Cite Count: 1
  • 10.1023/a:1008125919892
Aggressive Dynamic Execution of Decoded Traces
  • Aug 1, 1999
  • Journal of VLSI signal processing systems for signal, image and video technology
  • Benjamin Bishop + 3 more

We consider the increased performance that can be obtained by using, in concert, three previously proposed ideas (two of which have been used in commercial systems). These ideas are aggressive dynamic (run-time) instruction scheduling, reuse of decoded instructions, and trace scheduling. We show that these ideas complement and support one another. Hence, while each of these ideas has been shown to have merit in its own right, we claim that when they are used in concert the overall advantage is greater than that obtained by using any one singly. To support this claim, we present the results from running several common multimedia kernels. Overall, these results show an average speedup of 3.50 times what can be had by using dynamic instruction scheduling alone.

  • Research Article
  • Cite Count: 88
  • 10.1109/2.30730
Dynamic instruction scheduling and the Astronautics ZS-1
  • Jul 1, 1989
  • Computer
  • J.E. Smith

An overview of the problem of instruction scheduling for pipelined computers, and a survey of solutions to it, are provided. The author demonstrates that dynamic instruction scheduling can provide performance improvements not possible with static scheduling alone. He describes a high-performance computer, the Astronautics ZS-1, which uses novel methods for implementing dynamic scheduling and which can outperform computers using similar-speed technologies that rely solely on state-of-the-art static scheduling techniques.

  • Research Article
  • 10.5075/epfl-thesis-4541
Formalizing and Verifying Transactional Memories
  • Jan 1, 2010
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)
  • Vasu Singh

Transactional memory (TM) has shown potential to simplify the task of writing concurrent programs. TM shifts the burden of managing concurrency from the programmer to the TM algorithm. The correctness of TM algorithms is generally proved manually. The goal of this thesis is to provide the mathematical and software tools to automatically verify TM algorithms under realistic memory models. Our first contribution is to develop a mathematical framework to capture the behavior of TM algorithms and the required correctness properties. We consider the safety property of opacity and the liveness properties of obstruction freedom and livelock freedom. We build a specification language of opacity. We build a framework to express hardware relaxed memory models. We develop a new high-level language, Relaxed Memory Language (RML), for expressing concurrent algorithms with a hardware-level atomicity of instructions, whose semantics is parametrized by various relaxed memory models. We express TM algorithms like TL2, DSTM, and McRT STM in our framework. The verification of TM algorithms is difficult because of the unbounded number, length, and delay of concurrent transactions and the unbounded size of the memory. The second contribution of the thesis is to identify structural properties of TM algorithms which allow us to reduce the unbounded verification problem to a language-inclusion check between two finite state systems. We show that common TM algorithms satisfy these structural properties. The third contribution of the thesis is our tool FOIL for model checking TM algorithms. FOIL takes as input the RML description of a TM algorithm and the description of a memory model. FOIL uses the operational semantics of RML to compute the language of the TM algorithm for two threads and two variables. FOIL then checks whether the language of the TM algorithm is included in the specification language of opacity. 
FOIL automatically determines the locations of fences, which if inserted, ensure the correctness of the TM algorithm under the given memory model. We use FOIL to verify DSTM, TL2, and McRT STM under the memory models of sequential consistency, total store order, partial store order, and relaxed memory order.

  • Conference Article
  • 10.1109/indicon.2017.8487498
Reconfigurable Dynamic Scheduling In Superscalar Processor for FIR Filter
  • Dec 1, 2017
  • S Ramya + 1 more

A typical superscalar processor fetches, decodes and executes several instructions. The incoming instruction stream is then analyzed for data dependencies and resource dependencies. The dispatcher distributes instructions to functional units based on the availability of functional units and data. This is referred to as dynamic instruction scheduling. This paper proposes dynamic scheduling for a superscalar processor that consists of four functional units, an instruction analyzer window of 8 instructions, an instruction decoder, and a dispatcher with a register bank. Four independent out-of-order instructions are executed in parallel. To improve the speed of the processor, the Tomasulo algorithm is implemented using the Isim simulator in Xilinx 14.5. To demonstrate the potential of the architecture, an FIR filter is implemented and its execution time is compared with and without dynamic scheduling, and also against a scalar processor architecture.

  • Conference Article
  • Cite Count: 1
  • 10.5753/sbac-pad.1999.19788
Investigating the Relative Performance of Static and Dynamic Instruction Scheduling
  • Sep 29, 1999
  • Daniel Tate + 2 more

There are two distinct groups of research into ILP: those that strongly favour static instruction scheduling and those that favour dynamic instruction scheduling. This paper introduces powerful static and dynamic scheduling models and combines them within the framework of a single simulation environment. Both individual models achieve respectable speedups; dynamic scheduling significantly out-performs static scheduling when an idealised processor model with perfect branch prediction is used. However, when a realistic branch predictor is substituted, the roles are reversed, and static scheduling achieves the higher performance. Similarly, static scheduling performs better in the absence of branch prediction or when processor resources are restricted. Finally, we combine static scheduling with out-of-order instruction issue. Disappointingly, when an ideal out-of-order processor is used, scheduled code fails to match the performance of unscheduled code. Furthermore, with realistic branch prediction, out-of-order issue fails to improve the performance of scheduled code.

  • Research Article
  • Cite Count: 5
  • 10.1145/325096.325140
An investigation of static versus dynamic scheduling
  • May 1, 1990
  • ACM SIGARCH Computer Architecture News
  • Carl E. Love + 1 more

Authors: Carl E. Love (University of Colorado at Boulder) and Harry F. Jordan (University of Colorado at Boulder, Dept. of Electrical Engineering). ACM SIGARCH Computer Architecture News, Volume 18, Issue 2SI, June 1990, pp. 192–201. https://doi.org/10.1145/325096.325140

  • Conference Article
  • Cite Count: 2
  • 10.1109/iccse.2016.7581696
A lightweight instruction-set simulator for teaching of dynamic instruction scheduling
  • Aug 1, 2016
  • Wen-Jie Liu + 2 more

The extensive use of dynamic instruction scheduling has made it an essential part of the Computer Architecture (CA) course. Practical teaching of this content, however, has always been a weak link in the teaching of CA. With current teaching methods, teachers only explain the principle of dynamic instruction scheduling through traditional or multimedia instruction. This kind of method is far from effective in helping students understand dynamic scheduling. Therefore, it is necessary to adopt experiment-based methods. In this paper, we propose a lightweight simulator framework for the teaching of CA, especially for pipelining and dynamic instruction scheduling. We first design a basic simulator called PipelineSim, which supports a basic five-stage MIPS pipeline. Scoreboarding and Tomasulo are then integrated into PipelineSim. Students can implement either the Scoreboarding or the Tomasulo algorithm based on this framework instead of only learning about these two mechanisms in lectures. We also provide an example of a designed experiment, which teachers can use as a reference or teaching resource.

  • Conference Article
  • Cite Count: 3
  • 10.1109/async.2001.914078
An asynchronous superscalar architecture for exploiting instruction-level parallelism
  • Mar 11, 2001
  • T Werner + 1 more

This paper proposes an asynchronous superscalar architecture called DCAP to exploit instruction-level parallelism based on a novel dynamic instruction scheduling technique. The proposed technique not only has an efficient implementation using asynchronous micropipelines, it also minimizes the amount of hardware required for instruction scheduling when compared to standard schemes used in synchronous superscalar processors. In addition, the proposed technique for dynamic instruction scheduling also exploits the dependency patterns in the instruction streams for enhanced performance. DCAP is a fully functional model of an asynchronous superscalar processor and supports register renaming and precise interrupts. A detailed performance analysis of DCAP on realistic benchmarks is presented.

  • Book Chapter
  • Cite Count: 38
  • 10.1007/978-3-642-22306-8_10
A Verification-Based Approach to Memory Fence Insertion in Relaxed Memory Systems
  • Jan 1, 2011
  • Alexander Linden + 1 more

This paper addresses the problem of verifying and correcting programs when they are moved from a sequential consistency execution environment to a relaxed memory context. Specifically, it considers the TSO (Total Store Order) relaxation, which corresponds to the use of store buffers, and its extension x86-TSO, which in addition allows synchronization and lock operations. The proposed approach uses a previously developed verification tool that uses finite automata to symbolically represent the possible contents of the store buffers. Its starting point is a program that is correct for the usual sequential consistency memory model, but that might be incorrect under x86-TSO. This program is then analyzed for this relaxed memory model and when errors are found (with respect to safety properties), memory fences are inserted in order to avoid these errors. The approach proceeds iteratively and heuristically, inserting memory fences until correctness is obtained, which is guaranteed to happen. An advantage of our technique is that the underlying symbolic verification tool makes a full exploration possible even for cyclic programs, which makes our approach broadly applicable. The method has been tested with an experimental implementation and can effectively handle a series of classical examples. Keywords: Shared Memory; Memory Model; Concurrent Program; Sequential Consistency; Load Operation.

  • Book Chapter
  • 10.1007/978-3-642-39304-4_2
Instruction Scheduling in Microprocessors
  • Jan 1, 2013
  • Gürhan Küçük + 2 more

The Central Processing Unit (CPU) in a microprocessor is responsible for running machine instructions as fast as possible so that the machine performance is at its maximum level. While simple in design, in-order execution processors provide sub-optimal performance, because any delay in instruction processing blocks the entire instruction stream. To overcome this limitation, modern high-performance designs use out-of-order (OoO) instruction scheduling to better exploit available Instruction-Level Parallelism (ILP), and both static (compiler-assisted) and dynamic (hardware-assisted) scheduling solutions are possible. The hardware-assisted scheduling integrates an OoO core that requires a complex dynamic instruction scheduler, and additional datapath structures are utilized to hold the in-flight instructions in program order to support the reconstruction of precise program state. The logic becomes even more complex when superscalar designs (those capable of executing multiple instructions every clock cycle) are used. This chapter gives a brief introduction to instruction scheduling on pipelined superscalar architectures, and then explains some of the keystone static and dynamic instruction scheduling algorithms.

  • Book Chapter
  • Cite Count: 1
  • 10.1007/978-3-319-05119-2_15
Studying Operational Models of Relaxed Concurrency
  • Jan 1, 2014
  • Gustavo Petri

We study two operational semantics for relaxed memory models. Our first formalization is based on the notion of write-buffers, which is pervasive in the memory models literature. We instantiate the Total Store Ordering (TSO) and Partial Store Ordering (PSO) memory models in this framework. Memory models that support more aggressive relaxations (e.g. read-to-read reordering) are not easily described with write-buffers. Our second framework is based on a general notion of speculative computation. In particular, we allow the prediction of function arguments, and execution ahead of time (e.g. by branch prediction). While technically more involved than write-buffers, this model is more expressive and can encode all of the Sparc family of memory models: TSO, PSO and Relaxed Memory Ordering (RMO). We validate the adequacy of our instantiations of TSO and PSO by formally comparing their write-buffer and speculative formalizations. The use of operational semantics techniques is paramount for the tractability of these proofs.

  • Research Article
  • Cite Count: 12
  • 10.1145/3158106
Transactions in relaxed memory architectures
  • Dec 27, 2017
  • Proceedings of the ACM on Programming Languages
  • Brijesh Dongol + 2 more

The integration of transactions into hardware relaxed memory architectures is a topic of current research both in industry and academia. In this paper, we provide a general architectural framework for the introduction of transactions into models of relaxed memory in hardware, including the SC, TSO, ARMv8 and PPC models. Our framework incorporates flexible and expressive forms of transaction aborts and execution that have hitherto been in the realm of software transactional memory. In contrast to software transactional memory, we account for the characteristics of relaxed memory as a restricted form of distributed system, without a notion of global time. We prove abstraction theorems to demonstrate that the programmer API matches the intuitions and expectations about transactions.

  • Conference Article
  • 10.1109/async.1996.494439
Counterflow pipeline based dynamic instruction scheduling
  • Mar 18, 1996
  • T Werner + 1 more

This paper proposes a new dynamic instruction scheduler called the Asynchronous Fast Dispatch Stack (AFDS). This approach utilizes asynchronous design techniques to implement a dispatch stack-based dynamic instruction issue mechanism. To maintain throughput and simplify dependency computations, the AFDS architecture includes a counterflow pipeline, which is modeled after the Counterflow Pipeline Processor (CFPP) proposed by Sproull and Sutherland (1994). The AFDS counterflow pipeline, however, propagates instruction dependency and completion information, rather than results and source operands. Preliminary results indicate that the AFDS is a promising application of the CFPP architecture.

  • Research Article
  • 10.1007/s10703-011-0131-3
Verification of STM on relaxed memory models
  • Nov 23, 2011
  • Formal Methods in System Design
  • Rachid Guerraoui + 2 more

Software transactional memories (STM) are described in the literature with assumptions of sequentially consistent program execution and atomicity of high level operations like read, write, and abort. However, in a realistic setting, processors use relaxed memory models to optimize hardware performance. Moreover, the atomicity of operations depends on the underlying hardware. This paper presents the first approach to verify STMs under relaxed memory models with atomicity of 32 bit loads and stores, and read-modify-write operations. We describe RML, a simple language for expressing concurrent programs. We develop a semantics of RML parametrized by a relaxed memory model. We then present our tool, FOIL, which takes as input the RML description of an STM algorithm restricted to two threads and two variables, and the description of a memory model, and automatically determines the locations of fences, which if inserted, ensure the correctness of the restricted STM algorithm under the given memory model. We use FOIL to verify DSTM, TL2, and McRT STM under the memory models of sequential consistency, total store order, partial store order, and relaxed memory order for two threads and two variables. Finally, we extend the verification results for DSTM and TL2 to an arbitrary number of threads and variables by manually proving that the structural properties of STMs are satisfied at the hardware level of atomicity under the considered relaxed memory models.
