Thread-Level Speculation (TLS) overcomes limitations intrinsic to conservative compile-time auto-parallelizing tools by extracting parallel threads optimistically and ensuring the absence of data dependence violations only at runtime. A significant barrier to adopting software TLS is the overhead associated with maintaining speculative state. Based on previous TLS limit studies, we observe that future multicore systems will likely have more idle cores than traditional TLS can harness. This implies that a TLS system should focus on optimizing for a small number of cores and find efficient ways to take advantage of the idle cores. Furthermore, research on optimistic systems has covered two important implementation design points: eager versus lazy version management. With this knowledge, we propose simple and effective new techniques to reduce the execution-time overheads of both design points. This article describes a novel compact version management data structure optimized for low space overhead when using a small number of TLS threads. Furthermore, we describe two novel software runtime parallelization systems that use this compact data structure. The first software TLS system, MiniTLS, relies on eager memory data management (in-place updates) and therefore requires a rollback process when a misspeculation occurs. MiniTLS takes advantage of the compact version management representation to parallelize the rollback process and recovers from misspeculation faster than existing software eager TLS systems. The second system, Lector (Lazy inspECTOR), is based on lazy version management. Since idle cores are available, the question is whether we can create "helper" tasks that determine whether speculation is actually needed, without stopping or disturbing the speculative execution. In Lector, each conventional TLS thread running speculatively under lazy version management is paired with a lightweight inspector thread. The inspector threads run alongside to quickly verify whether data dependences will occur. Inspector threads are generated by standard techniques for inspector/executor parallelization. We have applied both TLS systems to seven sequential Java benchmarks, including three from SPECjvm2008; two of the seven benchmarks exhibit misspeculations. MiniTLS achieves average speedups of 1.8x on 4 threads, rising to nearly 7x on 32 threads. Facilitated by the compact representation, MiniTLS reduces space overhead relative to state-of-the-art software TLS systems by between 96% on 2 threads and 40% on 32 threads. Lector achieves average speedups of 1.7x on 2 threads (that is, 1 TLS + 1 inspector thread), rising to nearly 8.2x on 32 threads (16 + 16 threads). Compared with a well-established software TLS baseline, Lector is on average 1.7x faster on 32 threads, and in no configuration (x TLS + x inspector threads) does Lector perform worse than the baseline with the equivalent number of TLS threads (i.e., x TLS threads) or with twice that number (i.e., x + x TLS threads).
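To make the eager-versus-lazy contrast concrete, the following is a minimal illustrative sketch in Java, not the MiniTLS or Lector implementation: it assumes simplified structures (an undo log of old values for eager, in-place updates and a private write buffer for lazy, deferred updates) and hypothetical class and method names.

```java
// Illustrative sketch only: NOT the paper's data structures, just the two
// version-management design points the abstract contrasts, under assumed names.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class VersionManagementSketch {

    /** Eager version management: write in place, keep an undo log so a rollback is possible. */
    static class EagerVersioning {
        private final int[] sharedData;
        // Each entry records {index, old value} so a misspeculation can be undone.
        private final Deque<int[]> undoLog = new ArrayDeque<>();

        EagerVersioning(int[] sharedData) { this.sharedData = sharedData; }

        void speculativeWrite(int index, int value) {
            undoLog.push(new int[] { index, sharedData[index] }); // save old value
            sharedData[index] = value;                            // in-place update
        }

        /** On misspeculation, restore old values in reverse order. */
        void rollback() {
            while (!undoLog.isEmpty()) {
                int[] entry = undoLog.pop();
                sharedData[entry[0]] = entry[1];
            }
        }
    }

    /** Lazy version management: buffer writes privately, publish them only on a successful commit. */
    static class LazyVersioning {
        private final int[] sharedData;
        private final Map<Integer, Integer> writeBuffer = new HashMap<>();

        LazyVersioning(int[] sharedData) { this.sharedData = sharedData; }

        void speculativeWrite(int index, int value) {
            writeBuffer.put(index, value);   // shared state is not touched yet
        }

        int speculativeRead(int index) {
            // Read own buffered value if present, otherwise the shared value.
            return writeBuffer.getOrDefault(index, sharedData[index]);
        }

        /** On successful speculation, publish the buffered writes. */
        void commit() {
            writeBuffer.forEach((i, v) -> sharedData[i] = v);
            writeBuffer.clear();
        }

        /** On misspeculation, discarding the buffer is sufficient; no rollback is needed. */
        void abort() {
            writeBuffer.clear();
        }
    }
}
```

The sketch highlights the trade-off the abstract exploits: eager versioning makes commits cheap but pays for misspeculation with a rollback (which MiniTLS parallelizes), whereas lazy versioning makes aborts cheap but defers all updates until commit (which is where Lector's inspector threads can cheaply check whether speculation is needed at all).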