Abstract

The critical sections with the lock protection greatly limit the concurrency of multi-threaded applications. The prior lock elision based technique is presented to exploit the parallelism between critical sections accessing the disjoint shared data, but still fails to notice and expose a high degree of concurrency between critical sections that contend for the same shared data, i.e., conflicting critical sections (CCS). This paper focuses on exploiting the CCS parallelism. The key insight of this work is that, for each running CCS, a large proportion ( $>$ 73.4%) of parallelism between CCSs can be exploited as fully as possible by simply allowing the parallel execution of their first conflict-free code fragment at runtime. We therefore present BSOptimizer, a new microarchitecture, to perform the partial reversion integrated with a series of sophisticated hardware and software strategies for the CCS parallelization. We complement the off-the-shelf cache coherency protocol to perceive the conflict location of CCS, present a predictive checkpoint mechanism to register and predict the concerned conflict point in a lightweight and accurate fashion, and redefine the traditional mutual exclusive semantics with a binary relationship. With these collaborative techniques, each CCS can be scheduled in parallel. Our experimental results on a wide variety of real programs and PARSEC benchmarks show that, compared to the native execution and two state-of-the-art lock elision techniques (including SLE and SLR), BSOptmizer can dramatically improves the performance of programs with a slight ( $ 0.8%) energy consumption and ( $ 3.9%) extra runtime overhead. Our evaluation on a micro-benchmark with software based optimization also verifies that BSOptimizer can accurately exploit the CCS parallelism as promised.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call