Abstract

Improving the sequential performance of out-of-order processors is becoming harder. Further improvements may require exploiting thread-level parallelism on top of ILP, as it can provide better design and performance scaling. Unfortunately, previous "speculative multithreading" approaches have shown small gains, incurred high costs, or both, particularly for general-purpose, non-numeric applications. This paper investigates the fundamental limits to sequential performance scaling through speculative multithreading. We present an LLVM compiler-driven limit study framework that measures the limits of loop-level parallelism at run time. This new study of loop-level parallelism demonstrates the potential for up to 4.6x and 7.2x geometric mean speedup on SPECint2000 and SPECint2006, respectively. Because they additionally consider recent parallelization schemes, such as generalized DOACROSS (HELIX), these potential speedups are higher than those reported by previous state-of-the-art limit studies. Our analysis further categorizes the various inter-thread dependencies and ordering constraints according to the specific architectural choices and techniques each would require for implementation. We then evaluate the relative importance of each such constraint for different application (benchmark) types, and provide insight into the cost/benefit trade-offs when designing systems that efficiently implement speculative multithreading. Such insights should help the design of bespoke systems for speculative multithreading that achieve better speedups, efficiency, and scaling than typical approaches, which have thus far relied on adapting conventional multi-core systems.
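To illustrate why a generalized DOACROSS scheme can expose parallelism in loops that carry dependencies, the following is a minimal, idealized timing model. It is not the paper's LLVM limit study framework. It assumes unlimited cores (one iteration per core, all starting at time 0), a single ordered segment per iteration that holds the loop-carried dependence, and a fixed signalling latency between consecutive ordered segments. The names `Iteration` and `doacross_speedup` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Iteration:
    """Hypothetical per-iteration cost breakdown (arbitrary time units)."""
    before: float   # independent work that may overlap with other iterations
    ordered: float  # segment holding the loop-carried dependence; runs in iteration order
    after: float    # remaining independent work


def doacross_speedup(iters: Sequence[Iteration], signal_latency: float = 0.0) -> float:
    """Idealized DOACROSS speedup with one core per iteration.

    Every iteration starts at time 0 on its own core. Its ordered segment may
    begin only after its own 'before' work is done and the previous iteration's
    ordered segment has finished (plus an optional signalling latency).
    """
    sequential_time = sum(it.before + it.ordered + it.after for it in iters)
    prev_ordered_end = None
    makespan = 0.0
    for it in iters:
        ready = it.before
        if prev_ordered_end is not None:
            # Wait for the predecessor's ordered segment and its signal.
            ready = max(ready, prev_ordered_end + signal_latency)
        prev_ordered_end = ready + it.ordered
        makespan = max(makespan, prev_ordered_end + it.after)
    return sequential_time / makespan if makespan > 0 else 1.0


loop = [Iteration(8, 1, 1)] * 4
print(doacross_speedup(loop))                      # 40 / 13, about 3.08
print(doacross_speedup(loop, signal_latency=1.0))  # 40 / 16 = 2.5
```

In this toy model, a loop whose body is entirely ordered gets no speedup, while a small ordered segment lets most of each iteration overlap. The `signal_latency` parameter hints at why the cost of inter-core communication matters when weighing architectural choices for speculative multithreading.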
