Abstract

Understanding bottlenecks in parallel programs is critical to designing more efficient and performant multi-core architectures. Synchronization is a prime example of a potential bottleneck, but is a necessary evil when writing parallel programs; we must enforce correct access to shared data. Even the most expert programmers may find synchronization to be a significant overhead in their application. Techniques to mitigate synchronization overhead include speculative lock elision, faster hardware barriers, and load balancing via dynamic voltage and frequency scaling. A key insight is that the timing of synchronization events, impacted not only by the progress of the current thread but also by that of others, is fundamental to an application's performance. To enable a better understanding of multithreaded applications, we introduce a new level of abstraction for multi-core evaluation and propose an analytical model centered around the timing and ordering of synchronization events. Our model allows research across the stack to evaluate the performance of applications on future, non-existent systems and architectures. Compared to real hardware, our model estimates performance with a geometric average of 7.2% error across thirteen benchmarks and can generate performance characteristics per thread in less than a minute on average for very large (native) inputs.
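As a rough illustration of why the timing and ordering of synchronization events drive multithreaded performance, the sketch below replays per-thread compute times between barriers and charges each interval at the speed of its slowest thread. This is not the paper's analytical model; the barrier-only structure, function names, and input format are assumptions made purely for illustration.

# Illustrative sketch only (not the paper's model): a toy analytical estimate of
# runtime for a barrier-synchronized program, driven by per-thread compute times
# between synchronization events.

from typing import List

def estimate_runtime(per_thread_segments: List[List[float]]) -> float:
    """Estimate total runtime of a barrier-synchronized region.

    per_thread_segments[t][i] is the compute time thread t spends between
    barrier i and barrier i+1. At each barrier, every thread waits for the
    slowest one, so each interval costs its per-thread maximum.
    """
    num_intervals = len(per_thread_segments[0])
    return sum(max(thread[i] for thread in per_thread_segments)
               for i in range(num_intervals))

def per_thread_wait(per_thread_segments: List[List[float]]) -> List[float]:
    """Time each thread spends waiting at barriers (a load-imbalance signal)."""
    num_intervals = len(per_thread_segments[0])
    waits = [0.0] * len(per_thread_segments)
    for i in range(num_intervals):
        slowest = max(thread[i] for thread in per_thread_segments)
        for t, thread in enumerate(per_thread_segments):
            waits[t] += slowest - thread[i]
    return waits

if __name__ == "__main__":
    # Two threads, three barrier-delimited intervals (times in milliseconds).
    segments = [[4.0, 2.0, 3.0],
                [3.0, 5.0, 3.5]]
    print(estimate_runtime(segments))   # 4.0 + 5.0 + 3.5 = 12.5
    print(per_thread_wait(segments))    # [3.5, 1.0]

Even in this toy setting, the per-thread barrier wait times expose load imbalance, which is the kind of per-thread performance characteristic the abstract refers to.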
