Abstract

The performance of data-intensive applications is often limited not only by the computational power of modern processors but also by the performance gap between the CPU and main memory. Data prefetch mechanisms mask this latency by moving data closer to the CPU automatically. These mechanisms rely on predicting future memory addresses, so they are poorly suited to applications with random memory access patterns. Preexecution is a prefetch method that executes a slice of the original algorithm in parallel with the main thread in order to calculate memory addresses and issue loads early. In this paper we propose a lightweight software preexecution strategy for data-parallel applications that accelerates the main working thread with an adaptive preexecution helper thread, which acts as a perfect address predictor and absorbs the cache misses. Through automatic parameter tuning, the helper thread adapts to the application and to the system on which it runs. The method achieved an average speedup of 10–30% in a real-life data-parallel application.
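The abstract does not include the implementation, but the idea can be illustrated with a minimal sketch. The following C++ example assumes a data-parallel loop with an indirect access pattern (data[idx[i]]) that defeats hardware stride prefetchers; all names (preexec_helper, kLookahead, progress) are illustrative assumptions, not the authors' code, and kLookahead stands in for the run-ahead distance that the paper's automatic parameter tuning would adapt. __builtin_prefetch is a GCC/Clang intrinsic.

```cpp
// Minimal sketch of helper-thread preexecution: a second thread executes
// only the address-generating slice of the main loop and touches the data
// ahead of time, so the main thread's loads hit in cache.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <random>
#include <thread>
#include <vector>

// Illustrative tuning parameter: how many iterations the helper may run ahead.
static constexpr std::size_t kLookahead = 256;

// Shared progress counter published by the main thread.
std::atomic<std::size_t> progress{0};

// Helper thread: the preexecution slice. It computes the same addresses as
// the main loop but only prefetches them instead of doing the real work.
void preexec_helper(const std::vector<std::uint32_t>& idx,
                    const std::vector<double>& data) {
    for (std::size_t i = 0; i < idx.size(); ++i) {
        // Throttle: never run more than kLookahead iterations ahead,
        // otherwise prefetched lines may be evicted before they are used.
        while (i > progress.load(std::memory_order_relaxed) + kLookahead)
            std::this_thread::yield();
        __builtin_prefetch(&data[idx[i]], /*rw=*/0, /*locality=*/1);
    }
}

int main() {
    const std::size_t n = 1 << 24;
    std::vector<double> data(n, 1.0);
    std::vector<std::uint32_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0u);
    std::shuffle(idx.begin(), idx.end(), std::mt19937{42});  // random access pattern

    std::thread helper(preexec_helper, std::cref(idx), std::cref(data));

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {  // main working thread
        sum += data[idx[i]];
        progress.store(i, std::memory_order_relaxed);
    }
    helper.join();
    return sum > 0 ? 0 : 1;
}
```

The throttling loop is the part that a real implementation would tune: too small a lookahead and the prefetched loads do not complete before the main thread needs them; too large and the prefetched lines are evicted from the cache before use.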
