Abstract

To obtain significant execution speedups, GPUs rely heavily on the inherent data-level parallelism present in the targeted application. However, application programs may not always be able to fully utilize these parallel computing resources due to intrinsic data dependencies or complex data pointer operations. In this paper, we explore how to leverage aggressive software-based value prediction techniques on a GPU to accelerate programs that lack inherent data parallelism. Such applications are typically difficult to map to parallel architectures because of these data dependencies and pointer manipulations. Our experimental results show that, despite the overhead of software speculation and of communication between the CPU and GPU, we obtain up to a 6.5× speedup on a selected set of kernels taken from the SPEC CPU2006, PARSEC, and Sequoia benchmark suites.

