Abstract

CUDA toolkits are widely used to develop applications running on NVIDIA GPUs. They include compilers and are frequently updated to integrate state-of-the-art compilation techniques. Hence, many HPC users believe that the latest CUDA toolkit will improve application performance; however, considering results from CPU compilers, there are cases where this is not true. In this paper, we thoroughly evaluate the impact of CUDA toolkit version on the performance, power consumption, and energy consumption of GPU applications with four GPU architectures. Our results show that though the latest CUDA toolkit obtains the best performance, power consumption, and energy consumption for many applications in most cases, but we found a few exceptions. For such applications, we conducted an in-depth analysis using the SASS to identify why older CUDA toolkit achieve performance improvement. Our analysis showed that the factors that caused them are by three phenomena: aggressive loop unrolling, inefficient instruction scheduling, and the impact of host compilers.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call