Abstract

Maximizing the utilization of hardware resources is crucial to exploiting the enormous computational power of graphics processing units (GPUs). However, an effective metric that indicates whether the launched threads are kept busy is lacking. To address this issue, we propose a metric called ETU that describes the efficiency of thread utilization. First, we run several CUDA SDK sample codes, with and without double-precision arithmetic, on two generations of GPUs to perform a preliminary validation of the ETU metric. Taking the spherical harmonic transform as an example, we then present two GPU implementations of Legendre transforms and examine the relationship between ETU and application performance. Experimental results show that applications with larger ETU usually achieve better performance, and that ETU is a more accurate indicator of performance than the occupancy metric proposed by NVIDIA. Finally, we select the GPU implementations with better performance to accelerate Legendre transforms in STSWM, a spectral transform shallow water model.
