Abstract

The use of coprocessors such as GPU and FPGA is a leading trend in HPC. Therefore a lot of applications from a wide variety of domains were modified for GPUs and successfully used. In this paper, we propose an approach for prediction execution time of CUDA kernels, based on a static analysis of a program source code. The approach is based on building a CUDA core model and a graphics accelerator model. The developed method for estimating the execution time of CUDA kernels is applied to the implementation of matrix multiplication, the Fourier transform and the backpropagation method for training neural networks. As a result of verification, the approach showed good prediction accuracy, especially on low GPU loads.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call