Abstract

Graphics processing units (GPUs) have grown in popularity and play an important role as coprocessors in heterogeneous computing environments. Heavily data-parallel problems can be solved efficiently because tens of thousands of threads work collaboratively on the parallel GPU architecture. The achieved performance therefore depends on how well these threads cooperate in parallel. This paper proposes a static analytical kernel performance model (SAKP) to estimate the execution time of a GPU kernel. In particular, the model generates a set of kernel and device features for the target GPU. With this model, we determine the performance-limiting factors and produce an estimate of kernel execution time. Matrix multiplication (MM) and histogram generation (HG) on an NVIDIA GTX680 GPU card were used to verify the proposed model, which showed an absolute prediction error of less than 6.8%.
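
As a minimal sketch of how such a prediction error could be computed, the snippet below assumes that the reported figure is the absolute difference between predicted and measured kernel time, relative to the measured time and expressed as a percentage. The abstract does not state the exact definition, so this is an assumption. The function name `absolute_percentage_error` and all timing values are hypothetical and are not results from the paper.

```python
def absolute_percentage_error(predicted_ms: float, measured_ms: float) -> float:
    """Return |predicted - measured| / measured, as a percentage."""
    if measured_ms <= 0:
        raise ValueError("measured time must be positive")
    return abs(predicted_ms - measured_ms) / measured_ms * 100.0


# Hypothetical (predicted, measured) kernel times in milliseconds,
# for illustration only; these are not values from the paper.
samples = {"MM": (10.5, 10.0), "HG": (1.94, 2.0)}

# Per-kernel error and the worst case across the kernels tested.
errors = {name: absolute_percentage_error(p, m) for name, (p, m) in samples.items()}
worst = max(errors.values())

for name, err in errors.items():
    print(f"{name}: {err:.2f}% error")
print(f"worst case: {worst:.2f}%")
```

Under this reading, a bound such as "less than 6.8%" would apply to the worst case across the kernels tested.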
