Abstract
In light of GPUs’ powerful floating-point operation capacity, heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC). However, due to the complexity of programming on GPUs, porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big challenge. The OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific computing. To effectively inherit existing OpenMP applications and reduce the transplant cost, we extend OpenMP with a group of compiler directives, which explicitly divide tasks among the CPU and the GPU, and map time-consuming computing fragments to run on the GPU, thus dramatically simplifying the transplantation. We have designed and implemented MPtoStream, a compiler of the extended OpenMP for AMD’s stream processing GPUs. Our experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system, incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU, over the execution on the Xeon CPU alone.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have