Abstract
In this paper, we propose a source-to-source code optimization framework for general purpose computing on graphics processing units (GPGPU). Our framework is based on a re-formulation of the polyhedral loop transformation theory under the context of GPGPU. We prove that the number of actual memory transactions can be used as a performance metric to guide the code optimization process. In addition, we show how to analytically derive such a metric from a GPU program's polyhedral model. We also develop formations of GPGPU-specific optimization problems and propose corresponding affine transformations, which can be applied to an initial parallelized solution derived from input C/C++ code. The experiment results demonstrate the effectiveness of our work. On average, the code generated by our work outperforms a leading GPGPU compiler and NVIDIA handcrafted CUBLAS 4.0 by 20% and 17%, respectively.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.