Optimizing CUDA code by kernel fusion: application on BLAS

Jiří Filipovič, Jan Fousek, Matúš Madzin, Luděk Matyska

doi:10.1007/s11227-015-1483-z

Open Access

Journal: The Journal of Supercomputing
Publication Date: Jul 22, 2015
Citations: 77

Affiliation: Masaryk University, Czech Academy of Sciences, Institute of Computer Science

#Kernel Fusion #GPU Kernels

Abstract

Modern GPUs can perform significantly more arithmetic operations than transfers of a single word to or from global memory. Hence, many GPU kernels are limited by memory bandwidth and cannot exploit the arithmetic power of GPUs. However, memory locality can often be improved by kernel fusion when a sequence of kernels is executed and some kernels in the sequence share data. In this paper, we show how kernels performing map, reduce, or their nested combinations can be fused automatically by our source-to-source compiler. To demonstrate the usability of the compiler, we have implemented several BLAS-1 and BLAS-2 routines and show how the performance of their sequences can be improved by fusion. Compared to similar sequences using CUBLAS, our compiler generates code that is up to 2.61× faster for the examples tested.
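The idea behind fusing a map with a reduce can be illustrated with a minimal sketch. It is written in plain C++ as a CPU stand-in for GPU kernels, and it is not the output of the authors' compiler. Each function plays the role of one kernel, and the vector `z` stands for an intermediate result that an unfused sequence would write to global memory and read back. The names `axpy`, `sum_of_squares`, `unfused`, and `fused` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Kernel" 1 (map): z = alpha * x + y. The result is stored to memory.
std::vector<float> axpy(float alpha, const std::vector<float>& x,
                        const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        z[i] = alpha * x[i] + y[i];
    }
    return z;
}

// "Kernel" 2 (map + reduce): r = sum of z[i] * z[i]. Reads z back from memory.
float sum_of_squares(const std::vector<float>& z) {
    float r = 0.0f;
    for (float v : z) {
        r += v * v;
    }
    return r;
}

// Unfused sequence: two passes, with z making a round trip through memory.
float unfused(float alpha, const std::vector<float>& x,
              const std::vector<float>& y) {
    return sum_of_squares(axpy(alpha, x, y));
}

// Fused version: one pass; each z[i] lives only in a local variable
// (the analogue of a register), so z is never stored or reloaded.
float fused(float alpha, const std::vector<float>& x,
            const std::vector<float>& y) {
    assert(x.size() == y.size());
    float r = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float zi = alpha * x[i] + y[i];
        r += zi * zi;
    }
    return r;
}
```

Both versions perform the same arithmetic. The fused one skips one write and one read of `z` per element, which is the kind of memory traffic that dominates a bandwidth-bound kernel. On a real GPU the reduction would additionally be parallelized across threads and blocks, which this sketch leaves out.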
