Abstract
SYCL is a single-source programming model for heterogeneous systems; it promises improved maintainability, productivity, and opportunity for compiler optimization compared to accelerator-specific programming models. Several implementations of the SYCL standard have been developed over the past few years, with backends built on contemporary accelerator languages such as OpenCL, CUDA, and HIP. These implementations vary widely in their support for specific features of the standard and in their performance. As SYCL grows in popularity, developers need to know how features are implemented across popular implementations in order to make sound design choices. In this paper, we evaluate existing SYCL implementations on important SYCL features across a range of hardware in order to understand SYCL's performance and portability. This work uses the newest SYCL benchmark suite (SYCL-Bench, 38 kernels) to evaluate four existing implementations, comparing support for language features across backends and highlighting feature completeness and performance. Among language features, we focus on the five major SYCL parallel constructs, using the matrix multiplication benchmark as a motivating example. Our results show that the basic data parallelism construct is the best choice for performance on current SYCL implementations, and we identify opportunities for improvement in several of the SYCL implementations.