Improving performance of SYCL applications on CPU architectures using LLVM‐directed compilation flow

Pietro Ghiglio, Kumudha Narasimhan, Uwe Dolinsky, Mehdi Goli

doi:10.1002/cpe.7810

Abstract

The wide adoption of SYCL as an open‐standard API for accelerating C++ software in domains such as HPC, automotive, artificial intelligence, and machine learning necessitates efficient compiler and runtime support for a growing number of platforms. Existing SYCL implementations support devices such as CPUs, GPUs, DSPs, and FPGAs, typically via OpenCL or CUDA backends. While accelerators have increased the performance of user applications significantly, employing CPU devices for further performance improvement is beneficial because CPUs are widely present in existing data centers. SYCL applications on CPUs currently go through an OpenCL backend. Though an OpenCL backend is valuable for supporting accelerators, it may introduce additional overhead on CPUs, since the host and device are the same. Overheads such as run‐time compilation of the kernel, transfer of input/output memory to and from the OpenCL device, and invocation of the OpenCL kernel may not be necessary when running on the CPU. While some of these overheads (such as data transfer) can be avoided by modifying the application, doing so can compromise the application's performance portability on other devices. In this article, we propose an alternative approach to running SYCL applications on CPUs. We bypass OpenCL and use a CPU‐directed compilation flow, together with whole function vectorization, to generate optimized host and device code in the same translation unit. We compare the performance of our CPU‐directed compilation flow with that of an OpenCL backend on existing SYCL‐based applications, without code modification: the BabelStream benchmark, Matmul from the ComputeCpp SDK, N‐body simulation benchmarks, and SYCL‐BLAS (Aliaga et al. Proceedings of the 5th International Workshop on OpenCL; 2017.), on CPUs from different vendors and architectures.
We report performance improvements on the BabelStream benchmarks, on Matmul, and on the N‐body simulation benchmark, and an improvement of up to 16% on SYCL‐BLAS.

More From: Concurrency and Computation: Practice and Experience
