Abstract

The Portable Computing Language (PoCL) is a vendor independent open-source OpenCL implementation that aims to support a variety of compute devices in a single platform. Evaluating PoCL versus the Intel OpenCL implementation reveals significant performance drawbacks of PoCL on Intel CPUs – which run 92 % of the TOP500 list. Using a selection of benchmarks, we identify and analyse performance issues in PoCL with a focus on scheduling and vectorisation. We propose a new CPU device-driver based on Intel Threading Building Blocks (TBB), and evaluate LLVM with respect to automatic compiler vectorisation across work-items in PoCL. Using the TBB driver, it is possible to narrow the gap to Intel OpenCL and even outperform it by a factor of up to 1.3 × in our proxy application benchmark with a manual vectorisation strategy.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.