Abstract

The increasing uptake of portable, parallel programming models such as OpenCL has fueled extensive research into performance portability. Automatic performance tuning techniques have shown promise for generating kernels that are highly optimized for specific architectures, but they do not address the issue of performance portability directly. With the range of architectures and possible optimizations continuously growing, the concept of achieving performance portability from a single code base becomes ever more attractive. In this talk, we present an approach for analyzing performance portability that exploits the black-box nature of automatic performance tuning techniques. We demonstrate this approach across a diverse range of GPU and CPU architectures for two simple OpenCL applications. We then discuss the potential for auto-tuning to aid the generation of performance-portable OpenCL kernels by incorporating multi-objective optimization techniques into the tuning process.
