Chapter 12 - OpenCL Profiling and Debugging

Benedict Gaster, Perhaad Mistry, Lee Howes, Dana Schaa, David R. Kaeli

doi:10.1016/b978-0-12-387766-6.00035-9

Abstract

This chapter discusses profiling and debugging of OpenCL applications. OpenCL is not limited to writing isolated high-performance kernels; it can also speed up parallel applications. The chapter discusses how to optimize kernels running on OpenCL devices by targeting features of the architecture, and how to study the interaction between the computational kernels on the device and the host. To understand bottlenecks, one needs to measure performance and study the application as a whole, since an OpenCL application can include kernels as well as a large amount of input/output (IO) between the host and device. The chapter covers the basic features that the OpenCL API provides for application profiling and shows how operating system APIs can be used to time sections of code. Debugging parallel programs is traditionally more complicated than debugging conventional serial code because of subtle bugs such as race conditions, which are difficult to detect and reproduce.

Full Text