Abstract

This chapter discusses profiling and debugging of OpenCL applications. OpenCL is not limited to writing isolated high-performance kernels; it can also speed up complete parallel applications. The chapter discusses how to optimize kernels running on OpenCL devices by targeting features of the architecture, and how to study the interaction between the computational kernels on the device and the host. To understand bottlenecks, one needs to measure performance and study the application as a whole, since an OpenCL application can include kernels as well as a large amount of input/output (IO) between the host and the device. The chapter describes the basic features that the OpenCL API provides for application profiling and how operating system APIs can be used to time sections of code. Debugging parallel programs is traditionally more complicated than debugging conventional serial code because of subtle bugs such as race conditions, which are difficult to detect and reproduce.
