Abstract

The goal of the chapter is to introduce upper-level Computer Engineering/Computer Science undergraduate (UG) students to general-purpose graphics processing unit (GPGPU) computing. The specific focus of the chapter is on GPGPU computing using the Compute Unified Device Architecture (CUDA) C framework, for three reasons: (1) Nvidia GPUs are ubiquitous in high-performance computing, (2) CUDA is easier to understand than OpenCL, especially for UG students with limited heterogeneous device programming experience, and (3) CUDA experience simplifies learning OpenCL and OpenACC. The chapter consists of nine pedagogical sections with several active-learning exercises to engage students effectively with the text. The chapter opens with an introduction to GPGPU computing, followed by sections on: (1) Data parallelism; (2) CUDA program structure; (3) CUDA compilation flow; (4) CUDA thread organization; (5) Kernel: Execution configuration and kernel structure; (6) CUDA memory organization; (7) CUDA optimizations; (8) Case study: Image convolution on GPUs; and (9) GPU computing: The future. The authors believe that the chapter layout facilitates effective student learning by starting from the basics of GPGPU computing and leading up to advanced concepts. With this chapter, the authors aim to equip students with the skills needed to undertake graduate-level courses on GPU programming and to make a strong start in undergraduate research.
