Chapter 6 - Efficiently Using GPU Memory

doi:10.1016/b978-0-12-388426-8.00006-9

Abstract

This chapter discusses techniques and examples for using GPU memory efficiently. The importance of doing so cannot be overstated: there is roughly a three-orders-of-magnitude difference in speed between the fastest on-chip register memory and mapped host memory, which must traverse the PCIe bus, so literate CUDA developers must understand the most efficient ways to use memory. Latency hiding through instruction-level parallelism (ILP) or thread-level parallelism (TLP) is essential to application performance. Pre-fetching can keep more memory transactions in flight to move data to fast memory, and it can speed up even memory-bandwidth-limited reduction operations. Irregular data structures are a challenge with current GPU technology, but some techniques can preserve performance even with random memory accesses. The same three-orders-of-magnitude gap between the slowest and fastest GPU memory systems means that GPU programmers have the opportunity to capitalize on the extreme performance that GPU hardware offers. CUDA exposes the features of the underlying hardware so that its full potential can be realized, and developers can delve into the lowest levels of the hardware execution model to attain high performance. Generic programming lets CUDA programmers create simple, generic methods that fully exploit the capability of the GPU. Finding more and better ways to use GPU memory remains an area of active research as new libraries become available that support irregular data structures such as graphs and sparse matrices.

More From: CUDA Application Design and Development
