Chapter 36 - Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads

Mark Silberstein, Assaf Schuster, John D. Owens

doi:10.1016/b978-0-12-385963-1.00036-8

Abstract

This chapter covers two difficult problems frequently encountered by graphics processing unit (GPU) developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both pose a formidable challenge, as they require dynamic adaptation and tuning of execution policies to achieve high performance for a wide range of inputs. Failing to meet these requirements leads to a substantial performance penalty. The chapter first describes a methodology for solving the memory optimization problem via software-managed caching that efficiently exploits the fast scratchpad memory. This technique outperforms the cache-less and texture-memory-based approaches on pre-Fermi GPU architectures, as well as the approach that relies on the Fermi hardware cache alone. The chapter then presents an algorithm for minimizing the total running time of a complete application comprising multiple interdependent kernels. Both a GPU and a CPU can be used to execute the kernels, but the performance varies greatly for different inputs, calling for dynamic assignment of the computations to a GPU or a CPU at runtime. The communication overhead due to the data dependencies between the kernels makes per-kernel greedy selection of the best-performing device suboptimal. The algorithm optimizes the runtime of the complete application by evaluating the performance of all the assignments jointly, including the overhead of the data transfers between the devices.
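
The claim that per-kernel greedy device selection is suboptimal can be illustrated with a small sketch. It is not the chapter's algorithm: it assumes a simple linear chain of kernels, a hypothetical cost model in which a fixed transfer cost is paid whenever consecutive kernels run on different devices, and made-up timings. The function names and numbers are illustrative only.

```python
from itertools import product

DEVICES = ("cpu", "gpu")

def total_time(assignment, run_time, transfer_time):
    """Total time of a kernel chain under one device assignment.

    Assumed cost model: run_time[i][dev] is the time of kernel i on dev,
    and transfer_time[i] is paid when kernel i runs on a different device
    than kernel i-1 (its input must be moved across devices).
    """
    total = 0.0
    for i, dev in enumerate(assignment):
        total += run_time[i][dev]
        if i > 0 and assignment[i - 1] != dev:
            total += transfer_time[i]
    return total

def greedy_assignment(run_time):
    """Pick the fastest device for each kernel, ignoring transfers."""
    return tuple(min(costs, key=costs.get) for costs in run_time)

def joint_assignment(run_time, transfer_time):
    """Evaluate every assignment jointly, transfers included (exhaustive)."""
    return min(
        product(DEVICES, repeat=len(run_time)),
        key=lambda a: total_time(a, run_time, transfer_time),
    )

# Made-up timings for a three-kernel chain (arbitrary time units).
run_time = [
    {"cpu": 4.0, "gpu": 1.0},
    {"cpu": 2.0, "gpu": 3.0},
    {"cpu": 5.0, "gpu": 1.0},
]
transfer_time = [0.0, 3.0, 3.0]

greedy = greedy_assignment(run_time)
best = joint_assignment(run_time, transfer_time)
print(greedy, total_time(greedy, run_time, transfer_time))
print(best, total_time(best, run_time, transfer_time))
```

With these numbers the greedy choice moves the middle kernel to the CPU and pays for two transfers, while the joint search keeps all three kernels on the GPU. Exhaustive enumeration grows exponentially with the number of kernels, so it is used here only to keep the sketch short.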

More From: GPU Computing Gems Jade Edition
