Abstract

The management of separate memory spaces of CPUs and GPUs brings an additional burden to the development of software for GPUs. To help with this, CUDA unified memory provides a single address space that can be accessed from both CPU and GPU. The automatic data transfer mechanism is based on page faults generated by the memory accesses. This mechanism has a performance cost that can be reduced with explicit memory prefetch requests. Various hints on the intended usage of the memory regions can also be given to further improve the performance. The overall effect of unified memory compared to explicit memory management can depend heavily on the application. In this paper we evaluate the performance impact of CUDA unified memory using the heterogeneous pixel reconstruction code from the CMS experiment as a realistic use case of a GPU-targeting HEP reconstruction software. We also compare the programming model using CUDA unified memory to the explicit management of separate CPU and GPU memory spaces.
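The mechanisms named above can be illustrated with a minimal sketch (identifiers and sizes are illustrative, not taken from the paper's code): a unified-memory allocation, a usage hint, and an explicit prefetch to avoid first-touch page faults on the GPU.

```cuda
#include <cuda_runtime.h>

int main() {
  const size_t n = 1 << 20;
  float *data = nullptr;

  // One allocation, visible in a single address space from CPU and GPU.
  cudaMallocManaged(&data, n * sizeof(float));

  for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // CPU initializes the data

  int device = 0;
  cudaGetDevice(&device);

  // Hint on intended usage: this region will mostly be read on the device.
  cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetReadMostly, device);

  // Explicit prefetch so the first kernel access does not page-fault.
  cudaMemPrefetchAsync(data, n * sizeof(float), device);

  // ... launch kernels operating on data ...

  cudaDeviceSynchronize();
  cudaFree(data);
  return 0;
}
```

Without the prefetch, the same code still works: pages migrate on demand when the kernel faults on them, which is the performance cost the paper measures.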

Highlights

  • Graphics Processing Units (GPUs) are commonly used to accelerate scientific computing because of their cost and power efficiency in solving many data-parallel problems

  • The algorithms are organized in five CMS data processing software (CMSSW) framework modules, depicted in Figure 1 as a directed acyclic graph (DAG) of their data dependencies, that communicate the intermediate data in the device memory through the CMSSW event data

  • We identify two use cases where programming would be simpler than with explicit memory management: for data to be transferred to many devices, and for data structures that heavily use pointers to refer to other locations in the unified memory space


Summary

Introduction

Graphics Processing Units (GPUs) are commonly used to accelerate scientific computing because of their cost and power efficiency in solving many data-parallel problems. Their programming model introduces a concept of separate memory spaces between the host (CPU) and devices (GPUs). We evaluate the performance impact of CUDA unified memory compared to managing the separate host and device memory spaces explicitly.
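The explicit model that unified memory is compared against can be sketched as follows (a minimal, hypothetical example, not the paper's code): separate host and device buffers with explicit copies between them.

```cuda
#include <cuda_runtime.h>
#include <vector>

int main() {
  const size_t n = 1 << 20;
  std::vector<float> host(n, 1.0f);  // host-side buffer
  float *dev = nullptr;

  cudaMalloc(&dev, n * sizeof(float));                // separate device buffer
  cudaMemcpy(dev, host.data(), n * sizeof(float),
             cudaMemcpyHostToDevice);                 // explicit H->D transfer

  // ... launch kernels operating on dev ...

  cudaMemcpy(host.data(), dev, n * sizeof(float),
             cudaMemcpyDeviceToHost);                 // explicit D->H transfer
  cudaFree(dev);
  return 0;
}
```

Here every transfer is visible in the source code, which gives the programmer full control over when data moves but adds bookkeeping for every data structure shared between host and device.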

Structure of the pixel reconstruction application
Use of CUDA Unified Memory
Performance measurements and results
Findings
Conclusions
