Abstract

The advent of computing resources with co-processors, for example Graphics Processing Units (GPU) or Field-Programmable Gate Arrays (FPGA), for use cases like the CMS High-Level Trigger (HLT) or data processing at leadership-class supercomputers poses challenges for the current data processing frameworks. These challenges include developing a model for algorithms to offload their computations to the co-processors as well as keeping the traditional CPU busy with other work. The CMS data processing framework, CMSSW, implements multithreading using the Intel Threading Building Blocks (TBB) library, which utilizes tasks as units of concurrent work. In this paper we will discuss a generic mechanism, implemented in CMSSW, to interact effectively with non-CPU resources. In addition, configuring such a heterogeneous system is challenging. In CMSSW an application is configured with a configuration file written in the Python language, and the algorithm types are part of the configuration. The challenge therefore is to unify the CPU and co-processor settings while allowing their implementations to remain separate. We will explain how we solved these challenges while minimizing the necessary changes to the CMSSW framework. We will also discuss, using a concrete example, how algorithms can offload work to NVIDIA GPUs directly through the CUDA API.
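
To make the direct CUDA offload concrete, the following minimal sketch (hypothetical code, not taken from CMSSW or the paper) shows the pattern an algorithm could follow: host data is copied to the device, a kernel runs, and the result is copied back, all queued on a CUDA stream. The kernel name scale, the buffer size, and the use of pinned host memory are illustrative assumptions.

// Hypothetical sketch: offloading a simple computation to an NVIDIA GPU
// directly through the CUDA runtime API.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 20;

  float* host = nullptr;
  float* device = nullptr;
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cudaMallocHost(&host, n * sizeof(float));  // pinned host memory for asynchronous copies
  cudaMalloc(&device, n * sizeof(float));
  for (int i = 0; i < n; ++i) host[i] = 1.f;

  // The transfers and the kernel are queued on the stream; the calls return immediately.
  cudaMemcpyAsync(device, host, n * sizeof(float), cudaMemcpyHostToDevice, stream);
  scale<<<(n + 255) / 256, 256, 0, stream>>>(device, 2.f, n);
  cudaMemcpyAsync(host, device, n * sizeof(float), cudaMemcpyDeviceToHost, stream);

  // A standalone program can simply synchronize; inside a task-based framework
  // the completion would instead be signalled back to the scheduler.
  cudaStreamSynchronize(stream);
  std::printf("host[0] = %f\n", host[0]);

  cudaFree(device);
  cudaFreeHost(host);
  cudaStreamDestroy(stream);
  return 0;
}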

Highlights

  • Co-processors or computing accelerators like graphics processing units (GPU) or field-programmable gate arrays (FPGA) are becoming more and more popular to keep the cost and power consumption of computing centers under control

  • In this paper we describe generic mechanisms to interact with non-CPU resources effectively from the Threading Building Blocks (TBB) tasks (Section 2), and to configure CPU and non-CPU algorithms in a unified way (Section 3)

  • As a first step to gain experience, we have explored various ways in which algorithms could offload work to NVIDIA GPUs with CUDA [17]

Summary

Introduction

Co-processors or computing accelerators like graphics processing units (GPU) or field-programmable gate arrays (FPGA) are becoming more and more popular to keep the cost and power consumption of computing centers under control. The CMS data processing framework (CMSSW) [11,12,13,14,15] implements multi-threading using the Intel Threading Building Blocks (TBB) [16] library, utilizing tasks as units of concurrent work. While in principle TBB tasks could interact with non-CPU resources directly in a straightforward way, the non-CPU APIs typically imply blocking the calling thread. Such blocking would lead to under-utilizing the CPU. In this paper we describe generic mechanisms to interact with non-CPU resources effectively from the TBB tasks (Section 2), and to configure CPU and non-CPU algorithms in a unified way (Section 3).
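
The non-blocking interaction can be sketched as follows (a hypothetical example assuming oneTBB and CUDA 10 or later, not CMSSW code): a TBB task queues the GPU work together with a host callback on a CUDA stream and returns immediately; when the device finishes, the callback hands the continuation back to TBB, so no CPU thread ever waits on the device. The kernel fill, the Chain helper struct, and the std::promise that keeps the demo's main() alive are illustrative assumptions.

// Hypothetical sketch: a TBB task offloads to the GPU without blocking.
#include <cuda_runtime.h>
#include <tbb/task_arena.h>
#include <future>
#include <cstdio>

__global__ void fill(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = static_cast<float>(i);
}

struct Chain {
  tbb::task_arena* arena;
  std::promise<void>* done;
  float* device_out;
  int n;
};

// Runs on a CUDA-managed thread once the work queued before it has finished.
void CUDART_CB onGpuDone(void* arg) {
  Chain* c = static_cast<Chain*>(arg);
  // Hand the continuation back to TBB instead of doing real work here.
  c->arena->enqueue([c] {
    std::printf("GPU produced %d values; CPU continuation running\n", c->n);
    cudaFree(c->device_out);
    c->done->set_value();
  });
}

int main() {
  tbb::task_arena arena;
  std::promise<void> done;
  Chain chain{&arena, &done, nullptr, 1 << 16};

  // First TBB task: queue the GPU work asynchronously and return at once,
  // leaving the thread free for other work.
  arena.enqueue([&chain] {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMalloc(&chain.device_out, chain.n * sizeof(float));
    fill<<<(chain.n + 255) / 256, 256, 0, stream>>>(chain.device_out, chain.n);
    cudaLaunchHostFunc(stream, onGpuDone, &chain);
    cudaStreamDestroy(stream);  // resources are released once the queued work completes
  });

  done.get_future().wait();  // demo only: keep the process alive until the chain finishes
  return 0;
}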

Concurrent CPU and non-CPU processing
Unified configuration for CPU and non-CPU algorithms
Pattern to interact with CUDA runtime
Asynchronous execution
Sharing of resources between modules
Minimization of data movements
Summary