Abstract

The use of Graphics Processing Units (GPUs) for so-called general-purpose computing is emerging as an effective approach in several fields of science, although so far applications have typically employed GPUs for offline computations. Given the steady performance increase of GPU architectures in terms of computing power and I/O capacity, real-time applications of these devices can thrive in high-energy physics data acquisition and trigger systems. We examine the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on tests performed on the trigger system of the CERN NA62 experiment. To successfully integrate GPUs in such an online environment, the latencies of all components must be analysed, networking being the most critical. To keep network latency under control we devised NaNet, an FPGA-based PCIe Network Interface Card (NIC) providing a GPUDirect connection to the GPU. Furthermore, we assess how specific trigger algorithms can be parallelized, and thus benefit from a GPU implementation in terms of increased execution speed. Such improvements are particularly relevant for the foreseen Large Hadron Collider (LHC) luminosity upgrade, where highly selective algorithms will be essential to sustain trigger rates under very high pileup.

Highlights

  • The use of Graphics Processing Units (GPUs) for so-called general-purpose computing is emerging as an effective approach in several fields of science; so far, however, applications have typically employed GPUs for offline computations

  • Our approach aims at exploiting the computing power of Graphics Processing Units (GPUs) to build refined physics-related trigger primitives, such as the energy or direction of the final-state particles in the detectors, leading to a net improvement of trigger conditions and data handling

  • In 2015 the GPU-based trigger at CERN includes 2 TEL62 boards connected to an HP2920 switch and a NaNet-1 [6] board with a TTC HSMC daughtercard, plugged into a server based on an X9DRG-QF dual-socket motherboard populated with Intel Xeon E5-2620 @2.00 GHz CPUs (i.e. Ivy Bridge architecture), 32 GB of DDR3 RAM and a Kepler-class nVIDIA K20c GPU. Such a system allows testing of the whole chain: event data move towards the GPU-based trigger through NaNet-1 by means of the GPUDirect RDMA interface (a minimal sketch of this data path follows the highlights)

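The highlights above describe event data landing directly in GPU memory through NaNet via GPUDirect RDMA. The sketch below is only a hedged illustration of what that data path looks like from the application side: the nanet_* functions are hypothetical placeholders (the real NaNet driver API is not shown on this page) stubbed with synthetic data so the example compiles and runs; only the CUDA calls are real.

```cuda
// Hedged sketch: a receive buffer allocated in GPU memory is handed to a
// (hypothetical) GPUDirect-RDMA-capable NIC driver, then a toy trigger kernel
// runs on the data that the NIC would have written there without CPU copies.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// --- Hypothetical NIC interface, stubbed with synthetic data ---------------
static void  *g_gpu_buf       = nullptr;
static size_t g_gpu_buf_bytes = 0;

int nanet_register_gpu_buffer(void *dev_ptr, size_t bytes)
{
    // A real GPUDirect RDMA driver would pin these GPU pages (on the kernel
    // side, nvidia_p2p_get_pages) so the FPGA can DMA into them directly.
    g_gpu_buf = dev_ptr;
    g_gpu_buf_bytes = bytes;
    return 0;
}

int nanet_wait_event_batch(size_t *received_bytes)
{
    // Stub: pretend the NIC wrote a batch of 32-bit event words into GPU memory.
    std::vector<unsigned int> fake(g_gpu_buf_bytes / sizeof(unsigned int), 0x90u);
    cudaMemcpy(g_gpu_buf, fake.data(), g_gpu_buf_bytes, cudaMemcpyHostToDevice);
    *received_bytes = g_gpu_buf_bytes;
    return 0;
}
// ----------------------------------------------------------------------------

__global__ void count_over_threshold(const unsigned int *words, size_t n,
                                     unsigned int *count)
{
    // Toy trigger primitive: count words above a threshold, one thread per word.
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n && words[i] > 0x80u) atomicAdd(count, 1u);
}

int main()
{
    const size_t buf_bytes = 1 << 20;
    unsigned int *d_words = nullptr, *d_count = nullptr;
    cudaMalloc(&d_words, buf_bytes);              // receive buffer in GPU memory
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    nanet_register_gpu_buffer(d_words, buf_bytes);

    size_t received = 0;
    nanet_wait_event_batch(&received);            // data lands with no CPU copy
    size_t n = received / sizeof(unsigned int);

    count_over_threshold<<<(unsigned)((n + 255) / 256), 256>>>(d_words, n, d_count);

    unsigned int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(h_count), cudaMemcpyDeviceToHost);
    printf("words over threshold: %u of %zu\n", h_count, n);

    cudaFree(d_words);
    cudaFree(d_count);
    return 0;
}
```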

Summary

Introduction

The standard implementation of low-level triggers relies on dedicated hardware (ASICs or FPGAs). Our approach aims at exploiting the computing power of Graphics Processing Units (GPUs) to build refined physics-related trigger primitives, such as the energy or direction of the final-state particles in the detectors, leading to a net improvement of trigger conditions and data handling. The goal of the NaNet project is the design and implementation of a family of FPGA-based PCIe Network Interface Cards, equipped with different network links according to their employment, to bridge the detectors' front-end electronics and the computing nodes [1]. To this purpose NaNet features (i) a low-latency, high-throughput data transport mechanism for real-time systems, (ii) support for several link technologies, so that it can fit different experimental setups, and (iii) a multi-port interface design that increases the scalability of the entire system and reduces the number of nodes in the farm clusters.

[Figure: latency of data movement paths — NaNet-10 to CPU memory; NaNet-10 to GPU memory (GPUDirect v2); NaNet-10 to GPU memory (GPUDirect RDMA); NaNet-1 to GPU memory (GPUDirect v2).]
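The introduction mentions building refined physics-related trigger primitives, such as the energy of the final-state particles, directly on the GPU. Purely as an illustration (not code from the paper), the sketch below computes a per-event energy sum in parallel, one CUDA block per event; the fixed-size event layout and the constants are assumptions made for brevity.

```cuda
// Hedged illustration of a GPU-computed trigger primitive: per-event energy sum.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

constexpr int kHitsPerEvent = 64;    // assumed fixed number of hits per event
constexpr int kNumEvents    = 1024;

__global__ void event_energy(const float *hit_energy, float *event_sum)
{
    // One block per event; the block reduces that event's hit energies.
    __shared__ float partial[kHitsPerEvent];
    int ev  = blockIdx.x;
    int hit = threadIdx.x;
    partial[hit] = hit_energy[ev * kHitsPerEvent + hit];
    __syncthreads();

    for (int stride = kHitsPerEvent / 2; stride > 0; stride /= 2) {
        if (hit < stride) partial[hit] += partial[hit + stride];
        __syncthreads();
    }
    if (hit == 0) event_sum[ev] = partial[0];   // the primitive sent downstream
}

int main()
{
    std::vector<float> h_hits(kNumEvents * kHitsPerEvent, 0.5f);  // dummy energies
    float *d_hits = nullptr, *d_sums = nullptr;
    cudaMalloc(&d_hits, h_hits.size() * sizeof(float));
    cudaMalloc(&d_sums, kNumEvents * sizeof(float));
    cudaMemcpy(d_hits, h_hits.data(), h_hits.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    event_energy<<<kNumEvents, kHitsPerEvent>>>(d_hits, d_sums);

    std::vector<float> h_sums(kNumEvents);
    cudaMemcpy(h_sums.data(), d_sums, kNumEvents * sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("event 0 energy sum: %.1f\n", h_sums[0]);   // 64 hits * 0.5 = 32.0

    cudaFree(d_hits);
    cudaFree(d_sums);
    return 0;
}
```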

GPU-based L0 trigger for the NA62 RICH detector
Histogram algorithm
Almagest algorithm
Results for the GPU-based L0 trigger with NaNet-1
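The summary lists a histogram algorithm and the Almagest algorithm for ring reconstruction in the RICH detector, but their descriptions are not included on this page. Purely as a hedged guess at the general idea of a histogram-based ring search (candidate centres on a grid, a distance histogram per centre, a peak bin marking the ring radius), here is a self-contained sketch; the grid size, bin width, hit layout and geometry are assumptions of this example, not parameters from the paper.

```cuda
// Hedged sketch of a histogram-style ring search: one block per candidate
// centre, shared-memory histogram of hit-to-centre distances, peak bin gives
// the ring radius candidate for that centre.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>
#include <cmath>

constexpr int kGrid = 16;   // kGrid x kGrid candidate ring centres (assumed)
constexpr int kBins = 32;   // distance histogram bins per centre (assumed)

__global__ void ring_search(const float2 *hits, int n_hits,
                            float pitch, float bin_width,
                            int *best_count, int *best_bin)
{
    __shared__ int histo[kBins];
    for (int b = threadIdx.x; b < kBins; b += blockDim.x) histo[b] = 0;
    __syncthreads();

    float cx = (blockIdx.x % kGrid) * pitch;    // candidate centre coordinates
    float cy = (blockIdx.x / kGrid) * pitch;

    for (int i = threadIdx.x; i < n_hits; i += blockDim.x) {
        float dx = hits[i].x - cx, dy = hits[i].y - cy;
        int bin = (int)(sqrtf(dx * dx + dy * dy) / bin_width);
        if (bin < kBins) atomicAdd(&histo[bin], 1);
    }
    __syncthreads();

    if (threadIdx.x == 0) {                     // peak bin: candidate ring radius
        int bmax = 0;
        for (int b = 1; b < kBins; ++b)
            if (histo[b] > histo[bmax]) bmax = b;
        best_count[blockIdx.x] = histo[bmax];
        best_bin[blockIdx.x]   = bmax;
    }
}

int main()
{
    const float pitch = 2.0f, bin_width = 0.5f;   // assumed geometry, in cm
    // Synthetic event: 20 hits on a ring of radius 8.2 cm centred at (16, 16) cm.
    std::vector<float2> h_hits;
    for (int i = 0; i < 20; ++i) {
        float a = 2.0f * 3.14159265f * i / 20.0f;
        h_hits.push_back(make_float2(16.0f + 8.2f * std::cos(a),
                                     16.0f + 8.2f * std::sin(a)));
    }

    float2 *d_hits; int *d_count, *d_bin;
    cudaMalloc(&d_hits, h_hits.size() * sizeof(float2));
    cudaMalloc(&d_count, kGrid * kGrid * sizeof(int));
    cudaMalloc(&d_bin,   kGrid * kGrid * sizeof(int));
    cudaMemcpy(d_hits, h_hits.data(), h_hits.size() * sizeof(float2),
               cudaMemcpyHostToDevice);

    ring_search<<<kGrid * kGrid, 128>>>(d_hits, (int)h_hits.size(),
                                        pitch, bin_width, d_count, d_bin);

    std::vector<int> counts(kGrid * kGrid), bins(kGrid * kGrid);
    cudaMemcpy(counts.data(), d_count, counts.size() * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaMemcpy(bins.data(), d_bin, bins.size() * sizeof(int),
               cudaMemcpyDeviceToHost);

    // Pick the candidate centre whose histogram peak collected the most hits.
    int best = 0;
    for (int c = 1; c < kGrid * kGrid; ++c)
        if (counts[c] > counts[best]) best = c;
    printf("ring centre ~ (%.1f, %.1f) cm, radius ~ %.2f cm, %d hits in peak\n",
           (best % kGrid) * pitch, (best / kGrid) * pitch,
           (bins[best] + 0.5f) * bin_width, counts[best]);

    cudaFree(d_hits); cudaFree(d_count); cudaFree(d_bin);
    return 0;
}
```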