Abstract

The ATLAS experiment at the LHC relies on a complex and distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data. The Event Filter (EF) component of the TDAQ system is responsible for executing advanced selection algorithms, reducing the data rate to a level suitable for recording to permanent storage. The EF functionality is provided by a computing farm made up of thousands of commodity servers, each executing one or more processes. Moving the EF farm management towards a solution based on software containers is one of the main themes of the ATLAS TDAQ Phase-II upgrades in the area of the online software; it would open new possibilities for fault tolerance, reliability and scalability. This paper presents the results of an evaluation of Kubernetes as a possible orchestrator of the ATLAS TDAQ EF computing farm. Kubernetes is a system for advanced management of containerized applications in large clusters. This paper will first highlight some of the technical solutions adopted to run the offline version of today’s EF software in a Docker container. Then it will focus on scaling performance measurements executed with a cluster of 1000 CPU cores. In particular, this paper will report on the way Kubernetes scales in deploying containers as a function of the cluster size and show how a proper tuning of the Query per Second (QPS) Kubernetes parameter set can improve the scaling of applications in terms of running replicas. Finally, an assessment will be given of the possibility of using Kubernetes as an orchestrator of the EF computing farm in LHC’s Run 4.
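
To make the kind of measurement described above concrete, the following is a minimal sketch, using the official Kubernetes Python client, of how such a scaling test could be driven: a Deployment with a given number of replicas of a containerized EF processing unit is created, and the time until all replicas report available is measured. The image name, namespace, replica count and object names are illustrative assumptions, not taken from the paper.

    import time
    from kubernetes import client, config

    # Illustrative parameters -- not taken from the paper.
    IMAGE = "registry.example.cern.ch/ef-pu:latest"  # placeholder EF PU image
    REPLICAS = 1000
    NAME = "ef-pu-scaling"

    config.load_kube_config()
    apps = client.AppsV1Api()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=NAME),
        spec=client.V1DeploymentSpec(
            replicas=REPLICAS,
            selector=client.V1LabelSelector(match_labels={"app": NAME}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": NAME}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name="ef-pu", image=IMAGE)]
                ),
            ),
        ),
    )

    # Create the Deployment and time how long the full rollout takes.
    start = time.time()
    apps.create_namespaced_deployment(namespace="default", body=deployment)
    while True:
        status = apps.read_namespaced_deployment(name=NAME, namespace="default").status
        if (status.available_replicas or 0) >= REPLICAS:
            break
        time.sleep(1)
    print(f"{REPLICAS} replicas available after {time.time() - start:.1f} s")

The QPS tuning mentioned in the abstract most likely concerns the client-side API rate limits of the Kubernetes components (for example the --kube-api-qps and --kube-api-burst flags of the kubelet, scheduler and controller manager), which bound how quickly pods can be scheduled and started; the exact parameters tuned are detailed in the paper itself.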

Highlights

  • During Run 2, the Large Hadron Collider (LHC) [1] operated at a centre-of-mass energy of 13 TeV, with a peak luminosity of about 2.0 × 10³⁴ cm⁻² s⁻¹ and more than 60 interactions per bunch crossing (from ATL-DAQ-PROC-2018-022)

  • In order to minimize the impact of the started applications on the measurement, a pause container was used and its image was pre-pulled into the cluster (see the sketch after this list)

  • Assuming no higher-order effects with larger clusters (Kubernetes officially supports clusters of up to 5000 hosts), an Event Filter (EF) processing unit (PU) service instance can be fully deployed on each node of a 3000-host cluster in about 35 seconds (Figure 4), matching the corresponding Run 2 performance figures after a proper choice of the Query per Second (QPS) values

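The highlights above describe one lightweight container per node with its image pre-pulled; a natural way to express this in Kubernetes is a DaemonSet running the pause image, sketched below with the Python client. The paper does not specify the exact Kubernetes objects used, and the image tag and names here are assumptions.

    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    # One pause pod per schedulable node; the tiny pause image keeps container
    # start-up overhead negligible. Image tag and names are illustrative.
    daemonset = client.V1DaemonSet(
        metadata=client.V1ObjectMeta(name="pause-per-node"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels={"app": "pause-per-node"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "pause-per-node"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(name="pause", image="k8s.gcr.io/pause:3.1")
                    ]
                ),
            ),
        ),
    )
    apps.create_namespaced_daemon_set(namespace="default", body=daemonset)

    # Check how many nodes are running the pod (and hence have the image cached).
    status = apps.read_namespaced_daemon_set(
        name="pause-per-node", namespace="default"
    ).status
    print(f"{status.number_ready}/{status.desired_number_scheduled} nodes ready")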

Summary

Introduction

During Run 2, the Large Hadron Collider (LHC) [1] operated at a centre-of-mass energy of 13 TeV, with a peak luminosity of about 2.0 × 10³⁴ cm⁻² s⁻¹ and more than 60 interactions per bunch crossing. For LHC Run 4, the TDAQ system will be upgraded as part of the ATLAS Phase-II programme: it will sustain an input rate of 1 MHz (10 times more than in Run 2) with an average event size of about 5 MB (4 times more than in Run 2). It will include a large IT infrastructure, with thousands of computing nodes and applications to supervise. The following sections focus on the evaluation of a possible candidate to orchestrate the EF computing farm operations.
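
As a back-of-the-envelope check derived from the figures just quoted, the implied aggregate input bandwidth into the Event Filter is of the order of a few terabytes per second:

    \[
      1~\mathrm{MHz} \times 5~\mathrm{MB/event} = 5 \times 10^{6}~\mathrm{MB/s} \approx 5~\mathrm{TB/s}
    \]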

Event Filter farm orchestration
Event Filter processing units in software containers
Performance and scaling
Conclusions
