Abstract

The CERN ATLAS Experiment successfully uses a worldwide distributed computing Grid infrastructure to support its physics programme at the Large Hadron Collider (LHC). The Grid workflow system PanDA routinely manages up to 700,000 concurrently running production and analysis jobs to process simulation and detector data. In total, more than 500 PB of data are distributed over more than 150 sites in the WLCG and handled by the ATLAS data management system Rucio. To prepare for the ever-growing data rate in future LHC runs, new developments are underway to embrace industry-accepted protocols and technologies and to utilize opportunistic resources in a standard way. This paper reviews how the Google and Amazon Cloud computing services have been seamlessly integrated as a Grid site within PanDA and Rucio. Performance and brief cost evaluations will be discussed. Such setups could offer advanced Cloud tool-sets and provide added value for the analysis facilities that are under discussion for LHC Run-4.

Highlights

  • The distributed computing system [1] of the ATLAS experiment [2] at the Large Hadron Collider (LHC) is built around two main components: the workflow management system PanDA [3] and the data management system Rucio [4]

  • The resources used are the Tier-0 at CERN and Tier-1/2/3 Grid sites worldwide, opportunistic resources at High Performance Computing (HPC) sites, Cloud computing providers, and volunteer computing resources

  • These components will be used in the following to set up a transparent integration into PanDA and Rucio for production and user analysis workflows. They can form the basis for an analysis facility in the Cloud and be one ingredient in addressing the LHC Run-4 data processing challenges

Summary

Introduction

The distributed computing system [1] of the ATLAS experiment [2] at the LHC is built around two main components: the workflow management system PanDA [3] and the data management system Rucio [4]. These components will be used in the following to set up a transparent integration into PanDA and Rucio for production and user analysis workflows. They can form the basis for an analysis facility in the Cloud and be one ingredient in addressing the LHC Run-4 data processing challenges.

Data management in the Cloud with Rucio
Simulation of Cloud data management
Integration of Kubernetes with the ATLAS workload management system
Harvester-Kubernetes plugins
Frontier squid
Service accounts
Infrastructure choices in the Cloud
Kubernetes mini-grid
Production and Analysis jobs performance
Findings
Summary and Conclusions