Abstract

The ATLAS Distributed Computing project (ADC) was established in 2007 to develop and operate a framework, following the ATLAS computing model, to enable data storage, processing and bookkeeping on top of the Worldwide LHC Computing Grid (WLCG) distributed infrastructure. ADC development has always been driven by operations and this contributed to its success. The system has fulfilled the demanding requirements of ATLAS, daily consolidating worldwide up to 1 PB of data and running more than 1.5 million payloads distributed globally, supporting almost one thousand concurrent distributed analysis users. Comprehensive automation and monitoring minimized the operational manpower required. The flexibility of the system to adjust to operational needs has been important to the success of the ATLAS physics program. The LHC shutdown in 2013-2015 affords an opportunity to improve the system in light of operational experience and scale it to cope with the demanding requirements of 2015 and beyond, most notably a much higher trigger rate and event pileup. We will describe the evolution of the ADC software foreseen during this period. This includes consolidating the existing Production and Distributed Analysis framework (PanDA) and ATLAS Grid Information System (AGIS), together with the development and commissioning of next generation systems for distributed data management (DDM/Rucio) and production (Prodsys-2). We will explain how new technologies such as Cloud Computing and NoSQL databases, which ATLAS investigated as R&D projects in past years, will be integrated in production. Finally, we will describe more fundamental developments such as breaking job-to-data locality by exploiting storage federations and caches, and event level (rather than file or dataset level) workload engines.
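The event-level (rather than file- or dataset-level) approach mentioned at the end of the abstract can be pictured with a minimal sketch. This is hypothetical illustration code, not the ATLAS Event Service itself: the idea is that a broker hands out small event ranges instead of binding a payload to a whole file or dataset, so any free worker can pick up the next chunk. All names below (EventRange, fill_queue, the file identifiers and chunk sizes) are illustrative assumptions.

```python
# Hypothetical sketch of event-level work dispatch (not ATLAS code).
from dataclasses import dataclass
from queue import Queue

@dataclass
class EventRange:
    file_guid: str    # illustrative identifier of the input file
    first_event: int
    last_event: int

def fill_queue(files, events_per_file, chunk):
    """Split each input file into event ranges of size `chunk`."""
    q = Queue()
    for guid in files:
        for start in range(0, events_per_file, chunk):
            q.put(EventRange(guid, start, min(start + chunk, events_per_file) - 1))
    return q

if __name__ == "__main__":
    ranges = fill_queue(["file-A", "file-B"], events_per_file=1000, chunk=250)
    while not ranges.empty():
        r = ranges.get()
        print(f"dispatch {r.file_guid} events {r.first_event}-{r.last_event}")
```

The point of the sketch is only the granularity: workers consume fine-grained event ranges from a shared queue rather than being assigned whole files up front.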

Highlights

  • The ATLAS Distributed Computing project was established in 2007 with the goal of delivering an infrastructure for the needs of the ATLAS experiment in terms of data handling and data processing

  • Development and operations show a very large overlap, since the ATLAS Distributed Computing project (ADC) pragmatically adopted an operations-driven approach from the beginning. This has been recognized as a key component of the success of ATLAS computing during LHC Run-1 and will be preserved in the future

  • The ADC software stack builds on top of WLCG baseline services, implementing the specific aspects of the ATLAS computing model


Summary

Introduction

The ATLAS Distributed Computing project was established in 2007 with the goal of delivering an infrastructure for the needs of the ATLAS experiment in terms of data handling and data processing. It builds on top of WLCG common baseline services, such as the File Transfer Service and the LCG File Catalogue, and complements their functionalities by implementing the ATLAS-specific concepts and policies. LHC Run-2 will present new challenges for distributed computing: the ATLAS collaboration foresees collecting detector data at a much higher trigger rate (1 kHz, to be compared with the average rate of 450 Hz during Run-1) and at higher luminosity (which will yield higher event pile-up), and this implies the need to produce a larger amount of simulated data as well. This will change the scale at which the ATLAS distributed computing system has to operate: more data will have to be managed, moved across the network and accessed through the storage interfaces; more jobs will have to be executed in the distributed environment for both data simulation and data analysis; and the footprints of the payloads will become more demanding (higher memory usage, need for longer jobs). In this context, DEfT and JEDI will offer a new platform for distributed analysis.
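As a rough illustration of the scale increase described above, the following back-of-the-envelope sketch uses only the trigger rates quoted in this section; treating data volume and job counts as scaling linearly with the event rate is an assumption, since pile-up also increases the size and processing cost of each event.

```python
# Back-of-the-envelope check of the Run-2 scale change quoted above.
# Assumption (not from the paper): data volume and job counts grow at least
# linearly with the recorded event rate, ignoring pile-up effects on event size.
run1_rate_hz = 450    # average Run-1 trigger rate quoted in the text
run2_rate_hz = 1000   # foreseen Run-2 trigger rate (1 kHz)

rate_factor = run2_rate_hz / run1_rate_hz
print(f"Run-2 will record roughly {rate_factor:.1f}x more events per unit of live time")
```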

The next generation of ATLAS Distributed Data Management
Federating ATLAS storage system using Xrootd
The ATLAS Event Service
