Abstract

The ATLAS Distributed Computing project (ADC) was established in 2007 to develop and operate a framework, following the ATLAS computing model, to enable data storage, processing and bookkeeping on top of the Worldwide LHC Computing Grid (WLCG) distributed infrastructure. ADC development has always been driven by operations and this contributed to its success. The system has fulfilled the demanding requirements of ATLAS, daily consolidating worldwide up to 1 PB of data and running more than 1.5 million payloads distributed globally, supporting almost one thousand concurrent distributed analysis users. Comprehensive automation and monitoring minimized the operational manpower required. The flexibility of the system to adjust to operational needs has been important to the success of the ATLAS physics program. The LHC shutdown in 2013-2015 affords an opportunity to improve the system in light of operational experience and scale it to cope with the demanding requirements of 2015 and beyond, most notably a much higher trigger rate and event pileup. We will describe the evolution of the ADC software foreseen during this period. This includes consolidating the existing Production and Distributed Analysis framework (PanDA) and ATLAS Grid Information System (AGIS), together with the development and commissioning of next generation systems for distributed data management (DDM/Rucio) and production (Prodsys-2). We will explain how new technologies such as Cloud Computing and NoSQL databases, which ATLAS investigated as R&D projects in past years, will be integrated in production. Finally, we will describe more fundamental developments such as breaking job-to-data locality by exploiting storage federations and caches, and event level (rather than file or dataset level) workload engines.
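The event-level (rather than file- or dataset-level) approach mentioned at the end of the abstract can be pictured with a minimal sketch. This is hypothetical illustration code, not the ATLAS Event Service itself: the idea is that a broker hands out small event ranges instead of binding a payload to a whole file or dataset, so any free worker can pick up the next chunk. All names below (EventRange, fill_queue, the file identifiers and chunk sizes) are illustrative assumptions.

```python
# Hypothetical sketch of event-level work dispatch (not ATLAS code).
from dataclasses import dataclass
from queue import Queue

@dataclass
class EventRange:
    file_guid: str    # illustrative identifier of the input file
    first_event: int
    last_event: int

def fill_queue(files, events_per_file, chunk):
    """Split each input file into event ranges of size `chunk`."""
    q = Queue()
    for guid in files:
        for start in range(0, events_per_file, chunk):
            q.put(EventRange(guid, start, min(start + chunk, events_per_file) - 1))
    return q

if __name__ == "__main__":
    ranges = fill_queue(["file-A", "file-B"], events_per_file=1000, chunk=250)
    while not ranges.empty():
        r = ranges.get()
        print(f"dispatch {r.file_guid} events {r.first_event}-{r.last_event}")
```

The point of the sketch is only the granularity: workers consume fine-grained event ranges from a shared queue rather than being assigned whole files up front.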

Highlights

  • The ATLAS Distributed Computing project was established in 2007 with the goal of delivering an infrastructure for the needs of the ATLAS experiment in terms of data handling and data processing

  • Development and operations show a very large overlap, since the ATLAS Distributed Computing project (ADC) pragmatically adopted an operations-driven approach from the beginning. This has been recognized as a key component of the success of ATLAS computing during LHC Run-1 and will be preserved in the future

  • The ADC software stack builds on top of WLCG baseline services, implementing the specific aspects of the ATLAS computing model


Summary

Introduction

The ATLAS Distributed Computing project was established in 2007 with the goal of delivering an infrastructure for the needs of the ATLAS experiment in terms of data handling and data processing. It builds on top of WLCG common baseline services, such as the File Transfer Service and the LCG File Catalogue, and complements their functionalities by implementing the ATLAS-specific concepts and policies. LHC Run-2 will present new challenges for distributed computing: the ATLAS collaboration foresees collecting detector data at a much higher trigger rate (1 kHz, to be compared with the average rate of 450 Hz during Run-1) and at higher luminosity (which will yield higher event pile-up), and this implies the need to produce a larger amount of simulated data as well. This will change the scale at which the ATLAS distributed computing system has to operate: more data will have to be managed, moved across the network and accessed through the storage interfaces; more jobs will have to be executed in the distributed environment for both data simulation and data analysis; and the footprints of the payloads will become more demanding (higher memory usage, need for longer jobs). In this context, DEfT and JEDI will offer a new platform for distributed analysis.
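As a rough illustration of the scale increase described above, the following back-of-the-envelope sketch uses only the trigger rates quoted in this section; treating data volume and job counts as scaling linearly with the event rate is an assumption, since pile-up also increases the size and processing cost of each event.

```python
# Back-of-the-envelope check of the Run-2 scale change quoted above.
# Assumption (not from the paper): data volume and job counts grow at least
# linearly with the recorded event rate, ignoring pile-up effects on event size.
run1_rate_hz = 450    # average Run-1 trigger rate quoted in the text
run2_rate_hz = 1000   # foreseen Run-2 trigger rate (1 kHz)

rate_factor = run2_rate_hz / run1_rate_hz
print(f"Run-2 will record roughly {rate_factor:.1f}x more events per unit of live time")
```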

The next generation of ATLAS Distributed Data Management
Federating ATLAS storage system using Xrootd
The ATLAS Event Service
