Abstract

With this contribution we present recent developments made to Rucio, the data management system of the high-energy physics experiment ATLAS. Already managing 300 petabytes of both official and user data, Rucio has seen incremental improvements throughout LHC Run-2 and is currently laying the groundwork for HEP computing in the HL-LHC era. This contribution focuses on (a) the automations that have been put in place, such as data rebalancing and dynamic replication of user data, as well as their supporting infrastructure, such as real-time networking metrics and transfer time predictions; (b) the flexible approach towards the inclusion of heterogeneous storage systems, including object stores, while unifying the potential access paths using generally available tools and protocols; (c) machine learning approaches to help with transfer throughput estimation; and (d) the adoption of Rucio by two other experiments, AMS and Xenon1t. We conclude by presenting operational numbers and figures to quantify these improvements, and extrapolate the necessary changes and developments for future LHC runs.
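
As an illustration of the kind of transfer time prediction mentioned above, the sketch below fits a simple least-squares model that estimates transfer duration from a few plausible features (file size, recent link throughput, number of queued transfers). The feature set, toy data, and model are assumptions for illustration only; they are not the model described in the paper.

    # Hypothetical sketch of transfer-time estimation from simple features.
    # The feature set and toy data are assumptions for illustration only.
    import numpy as np

    # Each row: (file size in bytes, recent link throughput in bytes/s, queued transfers)
    X = np.array([
        [1e9, 50e6, 10],
        [5e9, 40e6, 25],
        [2e9, 80e6,  5],
        [8e9, 30e6, 40],
    ])
    # Observed transfer durations in seconds for the rows above (made-up values).
    y = np.array([30.0, 160.0, 28.0, 310.0])

    # Ordinary least squares with an intercept column appended to the features.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict_transfer_time(size_bytes, throughput_bps, queued):
        """Estimate the duration in seconds of a single transfer."""
        return float(np.array([size_bytes, throughput_bps, queued, 1.0]) @ coef)

    print(predict_transfer_time(3e9, 60e6, 12))  # estimated seconds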

Highlights

  • Rucio [1] is the Distributed Data Management (DDM) system in charge of managing all ATLAS [2] data on the grid

  • Over the last year multiple improvements have been introduced to support the collaboration’s data management needs for LHC Run-2 and beyond. These changes focused on the integration of new technologies, the automation of the system for optimization and reduction of manual work, and machine learning studies to better understand the usage of the system

  • The idea of automatic background rebalancing is to prevent emergency rebalancing situations from occurring in the first place. To this end, the automatic background rebalancing daemon tries to balance the ratio of primary to secondary replicas across a group of storage endpoints, such as all Tier-1 storage elements (see the sketch below)
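
The sketch below illustrates one way such a balancing heuristic could look: given the primary-replica usage and total capacity of each endpoint in a group, it proposes moving data from endpoints whose primary share sits above the group average to those below it. The function, the tolerance threshold, and the toy numbers are hypothetical simplifications and do not reproduce the daemon's actual logic.

    # Hypothetical rebalancing heuristic: not the actual Rucio daemon logic.
    # usage maps each storage endpoint (RSE) to (primary_bytes, total_capacity_bytes).
    def plan_rebalancing(usage, tolerance=0.05):
        ratios = {rse: prim / cap for rse, (prim, cap) in usage.items()}
        target = (sum(p for p, _ in usage.values()) /
                  sum(c for _, c in usage.values()))
        # Endpoints above the average primary share donate, those below receive.
        donors = {rse: (r - target) * usage[rse][1]
                  for rse, r in ratios.items() if r > target + tolerance}
        receivers = {rse: (target - r) * usage[rse][1]
                     for rse, r in ratios.items() if r < target - tolerance}
        moves = []
        for src, surplus in sorted(donors.items(), key=lambda kv: -kv[1]):
            for dst in sorted(receivers, key=receivers.get, reverse=True):
                if surplus <= 0:
                    break
                amount = min(surplus, receivers[dst])
                if amount > 0:
                    moves.append((src, dst, int(amount)))
                    surplus -= amount
                    receivers[dst] -= amount
        return moves

    # Toy example with three Tier-1-like endpoints (byte counts are made up).
    example = {
        "TIER1_A": (8_000_000, 10_000_000),
        "TIER1_B": (4_000_000, 10_000_000),
        "TIER1_C": (6_000_000, 10_000_000),
    }
    for src, dst, nbytes in plan_rebalancing(example):
        print(f"move {nbytes} bytes of primary data from {src} to {dst}")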

Summary

Introduction

Rucio [1] is the Distributed Data Management (DDM) system in charge of managing all ATLAS [2] data on the grid. The main purpose of the system is to help the collaboration store, manage, and process LHC data in a heterogeneous distributed environment. Over the last year, multiple improvements have been introduced to support the collaboration’s data management needs for LHC Run-2 and beyond. These changes focused on the integration of new technologies, the automation of the system to optimize operations and reduce manual work, and machine learning studies to better understand the usage of the system.

