Abstract

The ATLAS EventIndex was designed in 2012-2013 to provide a global event catalogue and limited event-level metadata for ATLAS analysis groups and users during LHC Run 2 (2015-2018). It has provided a reliable service for the initial use cases (mainly event picking) and several additional ones, such as production consistency checks, duplicate event detection, and measurements of the overlaps of trigger chains and derivation datasets. LHC Run 3, starting in 2021, will bring increased data-taking and simulation production rates; the current infrastructure would still cope with them, but may be stretched to its limits by the end of Run 3. This paper describes the implementation of a new core storage service that will provide at least the same functionality as the current one at increased data ingestion and search rates, and with growing volumes of stored data. It is based on a set of HBase tables, with schemas derived from the current Oracle implementation, coupled to Apache Phoenix for data access. In this way the advantages of a BigData-based storage system are combined with the possibility of SQL as well as NoSQL data access, allowing most of the existing code for metadata integration to be reused.
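As a rough illustration of the dual NoSQL/SQL access pattern the abstract describes, the sketch below builds an HBase-style composite row key for an event and the equivalent Phoenix point query over the same table. The table and column names (EVENT_INDEX, RUNNUMBER, EVENTNUMBER, GUID) and the key layout are hypothetical, not the actual EventIndex schema.

```java
// Sketch of dual access to an HBase table: direct row-key lookup (NoSQL)
// versus an equivalent SQL point query through Apache Phoenix.
// All table, column, and key-format choices here are illustrative assumptions.
public class EventKeySketch {

    // HBase-style composite row key: fixed-width run number followed by
    // event number, so events of the same run sort contiguously.
    public static String rowKey(long runNumber, long eventNumber) {
        return String.format("%010d.%012d", runNumber, eventNumber);
    }

    // The equivalent point query expressed in SQL for Phoenix; Phoenix maps
    // the WHERE clause on the leading key columns onto an HBase row-key scan.
    public static String phoenixQuery(long runNumber, long eventNumber) {
        return "SELECT GUID FROM EVENT_INDEX WHERE RUNNUMBER = " + runNumber
             + " AND EVENTNUMBER = " + eventNumber;
    }

    public static void main(String[] args) {
        System.out.println(rowKey(358031, 1234567));
        System.out.println(phoenixQuery(358031, 1234567));
    }
}
```

Keeping both paths over one set of tables is what lets existing SQL-oriented metadata code carry over while bulk ingestion and scans stay on the native HBase API.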

Highlights

  • The ATLAS experiment [1] at the LHC accelerator at CERN collected during the so-called “Run 2” (2015-2018) several billion physics events each year, plus a large amount of test and calibration data

  • References to the events at each processing stage in all permanent files generated by central productions: globally unique identifiers (GUIDs) of the files containing the event at the current processing stage and previous ones if available

  • The GUID retrieved by the query to the EventIndex catalogue is used to find the relevant file in the Grid data store and extract the requested event
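The event-picking flow in the highlights above can be sketched as a two-step lookup: query the catalogue for the GUID of the file containing a given (run, event) pair, then resolve that GUID to a file replica in the Grid data store. In this illustrative sketch, in-memory maps stand in for the EventIndex catalogue and the Grid file catalogue; all names and values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of event picking: (run, event) -> GUID -> file replica location.
// The two maps are stand-ins for the EventIndex catalogue and the Grid
// data store; the real services are distributed systems, not maps.
public class EventPickingSketch {
    private final Map<String, String> catalogue = new HashMap<>(); // "run/event" -> GUID
    private final Map<String, String> gridStore = new HashMap<>(); // GUID -> replica URL

    public void index(long run, long event, String guid, String replicaUrl) {
        catalogue.put(run + "/" + event, guid);
        gridStore.put(guid, replicaUrl);
    }

    // Step 1: query the catalogue for the GUID of the containing file.
    // Step 2: resolve the GUID to a replica from which the event is extracted.
    public Optional<String> pickEvent(long run, long event) {
        String guid = catalogue.get(run + "/" + event);
        return Optional.ofNullable(guid).map(gridStore::get);
    }
}
```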

Summary

Introduction

The ATLAS experiment [1] at the LHC accelerator at CERN collected during the so-called “Run 2” (2015-2018) several billion physics events each year, plus a large amount of test and calibration data. The EventIndex was designed in 2012-2013 to catalogue these events; the most promising storage solution was based on the Hadoop eco-system [3], with data stored in HDFS MapFiles and an internal catalogue in HBase. This system was preloaded with all Run 1 real data (2009-2013) and started operation in the spring of 2015, at the start of Run 2 [4]. It was the first high-energy physics computing system based from the start on open-source structured storage technologies.

Use Cases
Data Contents
System Architecture
The EventIndex for LHC Run 3
New Requirements
System Design Evolution
Findings
Conclusions and Outlook
