Abstract

The Large Hadron Collider (LHC) is about to enter its third run at unprecedented energies. The experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousands of physics users. The ATLAS EventIndex project, currently running in production, builds a complete catalogue of particle collisions, or events, for the ATLAS experiment at the LHC. The distributed nature of the experiment's data model is exploited by running jobs at over one hundred Grid data centers worldwide. Millions of files with petabytes of data are indexed, extracting a small quantity of metadata per event, which is conveyed by a data collection system in real time to a central Hadoop instance at CERN. After a successful first implementation based on a messaging system, several issues pointed to performance bottlenecks at the higher rates expected in the next runs of the experiment. In this work we characterize the weaknesses of the previous messaging system regarding complexity, scalability, performance and resource consumption. A new approach based on an object-based storage method was designed and implemented, taking into account the lessons learned and leveraging the ATLAS experience with this kind of system. We present an experiment that we ran for three months in the real worldwide production scenario in order to compare the messaging and object store approaches. The results show that the new object-based storage method can efficiently support large-scale data collection in big data environments such as the next runs of the ATLAS experiment at the LHC.

Highlights

  • The Large Hadron Collider (LHC) is a particle accelerator located at CERN near Geneva, at the border of Switzerland and France, with a circumference of 27 km and placed in a tunnel 175 meters below ground

  • The EventIndex [2, 3] is a metadata catalogue at the event level which exploits technologies such as Hadoop [4] as a backend storage. It has been running in production since the start of LHC Run 2 in 2015, indexing all produced data which is thought to be of interest for physics analysis

  • The results show that the object-based storage (OBS) method is an appropriate option for large Grid systems generating big data, such as the next runs of the ATLAS experiment at CERN


Summary

Introduction

The Large Hadron Collider (LHC) is a particle accelerator located at CERN near Geneva, at the border of Switzerland and France, with a circumference of 27 km, placed in a tunnel 175 meters below ground. Instead of chunking and submitting the payload with a messaging system, we considered using an object-based storage (OBS) system to temporarily store the payload and submitting a small message with a reference that the consumers use to retrieve the data. With this approach we avoid the need for payload segmentation and the partitions (MessageGroups) that cause the bottlenecks. The supervisor is in charge of selecting the valid produced information and signaling consumers to retrieve the appropriate data from the OBS system. The communication between these entities is done with control and statistics messages similar to those of the messaging scenario, so we still use queues on the brokers to distribute the processing messages among different consumers. Objects can be accessed multiple times if needed, as they have longer lifetimes than messages, which disappear when retrieved from the broker.
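The claim-check pattern described above (store the payload in an object store, send only a small reference message through the broker) can be illustrated with a minimal sketch. This is not the EventIndex implementation: the in-memory dictionary standing in for the OBS, the queue standing in for the broker, and the function names `produce`/`consume` are all assumptions for illustration only.

```python
import queue
import uuid

# In-memory stand-ins for the two services involved (assumption:
# in production these would be an S3-like object store and a broker).
object_store = {}             # object key -> payload bytes
broker_queue = queue.Queue()  # carries only small reference messages

def produce(payload: bytes) -> str:
    """Store the payload as one object and publish a small reference
    message, instead of chunking the payload through the broker."""
    key = str(uuid.uuid4())
    object_store[key] = payload
    broker_queue.put({"ref": key, "size": len(payload)})
    return key

def consume() -> bytes:
    """Receive a reference message and fetch the payload from the
    object store; the object outlives the message and can be re-read."""
    msg = broker_queue.get()
    return object_store[msg["ref"]]

key = produce(b"per-event metadata records ...")
data = consume()
# The object is still available for a second read, unlike a broker
# message, which disappears once retrieved.
assert object_store[key] == data
```

Because the broker only ever sees fixed-size reference messages, the message rate, not the payload size, bounds the broker load, which is the property the OBS approach relies on.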

Evaluation
Results indexing a single dataset
Results for all datasets
Conclusions
ATLAS Collaboration
