The ATLAS EventIndex and its evolution towards Run 3

M Villaplana Perez, A Kazymov, A Fernandez Casani, P T Vasileva, S Gonzalez De La Hoz, E Gallas, D Barberis, J Hrivnac, M Mineev, E Alexandrov, Z Baranowski, G Rybkin, J Sanchez, C Garcia Montoro, I Alexander, F Prokoshin, I Aleksandrov, J Salt, G Dimitrov

doi:10.1088/1742-6596/1525/1/012056

Abstract

The ATLAS experiment has produced hundreds of petabytes of data and expects to have an order of magnitude more in the future. These data are spread among hundreds of computing Grid sites around the world. The EventIndex is the complete catalogue of all ATLAS events, real and simulated, keeping references to all permanent files that contain a given event at any processing stage. It provides the means to select and access event data in the ATLAS distributed storage system, and it supports completeness and consistency checks as well as trigger and offline selection overlap studies. The EventIndex employs various data handling technologies, such as Hadoop and Oracle databases, and is integrated with other parts of the ATLAS distributed computing infrastructure, including the data, metadata and production management systems. The project has been in operation since the start of LHC Run 2 in 2015 and is under continuous development to meet production and analysis demands and to follow technology evolution. The main data store in Hadoop, based on MapFiles and HBase, has worked well during Run 2, but new solutions are being explored for the future. This paper reports on the current system performance and on studies of a new data storage prototype that can carry the EventIndex through Run 3.
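To make the idea of an event catalogue concrete, the following is a minimal sketch, assuming that an event is identified by a (run number, event number) pair and that each processing stage maps to a list of file identifiers. The class `EventCatalogue`, its methods, the stage labels and the example values are hypothetical illustrations; this is not the EventIndex data model or API, which the abstract does not specify. The sketch also shows how such a catalogue can back a simple completeness check.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event key: (run_number, event_number).
EventKey = Tuple[int, int]


class EventCatalogue:
    """Toy in-memory event catalogue (illustrative only, not the ATLAS EventIndex)."""

    def __init__(self) -> None:
        # event key -> processing stage label -> list of file identifiers
        self._index: Dict[EventKey, Dict[str, List[str]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add(self, run: int, event: int, stage: str, file_id: str) -> None:
        """Record that file `file_id` at processing `stage` contains this event."""
        files = self._index[(run, event)][stage]
        if file_id not in files:  # avoid duplicate references
            files.append(file_id)

    def lookup(self, run: int, event: int, stage: str) -> List[str]:
        """Return the files holding the event at `stage` (empty list if unknown)."""
        stages = self._index.get((run, event))  # .get does not create new entries
        if stages is None:
            return []
        return list(stages.get(stage, []))

    def missing_stage(self, stage: str) -> List[EventKey]:
        """Completeness check: known events with no file recorded at `stage`."""
        return sorted(k for k, stages in self._index.items() if not stages.get(stage))


if __name__ == "__main__":
    # Hypothetical run/event numbers and file names, for illustration only.
    cat = EventCatalogue()
    cat.add(1, 10, "RAW", "raw_file_A")
    cat.add(1, 10, "AOD", "aod_file_X")
    cat.add(1, 11, "RAW", "raw_file_A")
    print(cat.lookup(1, 10, "AOD"))   # ['aod_file_X']
    print(cat.missing_stage("AOD"))   # [(1, 11)]
```

An in-memory dictionary stands in here for the persistent, distributed back ends (Hadoop and Oracle) named above; the point is only the mapping from an event to the files that contain it at each processing stage.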