Abstract

We give an overview of recent changes in the ROOT I/O system that improve its performance and its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased over the last couple of years. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly, because there are significant trade-offs between the increased CPU cost of reading and writing files and the reduced storage space.

Highlights

  • Large Hadron Collider (LHC) experiments manage about an exabyte of storage for analysis purposes, approximately half of which is on tape for archival purposes and half on traditional disk storage

  • We focus on evaluating compression for the most used analysis-related formats in CMS, MiniAOD [8] and NanoAOD [9], as well as a simple analysis file used by the LHCb experiment

  • The NanoAOD format is a flat, ROOT Ntuple-like format, readable with bare ROOT and containing the per-event information that is needed in most generic analyses

Summary

Introduction

Large Hadron Collider (LHC) experiments manage about an exabyte of storage for analysis purposes, approximately half of which is on tape for archival purposes and half on traditional disk storage. For the High Luminosity Large Hadron Collider (HL-LHC), storage requirements per year are expected to increase by a factor of 10 [1]. Given these predictions, storage will remain one of the major cost drivers and, at the same time, one of the bottlenecks for HEP computing. New storage and data management techniques, as well as new compression algorithms, will likely be needed to remove the storage and analysis computing cost bottleneck. ZSTD is available as a ROOT-supported compression algorithm starting from the ROOT v6.20 release [3].
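
The trade-off between CPU cost and storage space can be illustrated with a minimal sketch. It is not the paper's benchmark and does not use ROOT. It assumes a synthetic, hypothetical "event" buffer, and it uses the zlib and lzma codecs from the Python standard library as stand-ins, since ZSTD is not assumed to be available there. The helper names (make_payload, benchmark, run_benchmarks) are illustrative only.

```python
import lzma
import random
import struct
import time
import zlib


def make_payload(n_events=20000, seed=1):
    """Build a synthetic buffer: small integers stored as 32-bit values.

    This is a hypothetical stand-in for per-event data, not a real HEP format.
    """
    rng = random.Random(seed)
    values = [rng.randint(0, 200) for _ in range(n_events)]
    return struct.pack(f"<{n_events}i", *values)


def benchmark(name, compress, decompress, payload):
    """Time one compress/decompress round trip and report the size ratio."""
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != payload:
        raise ValueError(f"{name}: round trip was not lossless")
    return {
        "name": name,
        "ratio": len(payload) / len(packed),  # uncompressed / compressed size
        "write_s": t1 - t0,                   # CPU-side cost of writing
        "read_s": t2 - t1,                    # CPU-side cost of reading
    }


def run_benchmarks(payload):
    """Compare a fast setting, a slow setting, and a different algorithm."""
    codecs = [
        ("zlib-1", lambda b: zlib.compress(b, 1), zlib.decompress),
        ("zlib-9", lambda b: zlib.compress(b, 9), zlib.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]
    return [benchmark(n, c, d, payload) for n, c, d in codecs]


if __name__ == "__main__":
    for row in run_benchmarks(make_payload()):
        print(f"{row['name']:8s} ratio={row['ratio']:.2f} "
              f"write={row['write_s'] * 1e3:.2f} ms read={row['read_s'] * 1e3:.2f} ms")
```

Timings and ratios depend on the machine and on the data, so the numbers from this toy buffer should not be read as representative of MiniAOD or NanoAOD files. The point is only that each codec and level sits at a different place between write time, read time, and compressed size.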

Background
Evaluation of simple ZSTD algorithm for LHC datasets
Limitations and Future work
Conclusions

