Abstract

Amazon S3 is a widely adopted web API for scalable cloud storage that could also fulfill the storage requirements of the high-energy physics (HEP) community. CERN has been evaluating this option using key HEP applications such as ROOT and the CernVM filesystem (CvmFS) with S3 back-ends. In this contribution we present an evaluation of two versions of the Huawei UDS storage system, stressed with a large number of clients executing HEP software applications. The performance of concurrently storing individual objects is presented alongside more complex data access patterns as produced by the ROOT data analysis framework. Both Huawei UDS generations scale successfully and support multiple byte-range requests, in contrast with Amazon S3 or Ceph, which do not support these commonly used HEP operations. We further report on the S3 integration with recent CvmFS versions and summarize the experience with CvmFS/S3 for publishing daily releases of the full LHCb experiment software stack.

Highlights

  • The storage and management of Large Hadron Collider (LHC) data is one of the most crucial and demanding activities in the LHC computing infrastructure at CERN and at the many collaborating sites within the Worldwide LHC Computing Grid (WLCG) [6].

  • The tests demonstrate that only the Universal Distributed Storage (UDS) generations support multi-range GET requests, which are commonly used in High-Energy Physics (HEP) analysis, whereas other storage systems such as Amazon S3 and Ceph do not.

  • S3 integration and test deployment in the CernVM filesystem (CvmFS): we evaluate the ability of cloud storage to work as a back-end for CvmFS while it stores real LHCb experiment software.


Summary

Introduction

The storage and management of Large Hadron Collider (LHC) data is one of the most crucial and demanding activities in the LHC computing infrastructure at CERN and at the many collaborating sites within the Worldwide LHC Computing Grid (WLCG) [6]. This benchmark allows the use of heterogeneous clients, i.e. physical and/or virtual machines with different features, because it measures the aggregated throughput until a given deadline. It offers the configuration of different ROOT parameters, such as the number of entries to read, the selection of branches, and the access protocol to the cloud storage: xroot [1] or S3, the latter through the davix plugin [4]. The aim of these tests is to study the scalability of data reads with the different access patterns used in a typical data analysis execution with the ROOT framework. These tests demonstrate that only the UDS generations support multi-range GET requests, which are commonly used in HEP analysis, whereas other storage systems such as Amazon S3 and Ceph do not.
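To make the term "multi-range GET request" concrete, the following minimal Python sketch packs several byte ranges into one HTTP Range header, which is how a vector read can be expressed over an HTTP-based protocol such as S3. It is an illustration only, not the code used in the evaluation: the function names, the URL, and the chunk offsets are hypothetical, and the request is built but never sent.

```python
from urllib.request import Request


def build_range_header(chunks):
    """Turn (offset, length) pairs into a single HTTP Range header value."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    parts = []
    for offset, length in chunks:
        if offset < 0 or length <= 0:
            raise ValueError(f"invalid chunk: offset={offset}, length={length}")
        # HTTP byte ranges are inclusive on both ends.
        parts.append(f"{offset}-{offset + length - 1}")
    return "bytes=" + ",".join(parts)


def make_vector_read_request(url, chunks):
    """Build (but do not send) a GET request asking for all chunks at once."""
    return Request(url, headers={"Range": build_range_header(chunks)})


# Hypothetical object URL and chunk layout, for illustration only.
req = make_vector_read_request(
    "https://s3.example.invalid/bucket/events.root",
    [(0, 100), (4096, 512)],
)
print(req.get_header("Range"))  # bytes=0-99,4096-4607
```

A server that honors such a header typically answers with status 206 and a multipart/byteranges body containing one part per range. A server without multi-range support may instead return the whole object or only a single range, which forces the client to issue one request per chunk.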

Vector reads by volume size
Findings
Conclusions
