Using S3 cloud storage with ROOT and CvmFS

María Arsuaga-Ríos, Dirk Duellmann, René Meusel, Jakob Blomer, Ben Couturier, Seppo S Heikkilä

doi:10.1088/1742-6596/664/2/022001

Open Access

Abstract

Amazon S3 is a widely adopted web API for scalable cloud storage that could also fulfill the storage requirements of the high-energy physics (HEP) community. CERN has been evaluating this option using key HEP applications such as ROOT and the CernVM filesystem (CvmFS) with S3 back-ends. In this contribution we present an evaluation of two versions of the Huawei UDS storage system stressed with a large number of clients executing HEP software applications. The performance of concurrently storing individual objects is presented alongside more complex data access patterns as produced by the ROOT data analysis framework. Both Huawei UDS generations scale successfully and support multiple byte-range requests, in contrast with Amazon S3 and Ceph, which do not support these commonly used HEP operations. We further report on the S3 integration with recent CvmFS versions and summarize the experience with CvmFS/S3 for publishing daily releases of the full LHCb experiment software stack.

Highlights

The storage and management of the Large Hadron Collider (LHC) data is one of the most crucial and demanding activities in the LHC computing infrastructure at CERN and at the many collaborating sites within the Worldwide LHC Computing Grid (WLCG) [6].
It is demonstrated that only the Universal Distributed Storage (UDS) generations support multi-range GET requests, which are commonly used in High-Energy Physics (HEP) analysis, whereas other storage systems such as Amazon S3 and Ceph do not support them.
S3 integration and test deployment in CvmFS: we evaluate the ability of cloud storage to work as a back-end for the CernVM File System (CvmFS) while it stores real LHCb experiment software.

Summary

Introduction

The storage and management of the Large Hadron Collider (LHC) data is one of the most crucial and demanding activities in the LHC computing infrastructure at CERN and at the many collaborating sites within the Worldwide LHC Computing Grid (WLCG) [6]. This benchmark allows the use of heterogeneous clients, i.e. physical and/or virtual machines with different features, because it measures the aggregated throughput until a given deadline. It offers the configuration of different ROOT parameters, such as the number of entries to read, the selection of branches, and the protocol used to access the cloud storage: xroot [1] or S3, the latter through the davix plugin [4]. The aim of these tests is to study the scalability of data reads with the different access patterns used in a typical data analysis with the ROOT framework. These tests demonstrate that only the UDS generations support multi-range GET requests, which are commonly used in HEP analysis, whereas other storage systems such as Amazon S3 and Ceph do not support them.
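To make the term concrete, a multi-range GET request is an ordinary HTTP GET whose Range header lists several byte ranges at once, so that one round trip can fetch several scattered pieces of a file. The following minimal sketch only builds such a header value from (offset, length) pairs; the function name and the example offsets are hypothetical illustrations, not taken from the benchmark, ROOT or davix.

```python
def build_range_header(chunks):
    """Build an HTTP Range header value covering several byte ranges.

    `chunks` is a list of (offset, length) pairs in bytes. HTTP byte
    ranges are inclusive on both ends, so a chunk starting at `offset`
    with `length` bytes becomes "offset-(offset + length - 1)".
    """
    if not chunks:
        raise ValueError("at least one chunk is required")
    parts = []
    for offset, length in chunks:
        if offset < 0 or length <= 0:
            raise ValueError(f"invalid chunk: offset={offset}, length={length}")
        parts.append(f"{offset}-{offset + length - 1}")
    return "bytes=" + ",".join(parts)


if __name__ == "__main__":
    # Hypothetical example: two scattered reads from the same object.
    header = build_range_header([(0, 100), (200, 50)])
    print(header)  # prints "bytes=0-99,200-249"
```

A server that supports multi-range requests answers such a request with one response containing all requested parts; a server that does not would require the client to issue one single-range request per chunk instead.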

Vector reads by volume size

Findings

Conclusions