Abstract

The storage group of CERN IT operates more than 20 individual EOS [1] storage services with a raw data storage volume of more than 340 PB. Storage space is a major cost factor in HEP computing, and the planned LHC Runs 3 and 4 will increase storage space demands by at least an order of magnitude. Erasure Coding (EC) [2] is a cost-effective storage model that provides durability. The decommissioning of CERN’s remote computer center (Wigner/Budapest) allows a reconsideration of the currently configured dual-replica strategy, in which EOS provides one replica in each computer center. EOS allows EC to be configured on a per-file basis and exposes four redundancy levels, with single, dual, triple and fourfold parity, so that different qualities of service and costs can be selected. This paper highlights tests that were performed to migrate files on a production instance from dual-replica to various EC profiles. It discusses performance and operational impact, and highlights various policy scenarios for selecting the best file layout with respect to IO patterns, file age and file size. We conclude with the current status and future optimizations, evaluate cost savings, and discuss an erasure-coded EOS setup as a possible tape storage replacement.

Highlights

  • The storage group of CERN IT operates more than 20 individual EOS [4] storage services reaching a raw data storage volume of 340 PB in 2020

  • File checksums can still be computed in the regular way because the IO of one file flows through a single storage node and computation can be done on the fly

  • The conducted tests corroborate that Erasure Coding (EC) is beneficial and feasible
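The checksum highlight above can be illustrated with a minimal sketch: when all bytes of a file pass through one node, a running checksum can be updated chunk by chunk as the data streams through, without a second pass over the file. Adler-32 from Python's `zlib` is used here purely as an example; the choice of algorithm, the chunk size and the function name `streaming_adler32` are assumptions for illustration, not a description of the EOS implementation.

```python
import io
import zlib
from typing import BinaryIO


def streaming_adler32(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Compute an Adler-32 checksum incrementally while reading a stream.

    Hypothetical helper: it mimics a node that sees every byte of a file
    once and updates the checksum on the fly.
    """
    checksum = 1  # Adler-32 starting value
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Feed the previous running value back in to continue the checksum.
        checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF


if __name__ == "__main__":
    payload = b"example payload " * 4096
    on_the_fly = streaming_adler32(io.BytesIO(payload), chunk_size=4096)
    one_shot = zlib.adler32(payload) & 0xFFFFFFFF
    print(on_the_fly == one_shot)
```

The chunked result matches the one-shot checksum of the same bytes, which is why the checksum does not depend on how the file is later striped across disks.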


Summary

Introduction

The storage group of CERN IT operates more than 20 individual EOS [4] storage services reaching a raw data storage volume of 340 PB in 2020. Until the end of 2019 the CERN EOS deployment used a distributed storage model: each file was stored with one replica at CERN and one replica at the Wigner computer center in Budapest. This model allowed batch jobs in both centers to profit from low-latency access to local file replicas. EOS allows EC to be configured on a per-file basis and exposes four redundancy levels, with single, dual, triple and fourfold parity, so that different qualities of service and costs can be selected. In April 2019 first tests were performed to migrate files on a production instance from dual-replica to various EC profiles, in order to measure performance and operational impact and to investigate various policy scenarios for selecting the best file layout with respect to IO patterns, file age and file size.
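To make the cost trade-off between dual replication and the four parity levels concrete, the following sketch compares the raw space needed per byte of user data. The number of data stripes (`DATA_STRIPES = 10`) and all function names are illustrative assumptions; the text above does not state which stripe counts the EOS layouts use.

```python
from fractions import Fraction


def replica_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data with full replication."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return Fraction(copies)


def ec_overhead(data_stripes: int, parity_stripes: int) -> Fraction:
    """Raw bytes stored per byte of user data with k data + m parity stripes."""
    if data_stripes < 1 or parity_stripes < 0:
        raise ValueError("need k >= 1 data stripes and m >= 0 parity stripes")
    return Fraction(data_stripes + parity_stripes, data_stripes)


def savings_vs_replica(data_stripes: int, parity_stripes: int, copies: int = 2) -> float:
    """Fraction of raw space saved by EC relative to `copies`-fold replication."""
    ratio = ec_overhead(data_stripes, parity_stripes) / replica_overhead(copies)
    return float(1 - ratio)


if __name__ == "__main__":
    DATA_STRIPES = 10  # assumed value, for illustration only
    print(f"dual replica: overhead {float(replica_overhead(2)):.2f}x, tolerates 1 lost copy")
    for parity in range(1, 5):  # single, dual, triple, fourfold parity
        overhead = float(ec_overhead(DATA_STRIPES, parity))
        saved = savings_vs_replica(DATA_STRIPES, parity)
        print(f"{DATA_STRIPES}+{parity}: overhead {overhead:.2f}x, "
              f"tolerates {parity} lost stripes, saves {saved:.0%} vs dual replica")
```

Under these assumed stripe counts, each additional parity stripe raises the number of tolerated stripe losses by one while adding only one tenth of the file size in raw space, which is the basis of the cost argument for EC over replication.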

Implementation
Usability at CERN
Native EC implementation in EOS
Storage Volume Increase
Testing EC conversions
Conversion Policies
Conversion Time Extrapolation
EC Evolution - Direct Object Storage
Summary
Findings
Outlook
