Abstract

CERN uses the world’s largest scientific computing grid, the WLCG, for distributed data storage and processing. Monitoring of the CPU and storage resources is essential to detect operational issues in its systems, for example in the storage elements, and to ensure their proper and efficient function. The processing of experiment data depends strongly on the quality of data access as well as on data integrity, and both of these key parameters must be assured for the lifetime of the data. Given the substantial amount of data, O(200 PB), already collected by ALICE and kept at various storage elements around the globe, scanning every single data chunk would be a very expensive process, both in terms of computing resource usage and execution time. In this paper, we describe a distributed file crawler that addresses these natural limits: it periodically extracts and analyzes statistically significant samples of files from the storage elements, evaluates the results, and is integrated with the existing monitoring solution, MonALISA.
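The sampling approach can be illustrated with a standard finite-population sample-size formula (a minimal sketch; the confidence level, margin of error, and estimator shown here are assumptions for illustration, not values taken from the paper):

```python
import math

def sample_size(population: int, z: float = 1.96,
                margin_of_error: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample-size formula with finite-population correction.

    population      -- number of files registered on the storage element
    z               -- z-score for the desired confidence level (1.96 ~ 95%)
    margin_of_error -- acceptable error on the estimated corruption rate
    p               -- assumed corruption proportion (0.5 maximizes the sample)
    """
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Even for a very large SE the required sample stays small:
print(sample_size(10_000_000))  # -> 385
```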

Highlights

  • ALICE [1] stands for “A Large Ion Collider Experiment” and is one of the four large experiments at the Large Hadron Collider (LHC) at CERN, the European Organization for Nuclear Research

  • To meet the processing and storage requirements, which amount to approximately 150k CPU cores and 200 PB of storage, ALICE uses the WLCG [2] distributed Grid

  • In order to detect corrupted files and analyze the health and performance of storage elements (SEs), we have developed a file crawler, which periodically submits Grid jobs targeted at the computing element(s) closest to the analyzed SE (see the sketch below)
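As a rough illustration of this scheduling idea, the sketch below submits one crawler job per storage element to its closest computing element (the helper functions are hypothetical placeholders, not the actual AliEn/JAliEn API; the SE names, sample size, and daily period are assumptions):

```python
import time

# Hypothetical placeholders for the Grid middleware calls; the real crawler
# talks to the ALICE Grid services, whose API is not shown here.
def list_storage_elements():
    return ["ALICE::CERN::EOS", "ALICE::FZK::SE"]  # illustrative SE names

def closest_computing_element(se):
    return se.rsplit("::", 1)[0]  # assume the CE shares the site prefix

def submit_crawler_job(ce, se, sample_size):
    print(f"submitting crawler job to {ce} for {se}, sample={sample_size}")

CRAWL_PERIOD_SECONDS = 24 * 3600  # assumed: one crawling cycle per day

def crawl_cycle(sample_size=1000):  # sample size here is illustrative
    """Submit one crawler job per storage element, targeted at the
    computing element closest to that SE."""
    for se in list_storage_elements():
        submit_crawler_job(closest_computing_element(se), se, sample_size)

if __name__ == "__main__":
    crawl_cycle()
```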

Summary

Introduction

ALICE [1] stands for “A Large Ion Collider Experiment” and is one of the four large experiments at the Large Hadron Collider (LHC) at CERN, the European Organization for Nuclear Research. We describe a distributed file crawler that accesses data on a time-cyclic schedule with a quasi-random pattern. It gathers statistics such as the number of corrupted or inaccessible files, as well as the throughput and download latency of individual storage elements. A file is considered corrupted in two basic cases: when its MD5 checksum or its apparent size differs from the value stored in the ALICE Grid catalogue. Since a single corrupted data file in an analysis workflow can cause the loss of results from many other files processed by the same job, discarding the affected file improves the overall operating efficiency. This is important to the experiment because it ensures continued high availability of the data sets.
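A minimal sketch of this corruption check is shown below; the catalogue values are represented by a hypothetical dictionary, whereas the real crawler obtains them from the ALICE Grid catalogue:

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a downloaded file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_corrupted(path, catalogue_entry):
    """Return True if the downloaded replica disagrees with the catalogue.

    catalogue_entry is a hypothetical dict carrying the 'size' and 'md5'
    values registered for this file in the ALICE Grid catalogue.
    """
    if os.path.getsize(path) != catalogue_entry["size"]:
        return True  # apparent size mismatch
    if md5_of_file(path) != catalogue_entry["md5"]:
        return True  # MD5 checksum mismatch
    return False
```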

Related elements of the ALICE Grid software
Architecture of the file crawler system
Crawler timestamps and execution steps
Cleanup
Crawling prepare
Crawling process
Merging
Database update
Implementation details
Sample size calculation
Data gathered by the crawler
Status codes overview
Status codes analysis
Throughput analysis
PFN sample analysis
Findings
Conclusion
