Abstract

High energy physics (HEP) experiments such as LHAASO produce large amounts of data, which are usually stored and processed at distributed sites. Distributed data management systems currently face challenges such as providing a global file namespace and efficient data access and storage. To address these problems, this paper proposes a cross-domain data access file system (CDFS) that uses data deduplication and compression as its storage-optimized engine. CDFS aims to dynamically build an aggregate view of multiple distributed storage systems and to access data quickly and efficiently. Tests on raw data from the LHAASO experiment show that CDFS can present a single unified repository across the distributed LHAASO sites, and that the storage-optimized engine reduces the storage consumption of the raw data by more than 50%.
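
To make the idea of a deduplicating, compressing storage engine concrete, the following is a minimal sketch and not the CDFS implementation. It assumes fixed-size blocks, SHA-256 block fingerprints and zlib compression; the class name `DedupStore`, the block size and the file names are all hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed block size; the value used by CDFS is not stated here


class DedupStore:
    """Toy content-addressed store: each unique block is compressed and kept once."""

    def __init__(self):
        self.blocks = {}  # block digest -> compressed block bytes
        self.files = {}   # file name -> ordered list of block digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            block = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if no identical block has been seen before.
            if digest not in self.blocks:
                self.blocks[digest] = zlib.compress(block)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Rebuild the file by decompressing its blocks in order.
        return b"".join(zlib.decompress(self.blocks[d]) for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
payload = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # synthetic, highly redundant data
store.put("run1.dat", payload)
store.put("run2.dat", payload)  # a second copy adds no new blocks
restored = store.get("run2.dat")
```

In this toy case the two files share every block, so the store holds only two unique compressed blocks. Real raw data is far less redundant, and the savings depend on the chunking and compression scheme actually used.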

Highlights

  • The Large High Altitude Air Shower Observatory (LHAASO)[1] aims to explore the origin of high-energy cosmic rays. The amount of data collected from its detectors is huge

  • The cross-domain data access file system (CDFS) is a cross-site data access system that can be deployed on multiple sites. It aims to access data quickly and efficiently, build a global directory across sites, and store more data in limited storage space

  • CDFS can handle multiple client requests at the same time and merge identical requests. It works with CacheD to ensure on-demand access and transfer of data, and with DCFile to ensure that only blocks that do not exist at the target site are transferred
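
The last highlight, transferring only the blocks that the target site lacks, can be illustrated with a short sketch. This is an assumption-laden toy and not DCFile's protocol: it assumes fixed-size blocks identified by SHA-256 digests, and the names `split_blocks`, `digest` and `missing_blocks` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny illustrative block size, not the value used by CDFS


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(block):
    """Fingerprint a block by its SHA-256 hash."""
    return hashlib.sha256(block).hexdigest()


def missing_blocks(source_data, target_digests):
    """Return {digest: block} for source blocks that the target does not hold."""
    missing = {}
    for block in split_blocks(source_data):
        d = digest(block)
        if d not in target_digests:
            missing[d] = block
    return missing


# The target site already holds the blocks of an earlier file.
target = {digest(b) for b in split_blocks(b"aaaabbbbcccc")}

# Only the block the target has never seen needs to cross the network.
to_send = missing_blocks(b"aaaabbbbdddd", target)
```

Here two of the three source blocks already exist at the target, so only one block is selected for transfer. The source site would need the target's digest set first, which in practice means an extra metadata exchange before the data transfer.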

Summary

Introduction

The Large High Altitude Air Shower Observatory (LHAASO)[1] aims to explore the origin of high-energy cosmic rays. Among its detectors, WCDA generates the most data: 12 Gb per second, or 50 PB per year. The raw data is subsequently compressed offline by a separate program, which requires a lot of storage and time. The measure taken so far is to remove old raw data weekly (or monthly) so that new raw data can be preserved. A more efficient data storage strategy would allow more data to be stored. LHAASO currently has three sites, Beijing, Chengdu and Daocheng, connected by a private network, as shown in Fig. 1. The data is scattered among these sites. It should be accessible without knowing which site holds it, and transferable between sites quickly and efficiently.
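
The requirement that data be accessible without knowing its site amounts to a global namespace that maps each path to the sites holding it. The sketch below shows one simple way to build such a view from per-site listings. It is an illustration only: the site names come from the text, while the function names `build_global_view` and `locate` and all file paths are made up.

```python
def build_global_view(site_listings):
    """Merge per-site file listings into one mapping: path -> sites holding it."""
    view = {}
    for site, paths in site_listings.items():
        for path in paths:
            view.setdefault(path, []).append(site)
    return view


def locate(view, path):
    """Return the sites that hold `path`; raise if no site has it."""
    if path not in view:
        raise FileNotFoundError(path)
    return view[path]


# Hypothetical listings; real sites would report their own directory contents.
listings = {
    "Beijing": ["/lhaaso/wcda/run001.dat"],
    "Chengdu": ["/lhaaso/wcda/run001.dat", "/lhaaso/wcda/run002.dat"],
    "Daocheng": ["/lhaaso/wcda/run003.dat"],
}
view = build_global_view(listings)
sites_for_run001 = locate(view, "/lhaaso/wcda/run001.dat")
```

A client asks for a path and receives the candidate sites, so site selection stays hidden behind the lookup. A production system would also have to keep this view current as files are added, moved or deleted.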

Related work
Deduplication and compression
Design conception
Global namespace in CDFS
Data compression
Findings
Conclusion