Abstract

High-energy physics computing is a typical data-intensive workload. Each year, petabytes of data need to be analyzed, and the demands on data access performance keep growing. Tiered storage systems that present a unified namespace have been widely adopted: data is placed on storage devices of differing performance and price according to its access frequency, and when the heat (access popularity) of the data changes, the data is migrated to the appropriate storage tier. At present, heuristic algorithms based on human experience are widely used for data heat prediction, but because computing models differ between users, their prediction accuracy is low. This paper proposes a method that predicts future access popularity from file access characteristics using a long short-term memory (LSTM) deep learning model, as the basis for data migration in hierarchical storage. Real data from the high-energy physics experiment LHAASO is used for comparative testing. The results show that, under the same test conditions, the model achieves higher prediction accuracy and broader applicability than existing prediction models.

Highlights

  • Large-scale scientific experiments such as particle physics, particle astrophysics, and radiation sources are inseparable from large-scale data processing and analysis

  • The I/O access performance of the storage system is important for computing efficiency

  • Most neural networks are feed-forward neural networks (FNNs); no matter how many hidden layers the network has, the neurons in each layer accept input only from the connected neurons in the previous layer, and their output is passed only to the connected neurons in the next layer
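
The layer-by-layer data flow described in the last highlight can be illustrated with a minimal sketch. It is not the paper's model: the function names `dense` and `feed_forward`, the tanh activation, and the weights are all hypothetical choices made for illustration.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each neuron sees only the previous layer's outputs."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the activation strictly forward, one layer at a time, with no feedback."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
output = feed_forward([1.0, 2.0], layers)
```

Because no state is carried between calls, such a network treats each input independently, which is the limitation that recurrent models such as LSTM are designed to address for sequences like file access histories.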


Summary

Introduction

Large-scale scientific experiments such as particle physics, particle astrophysics, and radiation sources are inseparable from large-scale data processing and analysis. In the traditional hierarchical storage management process for high-energy physics at IHEP, data file migration sometimes requires an administrator to specify and manually confirm the list of files to migrate. This process depends heavily on experience, incurs high labor costs, and leaves overall storage system efficiency low. Cold file selection methods include LRU, FIFO, and file aging. These methods are essentially based on access history statistics of the storage system, with historical data access frequency as one of the important indicators. For data access heat prediction in tiered storage, no similar case has been found.
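
As an illustration of the history-based heuristics mentioned above, the following is a minimal sketch of LRU-style cold file selection. It assumes a simple mapping from file path to last access timestamp; the function name `select_cold_files_lru`, the idle threshold, and the sample paths are hypothetical and do not come from the storage system described here.

```python
def select_cold_files_lru(last_access, now, idle_seconds):
    """Return paths idle for at least idle_seconds, least recently used first."""
    cold = [
        (timestamp, path)
        for path, timestamp in last_access.items()
        if now - timestamp >= idle_seconds
    ]
    # Sorting by timestamp puts the least recently accessed file first.
    return [path for _, path in sorted(cold)]


# Hypothetical access log: path -> last access time in seconds.
last_access = {
    "/data/a.root": 100,
    "/data/b.root": 900,
    "/data/c.root": 400,
}
cold_files = select_cold_files_lru(last_access, now=1000, idle_seconds=500)
```

A rule like this looks only at past accesses, so a file that is about to be read again can still be marked cold; that gap is what a prediction-based approach aims to close.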

Related work
Model Output
Model Training
Conclusion and Outlook