Abstract

User data analysis in high energy physics presents a challenge to storage systems based on spinning disks. The analysis is data intensive, yet reads are small, sparse, and spread over a large volume of data files. The access pattern is also unpredictable, because users change their behavior in response to storage performance. We describe a system that uses an array of Solid State Disks (SSDs) as a non-conventional, standalone, file-level cache in front of the spinning-disk storage to help improve the performance of LHC ATLAS user analysis at SLAC. The system uses several days of data access records to make caching decisions. It can also use information from other sources, such as a workflow management system. We evaluate the performance of the system both in terms of caching and in terms of its impact on user analysis jobs. The system currently uses Xrootd technology, but the technique can be applied to any storage system.

Highlights

  • We expect that input data for jobs running on Group B will mostly be read from Solid State Disks (SSD), while jobs running on Group A will read exclusively from hard disk drives (HDD)

  • The SSD cache at the ATLAS Tier 2 at SLAC demonstrated that it worked as a cache

  • Due to the fluctuating nature of the Grid-based user analysis jobs running at the Tier 2, it is difficult to achieve a high cache hit rate at all times


Summary


Caching algorithm using number of bytes read and analysis job input data information

The algorithm is similar to the previous one, except that it also checks the list of input files of upcoming user analysis jobs from the PanDA workflow management system:

  • Every hour, the algorithm builds a table (Table 3) from the last 5 days' monitoring data and fills in the blue columns.

  • Every 20 minutes, the algorithm uses information about upcoming PanDA jobs to predict the jobs' future needs, and adds the two green columns to the table.

  • It then sorts the table by the right-most column and makes the caching decision.

Table 3 column headers: Number of reads in 5 days | Average bytes read/file size | # of reads by upcoming jobs | Total read/file size by upcoming jobs
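The procedure described above (an hourly table build from monitoring data, a 20-minute update from upcoming jobs, then a sort on the right-most column) can be sketched roughly as follows. This is only an illustrative sketch, not the system's actual code. The names `Row`, `build_table`, `add_upcoming` and `choose_cache` are hypothetical. The text does not give the formula for the right-most column or the rule for the caching decision, so the sketch assumes that the predicted value is the number of upcoming reads multiplied by the average fraction of the file read, that files with no access history are read in full, and that the cache is filled greedily in sorted order up to a capacity limit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    size: int                  # file size in bytes
    reads_5d: int = 0          # number of reads in the last 5 days
    avg_fraction: float = 0.0  # average bytes read / file size
    upcoming_reads: int = 0    # number of reads by upcoming jobs
    predicted: float = 0.0     # total read / file size by upcoming jobs


def build_table(records, sizes):
    """Hourly step: records is an iterable of (file_name, bytes_read)
    taken from the last 5 days of monitoring data."""
    totals = defaultdict(lambda: [0, 0])  # file -> [read count, bytes read]
    for name, nbytes in records:
        totals[name][0] += 1
        totals[name][1] += nbytes
    table = {}
    for name, (reads, nbytes) in totals.items():
        size = sizes[name]
        table[name] = Row(size, reads, nbytes / reads / size)
    return table


def add_upcoming(table, upcoming_inputs, sizes, default_fraction=1.0):
    """20-minute step: upcoming_inputs lists one file name per input of
    each upcoming job. default_fraction is an assumption for files that
    have no access history."""
    for row in table.values():
        row.upcoming_reads = 0
        row.predicted = 0.0
    for name in upcoming_inputs:
        row = table.setdefault(name, Row(sizes[name], 0, default_fraction))
        row.upcoming_reads += 1
    for row in table.values():
        # Assumed formula for the right-most column.
        row.predicted = row.upcoming_reads * row.avg_fraction


def choose_cache(table, capacity):
    """Sort by the right-most column and greedily pick files that fit
    within capacity (bytes). The greedy rule is an assumption."""
    ranked = sorted(table.items(), key=lambda kv: kv[1].predicted, reverse=True)
    chosen, used = [], 0
    for name, row in ranked:
        if row.predicted > 0 and used + row.size <= capacity:
            chosen.append(name)
            used += row.size
    return chosen
```

In a deployment, `build_table` would be driven by the storage monitoring records and `add_upcoming` by the list of input files obtained from the workflow management system. Both data sources are outside the scope of this sketch.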

