Abstract

Profiling the archival storage system in scientific computing environments has received much less attention than the parallel file system, but is equally important since it stores the final data products safely and for a long duration. In this paper, we analyze eight years' worth of data transfer logs for accessing the archival file system in the Oak Ridge Leadership Computing Facility (OLCF), which has hosted some of the world's largest supercomputers and file systems. Our analysis encompasses about 135 million data transfer activities to the 80 PB High Performance Storage System (HPSS) between 2010 and 2017. We analyze the logs along several dimensions, including workload characteristics (e.g., access patterns, frequency of accesses, and temporal behavior), file system characteristics (e.g., directory depth, file system scaling trends, and file types), and scientific user behavior (e.g., domain-specific usage and organization). Based on the analysis, we derive insights into the future evolution of the archive in terms of provisioning, desired features and functionality of the archive software, the role and right-sizing of the archive tiers, quota management, and the importance of smart and efficient metadata and storage management. We believe our study will prove useful both for operating current archival storage and for better provisioning future systems.
