Abstract

Archival data storage plays a critical role in data preservation, as almost all current data will eventually be archived. In addition, the demands placed on archival storage tiers are growing because of large, regularly scheduled backups. Archival storage tiers usually consist of tape-based devices with large storage capacity but limited I/O performance for retrieving data, especially when multiple retrieval requests are made simultaneously. The cost of disk-based devices continues to decrease while the capacity of individual disks increases, making disk-based systems a realistic option for enterprise archival storage tiers. Optimization approaches can design archival storage systems with the best mix of small, low-cost machines and larger, more expensive machines, but only if the relevant metrics of the candidate machines are well understood. This paper measures different classes of enterprise servers when they are used by a distributed file system. Our study primarily concerns the possible use of these servers within a disk-based archival storage system and produces measurements suitable for immediate use in the optimization-driven design of archival storage. Patterns observed in these measurements also enable us to predict metrics for other enterprise servers and to incorporate those servers into the design process. We combine our measurements and predictions with an optimization engine to discover an ideal building block for a 500 TB archival storage system.

