Abstract
Given the anticipated increase in the amount of scientific data, it is widely accepted that primarily disk-based storage will become prohibitively expensive. Tape-based storage, on the other hand, provides a viable and affordable solution for the ever-increasing demand for storage space. Coupling tape with a disk caching layer that temporarily holds a small fraction of the total data volume for low-latency access turns tape-based systems into active archival storage (write once, read many), which imposes additional demands on data flow optimization compared to traditional backup setups (write once, read never). In order to preserve the lifetime of tapes and minimize the inherently higher access latency, different tape usage strategies are being evaluated. As an important disk storage system for scientific data that transparently handles tape access, dCache is making efforts to evaluate its recall optimization potential and is introducing a proof-of-concept, high-level stage request scheduling component within its SRM implementation.
Highlights
In recent years, the volume of scientific data that needs to be stored and processed has been increasing towards the exascale level [1].
Coupling tape with a disk caching layer that temporarily holds a small fraction of the total data volume for low-latency access turns tape-based systems into active archival storage, which imposes additional demands on data flow optimization compared to traditional backup setups.
Because tape storage was originally intended for long-term archival and backup purposes, with rare large-scale recalls in case of emergencies or rare scheduled events, no optimization for recalling data from tape is currently implemented, relying instead on the capabilities of connected tape
Summary
In recent years, the volume of scientific data that needs to be stored and processed has been increasing towards the exascale level [1]. Hierarchical storage systems combine storage technologies with different characteristics, such as low-latency hard disk drives and magnetic tape, and move data between locations based on specified access requirements. They provide an economical way to meet the capacity and throughput requirements of storing and processing this amount of data. The dCache [2] software is an open-source distributed storage system that is widely used in the scientific community, particularly in the area of high energy physics. It is primarily a disk-based file storage system that is dynamically scalable to hundreds of petabytes and capable of transparently migrating data to and from a connected tape storage system. Because tape storage was originally intended for long-term archival and backup purposes, with rare large-scale recalls in case of emergencies or rare scheduled events, no optimization for recalling data from tape is currently implemented, relying instead on the capabilities of connected tape
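
To make the idea of recall optimization more concrete, the following is a minimal sketch of one generic strategy: grouping pending stage requests by tape so that each tape is mounted once per batch, then reading files in on-tape order. It is not taken from dCache or its SRM implementation. The `StageRequest` type, its fields, and the `schedule_recalls` function are hypothetical names introduced for illustration, and the sketch assumes that the tape label and file position are known to the scheduler, which may not hold in a real deployment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StageRequest:
    """Hypothetical stage (recall) request; all fields are illustrative."""
    path: str       # file to bring back to the disk cache
    tape: str       # label of the tape assumed to hold the file
    position: int   # assumed offset of the file on that tape


def schedule_recalls(requests):
    """Group requests by tape and order each group by position on tape.

    Returns a list of (tape, [requests]) pairs. Tapes with the most
    pending requests come first, so each mount serves as many files
    as possible.
    """
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape].append(req)

    batches = [
        (tape, sorted(group, key=lambda r: r.position))
        for tape, group in by_tape.items()
    ]

    # Largest batches first; ties broken by tape label for determinism.
    batches.sort(key=lambda batch: (-len(batch[1]), batch[0]))
    return batches


# Example: three requests spread over two (made-up) tapes.
requests = [
    StageRequest("/data/run1/a.root", "T001", 40),
    StageRequest("/data/run2/b.root", "T002", 5),
    StageRequest("/data/run1/c.root", "T001", 10),
]
plan = schedule_recalls(requests)
```

In this sketch the two files on the hypothetical tape `T001` are served in one mount and in ascending position, rather than interleaved with the request for `T002`. A real scheduler would also have to weigh request age and drive availability, which are left out here.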