Abstract

Given the anticipated growth in the volume of scientific data, it is widely accepted that storage based primarily on disk will become prohibitively expensive. Tape-based storage, on the other hand, provides a viable and affordable solution to the ever-increasing demand for storage space. Coupled with a disk caching layer that temporarily holds a small fraction of the total data volume for low-latency access, tape-based systems become active archival storage (write once, read many), which imposes additional demands on data flow optimization compared to traditional backup setups (write once, read never). To preserve the lifetime of tapes and minimize the inherently higher access latency, different tape usage strategies are being evaluated. dCache, an important disk storage system for scientific data that transparently handles tape access, is evaluating its recall optimization potential and introducing a proof-of-concept, high-level stage request scheduling component within its SRM implementation.

Introduction

In recent years, the volume of scientific data that needs to be stored and processed has been increasing towards the exascale level [1]. Hierarchical storage systems combine storage technologies with different characteristics, such as low-latency hard disk drives and magnetic tape, and move data between them according to specified access requirements. They provide an economical way to meet the capacity and throughput requirements for storing and processing this amount of data. The dCache [2] software is an open-source distributed storage system that is widely used in the scientific community, particularly in high-energy physics. It is primarily a disk-based file storage system that scales dynamically to hundreds of petabytes and can transparently migrate data to and from a connected tape storage system. Tape storage was originally intended for long-term archival and backup, with large-scale recalls occurring only rarely, in emergencies or for infrequent scheduled events. For this reason, dCache currently implements no optimization for recalling data from tape and relies instead on the capabilities of connected tape

Background
The ATLAS Data Carousel
The dCache System and Its Tape Interaction
Recall Performance Markers
Optimizing Bring-Online Performance
Case Study KIT
Simulation Setup and Scenarios
Results
Clustering Logic
Summary and Outlook