Abstract

For over a decade, dCache.ORG has provided robust software, called dCache, that is used at more than 80 universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments and many other scientific communities. The flexible architecture of dCache allows it to run in a wide variety of configurations and on many platforms, from an all-in-one Raspberry Pi up to hundreds of nodes in multi-petabyte infrastructures. The life cycle of scientific data is well defined: data is collected, processed, archived and finally deleted when it is no longer needed. Moreover, during all those stages the data is never modified: either the original data is used, or new derived data is produced. With this knowledge, dCache was designed to handle immutable files as efficiently as possible. Data replication, HSM connectivity and data-server-independent operations are only possible due to the immutable nature of stored data. Nowadays many commercial vendors provide such write-once-read-many (WORM) storage systems, which are increasingly in demand as the volume of audio, photo and video content on the web grows. On the other hand, by providing a standard NFSv4.1 interface, dCache is often used as a general-purpose file system, especially by new communities such as photon scientists or microbiologists. Although many users are aware of data immutability, some applications and use cases still require in-place updates of stored files. To satisfy these new requirements, some fundamental changes have to be applied to dCache’s core design. However, new developments must not compromise any aspect of existing functionality. In this presentation we will show new developments in dCache that turn it into a regular file system. We will discuss the challenges of building a distributed storage system, ‘life’ with POSIX compliance, the handling of multiple replicas, and backward compatibility achieved by providing WORM and noWORM capabilities within the same storage system.

Highlights

  • All data has a lifetime and a life cycle, and scientific data is no exception.

  • WORM storage sounds like a requirement for scientific data; in reality, it is not.

  • Legal organizations require that storage systems guarantee that documents are never modified.

Summary

INTRODUCTION

All data has a lifetime and a life cycle, and scientific data is no exception. The raw data is collected at the experiment’s detectors, filtered, indexed, converted into different representation formats, and published or referenced in scientific papers. During all these stages the data itself is never modified: it is either used as is, or new derived data is produced. [...] As beam time can spread over multiple days, those container files can be updated to add new images to a set. To support data collected by such experiments, the underlying storage system must support updating existing files at a random offset. Once such a container is frozen, when beam time is over, the experiment workflow assumes that the storage system will not allow any modification to the collected data set.
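The two requirements named here, in-place updates at a random offset followed by a freeze, can be illustrated with ordinary POSIX file operations. The sketch below is only a local-file-system analogy: the function names `update_at_offset`, `freeze` and `is_frozen` are hypothetical, and it makes no claim about how dCache implements either step. It assumes that "frozen" can be approximated by clearing the write permission bits.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def update_at_offset(path, offset, payload):
    """Overwrite len(payload) bytes of an existing file, starting at offset."""
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)


def freeze(path):
    """Clear all write permission bits (an assumed stand-in for 'frozen')."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def is_frozen(path):
    """Return True if no write permission bit is set on the file."""
    return not (os.stat(path).st_mode & WRITE_BITS)


if __name__ == "__main__":
    # Hypothetical container file: 16 zero bytes standing in for image slots.
    workdir = tempfile.mkdtemp()
    container = os.path.join(workdir, "container.bin")
    with open(container, "wb") as f:
        f.write(b"\x00" * 16)

    update_at_offset(container, 4, b"ABCD")  # add data while beam time runs
    freeze(container)                        # beam time is over
    print("frozen:", is_frozen(container))
```

Permission bits are advisory for privileged users, so this sketch only shows the intent. A storage system that guarantees immutability has to enforce it on the server side.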

WORM in dCache
Data replication and mirroring
Compatibility with non-NFS clients
Current status
Summary
