For over a decade, dCache.ORG has provided robust software, called dCache, that is used at more than 80 universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments and many other scientific communities. The flexible architecture of dCache allows it to run in a wide variety of configurations and on many platforms, from an all-in-one Raspberry Pi up to hundreds of nodes in multi-petabyte infrastructures. The life cycle of scientific data is well defined: data is collected, processed, archived and finally deleted when it is no longer needed. Moreover, during all those stages the data is never modified: either the original data is used, or new derived data is produced. With this knowledge, dCache was designed to handle immutable files as efficiently as possible. Data replication, HSM connectivity and data-server-independent operations are only possible due to the immutable nature of the stored data. Nowadays many commercial vendors provide such write-once-read-many (WORM) storage systems, as demand for them grows with the increasing volume of audio, photo and video content on the web. On the other hand, by providing a standard NFSv4.1 interface, dCache is often used as a general-purpose file system, especially by new communities such as photon scientists or microbiologists. Although many users are aware of data immutability, some applications and use cases still require in-place updates of stored files. To satisfy these new requirements, some fundamental changes have to be applied to dCache’s core design. However, new developments must not compromise any aspect of existing functionality. In this presentation we will show new developments in dCache that turn it into a regular file system. We will discuss the challenges of building a distributed storage system, living with POSIX compliance, handling multiple replicas, and maintaining backward compatibility by providing WORM and noWORM capabilities within the same storage system.