Abstract

The CernVM File System (CernVM-FS) provides a scalable and reliable software distribution service and, to some extent, a data distribution service. It gives POSIX access to more than a billion binary files of experiment application software stacks and operating system containers to end user devices, grids, clouds, and supercomputers. Increasingly, CernVM-FS also provides access to certain classes of data, such as detector conditions data, genomics reference sets, or gravitational wave detector experiment data. For most of the high-energy physics experiments, an underlying HTTP content distribution infrastructure is jointly provided by universities and research institutes around the world. In this contribution, we present recent developments and future plans. For future developments, we focus on evolving the content distribution infrastructure and on lowering the barrier for publishing into CernVM-FS. Through so-called serverless computing, we envision cloud-hosted CernVM-FS repositories without the need to operate dedicated servers or virtual machines. An S3-compatible service in conjunction with a content delivery network takes on data provisioning, replication, and caching. A chain of time-limited and resource-limited functions (so-called “lambda functions” or “functions-as-a-service”) operates on the repository and stages the updates. As a result, any CernVM-FS client should be able to turn into a writer, provided it is in possession of suitable keys. For repository owners, we aim to provide cost transparency and seamless scalability from very small to very large CernVM-FS installations.
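As a rough illustration of the envisioned chain of functions, the following Python sketch shows an event-driven handler that could react to a tarball uploaded into an S3 bucket and stage its files for the next function in the chain. The trigger format follows AWS-Lambda-style S3 notifications; the bucket layout, object keys, and staging scheme are assumptions made for this example and are not part of CernVM-FS.

```python
# Illustrative only: a minimal "function-as-a-service" handler, assuming an
# AWS-Lambda-style S3 trigger. Bucket layout, object names, and the staging
# step are hypothetical; they merely sketch the envisioned chain of
# time-limited functions that stage repository updates.
import io
import tarfile
import boto3

s3 = boto3.client("s3")

def handle_tarball_upload(event, context):
    """Triggered when a publisher drops a tarball into the upload bucket;
    unpacks it and stages the contained files for the next function."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        with tarfile.open(fileobj=io.BytesIO(body)) as tar:
            for member in tar.getmembers():
                if member.isfile():
                    payload = tar.extractfile(member).read()
                    # Hand each file to the (assumed) staging area consumed by
                    # the next function (content hashing, catalog update, ...).
                    s3.put_object(Bucket=bucket,
                                  Key=f"staging/{key}/{member.name}",
                                  Body=payload)
```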

Highlights

  • The CernVM File System (CernVM-FS), a global read-only POSIX file system, provides the software distribution backbone for the experiments at the LHC and numerous other scientific collaborations within and beyond High Energy and Nuclear Physics (HENP) [1,2,3]. The CernVM-FS client provides the virtual /cvmfs directory tree on worker nodes in the grid as well as on containers and virtual machines in the cloud, on compute nodes of supercomputers, and on end-user devices.

  • Repositories are independent file system instances, and, similar to Internet web servers, there is no central directory of available CernVM-FS repositories.

  • New and modified files are transformed into a content-addressed format, and a new, consistent directory tree is fixed (a minimal sketch of this transformation follows below).
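The sketch below illustrates the idea of content addressing: a file is stored under a name derived from its own content, so identical content is stored only once. The SHA-1 hashing, zlib compression, and two-level object layout used here are simplifying assumptions for illustration, not the exact CernVM-FS on-disk format.

```python
# Illustrative sketch (not the CernVM-FS implementation): turning new files
# into content-addressed, immutable objects.
import hashlib
import zlib
from pathlib import Path

def content_address(path: Path, store: Path) -> str:
    """Compress a file and store it under a name derived from its content hash."""
    data = path.read_bytes()
    compressed = zlib.compress(data)
    digest = hashlib.sha1(compressed).hexdigest()   # the hash identifies the object
    # Lay the object out as <store>/<first two hex chars>/<remaining hex chars>,
    # so identical content maps to the same object (de-duplication for free).
    obj = store / digest[:2] / digest[2:]
    if not obj.exists():                            # already stored -> de-duplicated
        obj.parent.mkdir(parents=True, exist_ok=True)
        obj.write_bytes(compressed)
    return digest

# Example: publishing the same file twice creates only one stored object.
# content_address(Path("libfoo.so"), Path("/srv/objects"))
```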

Summary

Introduction

The CernVM File System (CernVM-FS), a global read-only POSIX file system, provides the software distribution backbone for the experiments at the LHC and numerous other scientific collaborations within and beyond High Energy and Nuclear Physics (HENP) [1,2,3]. The CernVM-FS client provides the virtual /cvmfs directory tree on worker nodes in the grid as well as on containers and virtual machines in the cloud, on compute nodes of supercomputers, and on end-user devices. While costly at the time of writing (i.e., when publishing), the content-addressed format provides de-duplication, versioning, and an always consistent view of the repositories, and it decouples the file system semantics from the content distribution. The data transport and storage layers operate on immutable and mutually independent blobs.

In this contribution, we outline improvements to the CernVM-FS publishing process. Based on support for S3 storage [5] and distributed writing, where multiple nodes can publish concurrently into separate directory subtrees [6], we describe how to evolve the CernVM-FS server side for “serverless” operations. We envision a scenario in which any CernVM-FS client with suitable keys can acquire a short-term lease for a directory subtree and publish new content directly into cloud storage. For the simple case where the file system modifications can be described as a tarball of new files, we envision S3 endpoints that allow direct publishing of such payloads.
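A hedged sketch of the envisioned lease-based publishing flow is given below. The gateway URL, lease endpoints, and JSON fields are hypothetical placeholders (they do not describe the actual cvmfs-gateway API), and the S3 bucket name is invented; the code only illustrates the acquire-lease, upload, commit, release pattern described above.

```python
# Sketch of the envisioned serverless publishing flow, under assumptions:
# the gateway URL, endpoint paths, and JSON fields are hypothetical
# placeholders, and the S3 bucket name is invented.
import requests   # pip install requests
import boto3      # pip install boto3

GATEWAY = "https://gateway.example.org/api"   # assumed lease service endpoint
BUCKET = "cvmfs-example.cern.ch"              # assumed S3 bucket backing the repo

def publish_subtree(key_id: str, secret: str, subtree: str, objects: dict[str, bytes]):
    """Acquire a short-term lease on a directory subtree, upload the new
    content-addressed objects to S3, then commit and release the lease."""
    # 1. Acquire a time-limited lease for the subtree (assumed request format).
    lease = requests.post(f"{GATEWAY}/leases",
                          json={"key_id": key_id, "path": subtree},
                          headers={"Authorization": secret}).json()["session_token"]

    # 2. Upload immutable, content-addressed blobs directly into cloud storage.
    s3 = boto3.client("s3")
    for object_name, payload in objects.items():
        s3.put_object(Bucket=BUCKET, Key=f"data/{object_name}", Body=payload)

    # 3. Commit the new catalog for the subtree and drop the lease (assumed calls).
    requests.post(f"{GATEWAY}/leases/{lease}/commit", headers={"Authorization": secret})
    requests.delete(f"{GATEWAY}/leases/{lease}", headers={"Authorization": secret})
```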

Strategic Use Cases
Serverless Computing
CernVM-FS Serverless Publishing
Portals
Outlook