Abstract

Containers have become the de facto standard for packaging and distributing modern applications and their dependencies. The HEP community shows increasing interest in this technology, with scientists encapsulating their analysis workflow and code inside a container image. The analysis is first validated on a small dataset with minimal hardware resources and then run at scale on the massive computing capacity provided by the grid. The typical approach for distributing containers consists of pulling their image from a remote registry and extracting it on the node where the container runtime (e.g., Docker, Singularity) runs. This approach, however, does not easily scale to large images and thousands of nodes. CVMFS has long been used for the efficient distribution of software directory trees at a global scale. To extend its optimized caching and network utilization to the distribution of containers, CVMFS recently implemented a dedicated container image ingestion service together with container runtime integrations. CVMFS ingestion is based on per-file deduplication, instead of the per-layer deduplication adopted by traditional container registries. On the client side, CVMFS fetches on demand only the chunks required for the execution of the container instead of the whole image.
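To make the per-file deduplication idea concrete, the following Python sketch content-addresses every file of an unpacked image so that a file appearing in many layers or images is stored only once, with a catalog mapping paths to hashes. This is an illustration only, not the actual CVMFS implementation; `ingest_file`, `ingest_image_tree`, and the object-store layout are assumptions made for the example.

```python
import hashlib
import os
import shutil

def ingest_file(src_path: str, store_root: str) -> str:
    """Copy a file into a content-addressed store; identical content is stored once."""
    sha = hashlib.sha1()
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    digest = sha.hexdigest()
    dst = os.path.join(store_root, digest[:2], digest[2:])  # object addressed by hash
    if not os.path.exists(dst):  # content already ingested by another layer/image
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src_path, dst)
    return digest

def ingest_image_tree(unpacked_root: str, store_root: str) -> dict:
    """Walk an unpacked image and build a catalog mapping file paths to content hashes."""
    catalog = {}
    for dirpath, _dirs, files in os.walk(unpacked_root):
        for name in files:
            path = os.path.join(dirpath, name)
            catalog[os.path.relpath(path, unpacked_root)] = ingest_file(path, store_root)
    return catalog
```

Because files are keyed by content hash, pushing a new tag of an image that changes only a few files adds only those files to the store, whereas per-layer deduplication would re-store every file in the modified layers.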

Highlights

  • In recent years, container technologies have seen wide adoption by software developers, system administrators, and IT practitioners to the point of becoming the preferred way to package, distribute, and deploy applications

  • Among the available container runtimes, three are popular in the High Energy Physics (HEP) community: i) Singularity [5] has its roots in the scientific environment and is the most widely used for containerized jobs on the Worldwide LHC Computing Grid (WLCG); ii) containerd [6] implements the Container Runtime Interface (CRI) [7] used by Kubernetes [8] and integrates well with container orchestration tools; iii) Podman [9] can run rootless, is well integrated with the CentOS ecosystem, and provides an interface identical to the one offered by Docker (a usage sketch for running images published on CVMFS follows this list)

  • CernVM File System (CVMFS) is set up to publish to the local SSD disk, while source container images are provided by Docker Hub and by the GitLab Container Registry deployed at CERN
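As a rough illustration of how a runtime consumes images distributed this way (the path under /cvmfs and the helper below are assumptions for the example, not the paper's exact setup), Singularity can execute a container directly from an unpacked directory tree served by CVMFS, so only the files actually accessed are downloaded and cached locally:

```python
import subprocess

# Illustrative path of a flattened image published on CVMFS; repositories such as
# unpacked.cern.ch use a similar <registry>/<image>:<tag> directory scheme.
IMAGE_DIR = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/ubuntu:latest"

def run_in_cvmfs_image(image_dir: str, command: list) -> int:
    """Run a command inside a container whose root filesystem is served by CVMFS."""
    # Singularity accepts an unpacked (sandbox) directory as the container image,
    # so files are fetched on demand by the CVMFS client rather than pulled upfront.
    return subprocess.run(["singularity", "exec", image_dir, *command]).returncode

if __name__ == "__main__":
    run_in_cvmfs_image(IMAGE_DIR, ["cat", "/etc/os-release"])
```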

Summary

Introduction

Container technologies have seen wide adoption by software developers, system administrators, and IT practitioners to the point of becoming the preferred way to package, distribute, and deploy applications. Container images built and used in the HEP environment can reach tens of gigabytes in size and, even if pushed only once to the registry, they can potentially be pulled by thousands of computing nodes that are part of the Worldwide LHC Computing Grid (WLCG) [1]. This puts additional load on both the network infrastructure from the container registry to the computing nodes and the storage capacity of each computing node, given that container images must be downloaded and unpacked into the local filesystem. Distributing the unpacked image content through CVMFS instead allows downloading only the files that are strictly needed for the execution of the container (previous studies [2] confirm our own findings that only a small percentage of the total image volume is used), saving network bandwidth and local storage space. The local cache is self-managed, and files are automatically purged according to a least-recently-used policy.
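The caching behaviour described above can be pictured with a small Python sketch (illustrative only; the real CVMFS client cache is considerably more sophisticated): files are fetched into a size-limited local cache on first access, and the least recently used entries are evicted once the quota is exceeded.

```python
from collections import OrderedDict
from typing import Callable

class LRUFileCache:
    """Minimal least-recently-used file cache (illustrative, not the CVMFS client)."""

    def __init__(self, quota_bytes: int):
        self.quota = quota_bytes
        self.used = 0
        self.entries = OrderedDict()  # maps content digest -> file size in bytes

    def access(self, digest: str, size: int, fetch: Callable[[str], None]) -> None:
        if digest in self.entries:
            self.entries.move_to_end(digest)  # cache hit: mark as most recently used
            return
        fetch(digest)                          # cache miss: download only this file
        self.entries[digest] = size
        self.used += size
        while self.used > self.quota and len(self.entries) > 1:
            _evicted, evicted_size = self.entries.popitem(last=False)  # drop LRU entry
            self.used -= evicted_size
```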

Use of Containers in the HEP community
Use of CVMFS in the HEP community
Server capabilities for container images ingestion
Manage image ingestion in CVMFS
Integration with container runtimes
Evaluation
Ingestion of layers
Ingestion of chains
Characterization of image repositories
Conclusions