Abstract
Recent years have seen a revolution in the way scientific workloads are executed, thanks to the wide adoption of software containers. These containers run largely isolated from the host system, ensuring that the development and execution environments are identical everywhere. This enables full reproducibility of the workloads and, therefore, of the associated scientific analyses. However, as the research software used becomes increasingly complex, software images easily grow to sizes of multiple gigabytes. Downloading the full image onto every compute node on which the containers are executed becomes impractical. In this paper, we describe a novel way of distributing software images on the Kubernetes platform, with which a container can start before the entire image contents become available locally (so-called “lazy pulling”). Each file required for the execution is fetched individually and cached on demand using the CernVM File System (CVMFS), enabling the execution of very large software images on potentially thousands of Kubernetes nodes with very little overhead. We present several performance benchmarks using typical high-energy physics analysis workloads.
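For illustration, the following is a minimal sketch, not taken from the paper, of how such a containerized workload might be submitted with the official Kubernetes Python client. The registry, image name, and analysis command are placeholders, and the sketch assumes the cluster nodes are already configured so that image contents are served lazily through CVMFS rather than being pulled in full.

# Minimal sketch: submit a Pod running a (hypothetical) multi-gigabyte HEP
# analysis image. Assumes the nodes lazily fetch image files via CVMFS.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a Pod

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="hep-analysis-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="analysis",
                # placeholder image and command; not from the paper
                image="registry.example.org/hep/analysis-environment:latest",
                command=["/bin/sh", "-c", "run-analysis --events 100000"],
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)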
Highlights
A software container is a packaged unit of software that contains all dependencies required to run the software independently of the environment in which the container is executed
The goal of our study is to evaluate the feasibility of executing arbitrary containerized workloads using a novel lazy container pulling approach based on a caching system commonly used for software distribution in high-energy physics (HEP) and related areas
We found that failed runs were caused by hitting the Docker Hub download rate limit when obtaining the image manifest, which means that other or dedicated container registries should be used in the future
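The rate limit mentioned in the last highlight can be inspected directly against Docker Hub. The following is a small sketch, independent of the paper's tooling, that follows Docker's documented procedure of requesting an anonymous token and issuing a HEAD request against a test repository to read the rate-limit headers.

# Minimal sketch: query the anonymous Docker Hub pull rate limit.
import requests

TOKEN_URL = ("https://auth.docker.io/token"
             "?service=registry.docker.io"
             "&scope=repository:ratelimitpreview/test:pull")
MANIFEST_URL = "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"

# Obtain an anonymous bearer token, then HEAD the test manifest.
token = requests.get(TOKEN_URL).json()["token"]
resp = requests.head(MANIFEST_URL, headers={"Authorization": f"Bearer {token}"})

# Limits are reported as "<count>;w=<window in seconds>".
print("limit:    ", resp.headers.get("ratelimit-limit"))
print("remaining:", resp.headers.get("ratelimit-remaining"))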
Summary
A software container is a packaged unit of software that contains all dependencies required to run the software independently of the environment in which the container is executed. Software containers are widely used in industry and in scientific high-throughput computing. We focus in particular on high-energy physics (HEP) applications. The physics experiments at the Large Hadron Collider (LHC) (Evans and Bryant, 2008) particle accelerator at CERN have been running for more than a decade. Scientific Linux CERN 5 (Scientific Linux CERN, 2021), the OS used at the beginning of the data-taking campaigns in 2009, reached its end of life years ago and no longer receives security updates. Even the OS used for the last data-taking campaign in 2015–2018, Scientific Linux CERN 6, has not been maintained since December 2020. No large-scale installations running these systems exist anymore, and software