Distributing User Code with the CernVM FileSystem

Dave Dykstra,Shreyas Bhat,Dennis Box,Tanya Levshina,Hyun Woo Kim

doi:10.1051/epjconf/202024503015

Open Access

Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2020
Citations: 1
License type: CC BY 4.0

Affiliation: Fermi National Accelerator Laboratory

Abstract

The CernVM FileSystem (CVMFS) is widely used in High Throughput Computing to efficiently distribute experiment code. However, the standard CVMFS publishing tools are designed for a small group of people from each experiment to maintain common software, and they are not a good fit for publishing software from the numerous users in each experiment. As a result, most user code, such as code for specific physics analyses, is still sent with every job to the place where the job is run. That process is relatively inefficient, especially when the user code is large. To overcome these limitations, we have built a CVMFS user code publication system. This publication system lets users still submit their code with their jobs, but the code is distributed and accessed through the standard CVMFS infrastructure. The user code is automatically deleted from CVMFS after a period of no use. Most of the software for the system is packaged as a single self-contained open source RPM called cvmfs-user-pub, which is available for other deployments.
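The abstract's cleanup rule (delete user code after a period of no use) can be illustrated with a minimal sketch. This is not the cvmfs-user-pub implementation: the function name `select_expired`, the bundle names, and the 30-day retention window are all assumptions made for illustration, since the abstract does not state the actual period or how last use is tracked.

```python
from datetime import datetime, timedelta

# Hypothetical retention window; the abstract does not give the real value.
RETENTION = timedelta(days=30)


def select_expired(last_used, now, retention=RETENTION):
    """Return, in sorted order, the names of published code bundles whose
    last recorded use is older than `retention`.

    `last_used` maps a bundle name (hypothetical) to the datetime of its
    most recent use.
    """
    return sorted(
        name for name, used in last_used.items() if now - used > retention
    )


if __name__ == "__main__":
    now = datetime(2020, 1, 31)
    last_used = {
        "analysis-a": datetime(2020, 1, 30),  # used one day ago: kept
        "analysis-b": datetime(2019, 12, 1),  # unused for 61 days: expired
    }
    print(select_expired(last_used, now))
```

A bundle last used exactly at the retention boundary is kept in this sketch; a real deployment would pick its own boundary behavior and source of usage timestamps.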

Highlights

The CernVM File System [1] (CVMFS) is widely used in High Throughput Computing to distribute the software for many experiments in the scientific community.
CVMFS distributes software through a publish process that works well for small groups of people responsible for the shared code used by large experiments, but it is not designed for the much larger number of researchers (“users”) who write relatively small additions to the experiment code.
Most user code, for example code to do physics analysis, is sent along with the jobs that users run on the grid.

Summary

Introduction

The CernVM File System [1] (CVMFS) is widely used in High Throughput Computing (otherwise known as grid computing) to distribute the software for many experiments in the scientific community. CVMFS distributes software through a publish process that works well for small groups of people responsible for the shared code used by large experiments, but it is not designed for the much larger number of researchers (“users”) who write relatively small additions to the experiment code. Most user code, for example code to do physics analysis, is sent along with the jobs that users run on the grid. That process is relatively inefficient compared to CVMFS, especially when the user code is large. Many copies of the same large set of files can end up being sent across the world, even when the jobs are all running at the same site.

Motivation

Requirements

System Design

Control Flow

Publishing Servers

Repository Cleanup

Minimizing Distribution Delays

Packaging

Conclusions
