Abstract
The CernVM File System (CVMFS) is widely used in High Throughput Computing to efficiently distribute experiment code. However, the standard CVMFS publishing tools are designed for a small group of people from each experiment to maintain common software, and they are not a good fit for publishing software from the numerous users in each experiment. As a result, most user code, such as code for specific physics analyses, is still sent with every job to the place where the job runs. That process is relatively inefficient, especially when the user code is large. To overcome these limitations, we have built a CVMFS user code publication system. This system lets users continue to submit their code with their jobs, but the code is distributed and accessed through the standard CVMFS infrastructure. The user code is automatically deleted from CVMFS after a period of no use. Most of the software for the system is available as a single self-contained open-source RPM called cvmfs-user-pub and can be used for other deployments.
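To make the publish-then-expire lifecycle concrete, here is a minimal toy model of it. It is a sketch under stated assumptions, not the cvmfs-user-pub implementation: the class name `UserCodeStore`, the idea of keying each submitted tarball by a SHA-1 content hash, and the 30-day `RETENTION_SECONDS` value are all hypothetical, since the abstract does not say how published code is named or how long it is kept.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical retention period; the abstract does not state the real value.
RETENTION_SECONDS = 30 * 24 * 3600


@dataclass
class UserCodeStore:
    """Toy model of a user-code area in a CVMFS repository (hypothetical).

    Maps a content hash of each submitted tarball to the time it was last used.
    """
    last_used: dict[str, float] = field(default_factory=dict)

    def publish(self, tarball: bytes, now: float) -> str:
        # Assumption: identical tarballs hash to the same key, so repeated
        # submissions of the same code are stored only once.
        key = hashlib.sha1(tarball).hexdigest()
        self.last_used[key] = now
        return key

    def touch(self, key: str, now: float) -> None:
        # Record that a job accessed this code, resetting its expiry clock.
        if key in self.last_used:
            self.last_used[key] = now

    def expire(self, now: float) -> list[str]:
        # Delete entries that have not been used within the retention period.
        stale = [k for k, t in self.last_used.items()
                 if now - t > RETENTION_SECONDS]
        for k in stale:
            del self.last_used[k]
        return stale
```

Publishing the same bytes twice returns the same key, and an entry survives `expire` as long as some job has touched it within the retention window. Tracking "last used" rather than "first published" time is what lets frequently reused code stay in place while abandoned code is cleaned up.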
Highlights
The CernVM File System [1] (CVMFS) is widely used in High Throughput Computing to distribute the software for many experiments in the scientific community
The CernVM File System (CVMFS) distributes software through a publish process that works well for small groups of people responsible for the shared code used by large experiments, but it is not designed for the much larger number of researchers (“users”) who write relatively small additions to the experiment code
Most user code, for example code to do physics analysis, is sent along with the jobs that users run on the grid
Summary
The CernVM File System [1] (CVMFS) is widely used in High Throughput Computing (otherwise known as grid computing) to distribute the software for many experiments in the scientific community. CVMFS distributes software through a publish process that works well for small groups of people responsible for the shared code used by large experiments, but it is not designed for the much larger number of researchers (“users”) who write relatively small additions to the experiment code. Most user code, for example code to do physics analysis, is sent along with the jobs that users run on the grid. That process is relatively inefficient compared to CVMFS, especially when the user code is large: many copies of the same large set of files can end up being sent across the world, even when the jobs are all running at the same site.
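The inefficiency described above can be illustrated with simple arithmetic. The sketch below uses a deliberately simplified model (an assumption, not a measurement from this work): without a cache, every job at a site receives its own copy of the user code over the wide-area network; with a site-level cache, such as the HTTP proxies CVMFS clients normally fetch through, the code crosses the WAN once and later jobs read it locally. The function name and the example sizes are hypothetical.

```python
def wan_bytes(n_jobs: int, code_bytes: int, via_site_cache: bool) -> int:
    """Bytes crossing the wide-area network for n_jobs at one site.

    Simplified, hypothetical model: without a cache each job gets its own
    copy; with a site cache the code is fetched once and then served locally.
    """
    if n_jobs <= 0:
        return 0
    return code_bytes if via_site_cache else n_jobs * code_bytes


# Illustrative numbers only: 1000 jobs, each needing 500 MB of user code.
shipped = wan_bytes(1000, 500_000_000, via_site_cache=False)  # 500 GB
cached = wan_bytes(1000, 500_000_000, via_site_cache=True)    # 500 MB
```

Under this model the WAN traffic for shipped code grows linearly with the number of jobs, while cached delivery stays constant per site; real savings also depend on cache sizes and how many sites the jobs land on.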