Abstract
High Energy Physics (HEP) experiments require an abundance of computing resources, i.e. sites, to run simulations and analyses by processing data. This requirement is fulfilled by local batch farms, grid sites, private/commercial clouds, and supercomputing centers via High Throughput Computing (HTC). The growing needs of such experiments, together with the increasing heterogeneity of these resources, make it difficult for physicists to handle the resources directly. Additionally, HEP collaborations rely heavily on data and software releases, typically on the order of tens of gigabytes, while conducting simulations and analyses. Hence, scalability, reliability, and maintenance become crucial with regard to the distribution of the necessary data and software stack. The GlideinWMS [4] framework helps with the resource management problem by using pilot jobs, also known as Glideins, to provision reliable elastic virtual clusters. Glideins are submitted to unreliable heterogeneous resources, which the Glideins validate and customize to make the worker nodes available for end-user job execution. On the other hand, the CernVM File System (CernVM-FS or CVMFS) [1] helps with data distribution. It is a write-once, read-everywhere filesystem used to deploy scientific software to thousands of nodes on a worldwide distributed computing infrastructure. CVMFS is based on the Hypertext Transfer Protocol (HTTP) and has been widely used within the particle physics community for (1) distributing experiment software and data such as calibrations, and (2) facilitating containerization by efficiently hosting container images and providing containerization software, especially Singularity [3]. GlideinWMS relies on CVMFS installed locally on the computing resources to satisfy the experiments' software needs. This requires system administrators' effort to install and maintain CVMFS at the sites and limits the use of sites, especially HPC resources, that do not have CVMFS installed.
This poster presents a solution that takes advantage of Glideins to provide CVMFS at most sites without the need for a local installation. Doing so expands the pool of resources available for HEP experiments and reduces the effort required of system administrators at current resources. Additionally, the proposed solution allows GlideinWMS to start Singularity [3], a containerization software that can run unprivileged, on sites where neither CVMFS nor Singularity is available, including HPC sites. The benefits provided by this solution are: (1) lower overhead for site administrators, who have less software to install, (2) an expanded pool of resources that run user jobs with easy access to software and data provided by CVMFS, thus making life easier for the scientists, and (3) improved flexibility to use HPC resources by enabling GlideinWMS pilot jobs to support HPC sites.