Abstract

The HPC environment presents several challenges to the ATLAS experiment in running its automated computational workflows smoothly and efficiently, in particular regarding software distribution and I/O load. CVMFS, a vital component of the LHC Computing Grid, is not always available in HPC environments. ATLAS computing experimented with all-inclusive containers, and later developed an environment to produce such containers for both Shifter and Singularity. The all-inclusive containers include most of the recent ATLAS software releases, database releases, and other tools extracted from CVMFS. This has helped ATLAS to distribute software automatically to HPC centres with an environment identical to that provided by CVMFS. It has also significantly reduced the metadata I/O load on HPC shared file systems. Production operation at NERSC has shown that, by using this type of container, we can fit transparently into the previously developed ATLAS operational methods and at the same time scale up to run many more jobs.

Highlights

  • The Grid computing model developed by the Worldwide LHC Computing Grid (WLCG) provided most of the computing resources for LHC Run 1 and Run 2

  • Special attention is needed to the per-file hard link limit imposed by the file system during the building process: we found that a few files can have as many as 900 k hard links each (see the first sketch after this list)

  • To speed up operations on large numbers of small files during CVMFS data extraction and deduplication, as well as during the squashfs and rsync steps, we use the same technique we proposed for the HPCs: create a large EXT3 file system inside a GPFS file and loop-mount it (see the second sketch after this list)
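The following is a minimal Python sketch of how a hard-link audit of an extracted CVMFS tree could be done before repacking; the root path and the reporting threshold are illustrative assumptions, not values from the paper.

    #!/usr/bin/env python3
    """Sketch: audit hard-link counts in an extracted CVMFS tree.

    The root path and the reporting threshold are illustrative
    assumptions; the aim is to spot inodes whose link count could
    exceed the per-file hard link limit of the target file system.
    """
    import os
    import sys

    def report_high_link_counts(root, threshold=1000):
        """Walk `root` and report inodes with at least `threshold` hard links."""
        seen = {}  # inode number -> (link count, one example path)
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # skip files that disappear or are unreadable
                if st.st_nlink >= threshold and st.st_ino not in seen:
                    seen[st.st_ino] = (st.st_nlink, path)
        # Print the most heavily linked inodes first.
        for ino, (nlink, path) in sorted(seen.items(), key=lambda kv: -kv[1][0]):
            print(f"inode {ino}: {nlink} hard links, e.g. {path}")

    if __name__ == "__main__":
        report_high_link_counts(sys.argv[1] if len(sys.argv) > 1 else ".")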

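Below is a minimal Python sketch of the loop-mount technique described in the last highlight. The image path, size, inode ratio, and mount point are illustrative assumptions, and the mount step normally requires root (or an unprivileged equivalent); it is a sketch of the idea rather than the production procedure used at NERSC.

    #!/usr/bin/env python3
    """Sketch: build a large EXT3 file system inside a single file on a
    shared file system (e.g. GPFS) and loop-mount it, so that metadata
    operations on many small files stay on the loop device instead of
    hitting the shared file system's metadata servers.

    The image path, size, inode ratio, and mount point are illustrative
    assumptions; mounting requires appropriate privileges.
    """
    import subprocess

    IMAGE = "/gpfs/projects/atlas/cvmfs-workspace.img"  # hypothetical path
    MOUNTPOINT = "/mnt/cvmfs-workspace"                 # hypothetical mount point
    SIZE = "500G"                                       # illustrative size

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def create_and_mount():
        # Allocate a sparse file of the requested size on the shared file system.
        run(["truncate", "-s", SIZE, IMAGE])
        # Make an EXT3 file system inside the file; -F skips the
        # "not a block device" prompt, and a small bytes-per-inode ratio
        # (-i 4096) leaves enough inodes for millions of small files.
        run(["mkfs.ext3", "-F", "-i", "4096", IMAGE])
        # Loop-mount the image and do all small-file work inside it.
        run(["mkdir", "-p", MOUNTPOINT])
        run(["mount", "-o", "loop", IMAGE, MOUNTPOINT])

    if __name__ == "__main__":
        create_and_mount()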

Summary

Motivation

The Grid computing model developed by the Worldwide LHC Computing Grid (WLCG) provided most of the computing resources for LHC Run 1 and Run 2. It is clear that the ATLAS experiment [3] needs to exploit non-Grid opportunistic resources in large quantities in order to satisfy the needs of LHC Run 3 and Run 4. Supercomputers such as Cori [4] and Edison [5] at NERSC in the United States, Theta [6] at ALCF, Titan [7] at OLCF, Piz Daint [8] at CSCS in Switzerland and MareNostrum [9] at BSC in Spain are typically much larger systems than individual Grid sites. This paper discusses two of the resulting challenges: making ATLAS software available on HPCs and reducing metadata I/O on HPC shared file systems.

Making ATLAS software available on HPCs
Building all-inclusive containers for ATLAS
Extracting CVMFS contents and deduplication
Filtering
Packing software into a container
Container building environment
Use cases
Next steps
Conclusion