Abstract

The distributed computing system of the ATLAS experiment at the LHC has used computing resources of the Czech national HPC center IT4Innovations for several years. The submission system is based on ARC-CEs installed at the Czech Tier2 site (praguelcg2). This contribution discusses recent improvements to that system. First, the ARC-CE was migrated from version 5 to version 6, which improved reliability and scalability. The shared filesystem built on top of sshfs 3.7 is no longer a performance bottleneck: it delivers an order of magnitude better transfer performance. New Singularity containers with the full software stack easily fit within the default resource limits of the IT4I cluster filesystem. A new submission system that runs payloads sequentially within one job was set up and adapted to the HPC environment, improving utilization of worker nodes with a very high number of cores. Overall, the infrastructure makes a significant contribution to the resources provided by praguelcg2.
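The idea of running payloads sequentially within one job can be illustrated with a minimal sketch. It is not the submission system described in the paper: the function name `run_sequentially`, the walltime budget, and the per-payload time estimate are all assumptions made for illustration. The sketch only shows the general pattern of one batch job pulling payloads one after another until too little walltime remains.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_sequentially(
    payloads: Iterable[Callable[[], T]],
    walltime_s: float,
    est_payload_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Run payloads one after another inside a single batch job.

    Hypothetical helper: a new payload is started only while the
    remaining walltime is at least the estimated payload duration.
    """
    start = clock()
    results: List[T] = []
    for payload in payloads:
        remaining = walltime_s - (clock() - start)
        if remaining < est_payload_s:
            # Not enough time left for another payload; end the job cleanly.
            break
        results.append(payload())
    return results


if __name__ == "__main__":
    # Three trivial stand-in payloads; real payloads would be simulation tasks.
    jobs = [lambda i=i: i * i for i in range(3)]
    print(run_sequentially(jobs, walltime_s=60.0, est_payload_s=1.0))
```

The `clock` parameter exists only so the time budget can be exercised with a fake clock. A real setup would take the walltime limit from the batch system rather than from a function argument.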

Highlights

  • The distributed computing system of the ATLAS experiment at the LHC [1] has used computing resources of the Czech national HPC center IT4Innovations for several years

  • The ARC Control Tower obtains a job description from the ATLAS workflow management system and submits it to one of the ARC-CE [2] machines installed at the Czech LHC Tier2 site [3]

  • Auxiliary job files are shared between the ARC-CE and the Lustre storage of an HPC node via an sshfs connection


Summary

Introduction

The distributed computing system of the ATLAS experiment at the LHC [1] has used computing resources of the Czech national HPC center IT4Innovations for several years. In 2020, it used three HPC systems at IT4Innovations: Salomon (receiving jobs since December 2017), Barbora (used since January 2020), and Anselm (used since February 2020). These systems provide ATLAS with a significant amount of additional computing resources.

Job submission system
Migration to ARC-CE version 6
Containerization
Long jobs
Parallel jobs
Sequential jobs
Performance
Summary and Conclusion
