Abstract

The distributed computing system of the ATLAS experiment at the LHC is allowed to use resources at the Czech national HPC center IT4Innovations in Ostrava opportunistically. Jobs are submitted via an ARC Compute Element (ARC-CE) installed at the grid site in Prague. Scripts and input files are shared between the ARC-CE and a shared file system at the HPC center via sshfs. This basic submission system has been in operation since the end of 2017. Several improvements have since been made to increase the amount of resources that ATLAS can use. The most significant change was the migration of the submission system to support pre-emptable jobs, in response to the HPC management's decision to start pre-empting opportunistic jobs. Another improvement addressed the sshfs connection, which appeared to be a limiting factor: the submission system now consists of several ARC-CE machines, and various sshfs parameters were tested in an attempt to increase throughput. As a result of these improvements, the utilisation of the Czech national HPC center by ATLAS distributed computing has increased.
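To illustrate what testing various sshfs parameters can look like in practice, the following minimal sketch assembles an sshfs mount command from a set of candidate options. The host name, paths, and option values are hypothetical and are not taken from the paper; the options shown (`reconnect`, `ServerAliveInterval`, `Compression`, `kernel_cache`) are standard sshfs/ssh/FUSE options, but the source does not state which ones were actually tuned. The command is only printed, not executed.

```python
import shlex


def build_sshfs_command(remote, mountpoint, options):
    """Return the argv list for an sshfs mount with the given -o options.

    `options` maps an option name to its value; a value of None means the
    option is a bare flag (for example "reconnect").
    """
    argv = ["sshfs", remote, mountpoint]
    for key, value in options.items():
        opt = key if value is None else f"{key}={value}"
        argv += ["-o", opt]
    return argv


# Hypothetical candidate settings; none of these values come from the paper.
candidate_options = {
    "reconnect": None,          # re-establish the connection if it drops
    "ServerAliveInterval": 15,  # ssh keep-alive interval in seconds
    "Compression": "no",        # skip ssh compression on a fast link
    "kernel_cache": None,       # let the kernel cache file contents
}

if __name__ == "__main__":
    # Hypothetical remote and mount point, for illustration only.
    cmd = build_sshfs_command(
        "user@hpc.example.org:/scratch/atlas", "/mnt/hpc", candidate_options
    )
    print(shlex.join(cmd))
```

Keeping the options in a dictionary makes it straightforward to vary one parameter at a time and compare the measured throughput of each variant.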

Highlights

  • The distributed computing of the ATLAS experiment [1] at the Large Hadron Collider (LHC) opportunistically uses computing resources of the Salomon HPC cluster located at the Czech National HPC Center IT4Innovations (IT4I) in Ostrava

  • The process starts with the ARC Control Tower obtaining a job description from the ATLAS workflow management system and submitting it to one of the ARC Compute Element (ARC-CE) machines installed at the Czech Tier-2 site [5]

  • The input files are stored in the ARC-CE cache located on Lustre storage, as one file can be reused by many jobs
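The benefit of caching input files that are reused by many jobs can be sketched with a toy model. This is not the ARC-CE cache implementation; the class `InputCache`, the function `run_jobs`, and the file names are hypothetical and serve only to show that each distinct input is fetched once, however many jobs request it.

```python
from dataclasses import dataclass, field


@dataclass
class InputCache:
    """Toy model of a shared input cache: each file is fetched at most once."""

    fetch_count: int = 0
    _files: dict = field(default_factory=dict)

    def get(self, name, fetch):
        """Return the cached file, calling `fetch(name)` only on a miss."""
        if name not in self._files:
            self._files[name] = fetch(name)
            self.fetch_count += 1
        return self._files[name]


def run_jobs(jobs, cache, fetch):
    """Resolve the inputs of every job through the shared cache.

    `jobs` is a list of jobs, each given as a list of input file names.
    """
    return [[cache.get(name, fetch) for name in inputs] for inputs in jobs]


if __name__ == "__main__":
    # Hypothetical jobs sharing some input files.
    jobs = [["a.root", "b.root"], ["a.root"], ["b.root", "c.root"]]
    cache = InputCache()
    run_jobs(jobs, cache, fetch=lambda name: f"contents of {name}")
    total_requests = sum(len(inputs) for inputs in jobs)
    print(f"{total_requests} requests served with {cache.fetch_count} fetches")
```

In this model the number of transfers depends on the number of distinct input files rather than on the number of jobs, which is the property that makes a shared cache attractive when inputs are reused.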



Introduction

The distributed computing of the ATLAS experiment [1] at the Large Hadron Collider (LHC) opportunistically uses computing resources of the Salomon HPC cluster located at the Czech National HPC Center IT4Innovations (IT4I) in Ostrava. When Salomon was commissioned, it was ranked 39th in the Top500 list [2] (June 2015). In the list published in June 2019, it was ranked 282nd [3]. Worker nodes of the HPC available to the ATLAS Distributed Computing (ADC) have the following hardware specifications: […]. The batch system is PBS Professional [4].

Settings
Pre-emption
Number of machines
Number of PBS requests
Performance
Summary and Conclusion