Abstract

The current distributed computing resources used for simulating and processing collision data collected by ATLAS and the other LHC experiments are largely based on dedicated x86 Linux clusters. Access to resources, job control, and software provisioning mechanisms are quite different from the common concept of self-contained HPC applications run by particular users on specific HPC systems. We report on the development and usage in ATLAS of an SSH backend to the Advanced Resource Connector (ARC) middleware to enable HPC-compliant access, and on the corresponding software provisioning mechanisms.

Highlights

  • The Worldwide LHC Computing Grid (WLCG) [1] has been set up to meet the computing needs of ATLAS [2] and the other CERN Large Hadron Collider (LHC) experiments

  • We developed an extension to the Advanced Resource Connector (ARC) middleware to include an interface to submit and manage ATLAS Production ANd Distributed Analysis framework (PanDA) workloads as jobs to a resource manager of a remote High Performance Computing (HPC) machine

  • The ARC-Computing Element (CE) interface to SLURM was modified to generate a special job script, which takes into account the hybrid SLURM/ALPS architecture of Cray HPC systems and runs the job through ALPS aprun when it is executed by SLURM (a minimal sketch of this job-script generation follows this list)
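
As referenced in the last highlight, the job-script generation can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the actual ARC-CE SLURM submit script: the function name make_cray_job_script, the resource parameters, and the pilot payload path are illustrative. The essential point is that the generated batch script does not invoke the payload directly but hands it to ALPS aprun, so that it executes on the Cray compute nodes rather than on the service node that runs the batch script.

#!/usr/bin/env python3
"""Sketch of generating a SLURM job script for a Cray hybrid SLURM/ALPS
system: SLURM allocates the resources, aprun places the payload on the
compute nodes. Names and paths are illustrative, not the ARC-CE code."""

import textwrap


def make_cray_job_script(payload_cmd, nodes=1, tasks_per_node=8,
                         walltime="01:00:00", job_name="panda_pilot"):
    """Return the text of a SLURM batch script that runs payload_cmd
    through aprun instead of invoking it directly."""
    ntasks = nodes * tasks_per_node
    return textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name={job_name}
        #SBATCH --nodes={nodes}
        #SBATCH --ntasks={ntasks}
        #SBATCH --time={walltime}

        # On a Cray system the batch script runs on a service (MOM) node;
        # aprun launches the payload on the allocated compute nodes.
        aprun -n {ntasks} -N {tasks_per_node} {payload_cmd}
        """)


if __name__ == "__main__":
    # Illustrative payload path only
    print(make_cray_job_script("/scratch/panda/pilot/runpilot.sh"))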

Summary

Introduction

The Worldwide LHC Computing Grid (WLCG) [1] has been set up to meet the computing needs of ATLAS [2] and the other CERN Large Hadron Collider (LHC) experiments. High Performance Computing (HPC) centres worldwide provide general purpose, high-grade (non-distributed) systems that are used for a wide range of computationally intensive tasks in various fields, including climate research, weather forecasting, molecular modelling, and quantum mechanics. The use of such systems is typically regulated by strict rules, with individual users granted access in order to run self-contained applications that are developed and built for the system's architecture.

Workload Management

The Production ANd Distributed Analysis framework (PanDA) [11] is the ATLAS approach to a data-driven workload manager. It has been designed and developed by ATLAS to meet challenging requirements on throughput, scalability, robustness, minimal operations manpower, and efficiently integrated data and processing management. Interfacing the workload management system poses a particular middleware challenge, since the required services cannot generally be deployed on demand in an HPC centre, nor on its compute nodes.
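
To make the submission path concrete, here is a minimal sketch, under stated assumptions, of how a service running outside an HPC centre could submit and monitor a batch job purely over SSH, in the spirit of the ARC SSH backend reported here. The host, user, key file, job-script path, and the use of the paramiko library are illustrative choices rather than the actual ARC implementation.

#!/usr/bin/env python3
"""Illustrative sketch: submitting and monitoring a SLURM job on a remote
HPC login node purely over SSH. Host, user, key and paths are placeholders;
this is not the ARC SSH backend itself."""

import paramiko


def _connect(host, user, key_file):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_file)
    return client


def submit_job(host, user, key_file, job_script):
    """Run sbatch on the remote login node and return the SLURM job ID."""
    client = _connect(host, user, key_file)
    try:
        _, stdout, stderr = client.exec_command(f"sbatch {job_script}")
        out = stdout.read().decode()
        # sbatch prints e.g. "Submitted batch job 123456" on success
        if "Submitted batch job" not in out:
            raise RuntimeError("sbatch failed: " + stderr.read().decode())
        return out.split()[-1]
    finally:
        client.close()


def job_state(host, user, key_file, job_id):
    """Return the job state (e.g. PENDING, RUNNING) from squeue, or
    'FINISHED' once the job has left the queue."""
    client = _connect(host, user, key_file)
    try:
        _, stdout, _ = client.exec_command(f"squeue -h -j {job_id} -o %T")
        state = stdout.read().decode().strip()
        return state if state else "FINISHED"
    finally:
        client.close()

A full backend would additionally track many such jobs on behalf of PanDA and stage input and output files, which a sketch of this size leaves out.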

Access to Resources
File system access
Findings
Conclusions