Abstract

With the ever-growing amount of data collected by the experiments at the Large Hadron Collider (LHC), the need for computing resources to analyse these data is also increasing rapidly. This demand will be amplified further after the upgrade to the High-Luminosity LHC [1]. High-Performance Computing (HPC) and other cluster computing resources provided by universities can usefully supplement the resources dedicated to the experiment within the Worldwide LHC Computing Grid (WLCG) for data analysis and for the production of simulated event samples. Freiburg operates a combined Tier2/Tier3 centre, the ATLAS-BFG [2]. The shared HPC cluster "NEMO" at the University of Freiburg has been made available to local ATLAS [3] users through the provisioning of virtual machines that incorporate the ATLAS software environment, analogous to the bare-metal system of the local Tier2/Tier3 centre. Beyond the provisioning of the virtual environment, the dynamic, on-demand integration of these resources into the Tier3 scheduler is described. To provide the external NEMO resources to users transparently, an intermediate layer connecting the two batch systems is put in place. This resource scheduler monitors the requirements on the user-facing system and requests resources on the backend system.
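
The paper does not prescribe a specific implementation of this intermediate layer, but its monitor-and-request cycle can be sketched. The following Python sketch polls the demand on the user-facing batch system and submits VM-start jobs to the backend HPC system accordingly; the choice of Slurm (squeue) on the front end, MOAB (showq, msub) on the backend, the start_atlas_vm.sh wrapper, the dedicated atlasvm account and the 20-core VM size are illustrative assumptions, not details taken from the paper.

    #!/usr/bin/env python3
    """Minimal sketch of an intermediate resource scheduler: watch the
    demand on the user-facing batch system and request matching VM slots
    on the backend HPC system. Tool names and the VM wrapper script are
    assumptions for illustration."""
    import subprocess
    import time

    POLL_INTERVAL = 60   # seconds between demand checks (assumed)
    CORES_PER_VM = 20    # assumed size of one backend VM

    def pending_cores_frontend() -> int:
        # Sum the CPU cores requested by pending jobs on the user-facing
        # system (Slurm assumed; %C prints the CPU count per job).
        out = subprocess.run(
            ["squeue", "--states=PENDING", "--noheader", "--format=%C"],
            capture_output=True, text=True, check=True).stdout
        return sum(int(n) for n in out.split())

    def running_backend_vms() -> int:
        # Count VM jobs already placed on the backend under a dedicated
        # account ('atlasvm' is hypothetical); counting lines in the
        # "Running" state of MOAB's showq output is an assumption.
        out = subprocess.run(["showq", "-u", "atlasvm"],
                             capture_output=True, text=True, check=True).stdout
        return sum(1 for line in out.splitlines() if "Running" in line)

    def request_backend_vms(n: int) -> None:
        # Submit n single-node VM-start jobs (MOAB's msub assumed);
        # start_atlas_vm.sh is a hypothetical wrapper that boots the
        # ATLAS VM image on the allocated node.
        for _ in range(n):
            subprocess.run(["msub", "-l", f"nodes=1:ppn={CORES_PER_VM}",
                            "start_atlas_vm.sh"], check=True)

    def main() -> None:
        while True:
            missing = pending_cores_frontend() - running_backend_vms() * CORES_PER_VM
            if missing > 0:
                request_backend_vms((missing + CORES_PER_VM - 1) // CORES_PER_VM)
            time.sleep(POLL_INTERVAL)

    if __name__ == "__main__":
        main()

A production system would additionally retire idle VMs and cap the total number of requests, but the monitor-and-request cycle above is the essential behaviour described in the abstract.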

Highlights

  • The analysis of collision data collected at the Large Hadron Collider (LHC) and the simulation of events are primarily done at 2 Tier0, 13 Tier1 and 160 Tier2 sites within the Worldwide LHC Computing Grid (WLCG) [4].

  • High-Performance Computing (HPC) clusters, as provided by universities and other institutions, sometimes even co-located at the same sites, may be used for High Throughput Computing (HTC)-like workflows to extend the capacities of the existing WLCG resources.

  • A CPU performance increase of the order of 5% is observed when going from the Tier2/Tier3 bare metal to the NEMO virtual machines (VMs), while going from the NEMO VMs to the NEMO bare metal yields a further increase of the order of 5% (the two steps compound, as worked out after this list).
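
Since both figures are relative gains, the two steps compound multiplicatively. Assuming both are exactly 5%, the combined gain from the Tier2/Tier3 bare metal to the NEMO bare metal is

    (1 + 0.05) × (1 + 0.05) = 1.1025,

i.e. of the order of 10%.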


Summary

Introduction

The analysis of collision data collected at the LHC and the simulation of events are primarily done at 2 Tier0, 13 Tier1 and 160 Tier2 sites within the WLCG [4]. Benchmarks are used to quantify how the performance changes with the configuration of the resource scheduler and of the virtual machines being spawned. They will be part of a future continuous monitoring effort, in order to detect changes in the submitted workloads. A fully virtualized environment, independent of the choices made on the HPC cluster itself, gives the best possible scope to implement a system that looks and behaves in the same way as the non-virtualized Tier2/Tier3 cluster. This consistency between the two systems would make it possible in the future to redirect ATLAS grid jobs submitted remotely to either NEMO or any other opportunistic resource, as long as the resource provides the infrastructure needed to run the VM images. This information will be used for continuous monitoring of the robustness and performance of the system.
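
The continuous monitoring mentioned above could take a very simple form: run the benchmark periodically on each platform (Tier2/Tier3 bare metal, NEMO VM, NEMO bare metal), keep a short history of scores, and flag runs that deviate from the recent median. The following Python sketch assumes a run_benchmark.sh wrapper that prints a single score, and a 5% tolerance; both are illustrative assumptions, not details from the paper.

    import statistics
    import subprocess
    from collections import defaultdict, deque

    TOLERANCE = 0.05                                  # assumed: flag >5% deviations
    history = defaultdict(lambda: deque(maxlen=50))   # platform -> recent scores

    def run_benchmark() -> float:
        # 'run_benchmark.sh' is a hypothetical wrapper around the actual
        # benchmark suite; it is assumed to print one score on stdout.
        out = subprocess.run(["./run_benchmark.sh"], capture_output=True,
                             text=True, check=True).stdout
        return float(out.strip())

    def record_and_check(platform: str) -> None:
        # Compare the new score against the running median for this
        # platform and report deviations larger than the tolerance.
        score = run_benchmark()
        scores = history[platform]
        if len(scores) >= 5:
            baseline = statistics.median(scores)
            if abs(score - baseline) / baseline > TOLERANCE:
                print(f"{platform}: score {score:.1f} deviates more than "
                      f"{TOLERANCE:.0%} from the recent median {baseline:.1f}")
        scores.append(score)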

Generation of the virtual machines
Connection of front and backend batch systems
Findings
Summary