Abstract

Increased operational effectiveness and the dynamic integration of only temporarily available compute resources (opportunistic resources) will become increasingly important over the next decade, due both to the scarcity of resources for future high energy physics experiments and to the desired integration of cloud and high performance computing resources. This results in a more heterogeneous compute environment, which poses huge challenges for the computing operation teams of the experiments. At the Karlsruhe Institute of Technology (KIT), we design solutions to tackle these challenges. To ensure an efficient utilization of opportunistic resources and unified access to the entire infrastructure, we developed the Transparent Adaptive Resource Dynamic Integration System (TARDIS), a scalable multi-agent resource manager that provides interfaces to provision resources from various providers and to integrate them dynamically and transparently into one common overlay batch system. Operational effectiveness is guaranteed by relying on COBalD, the Opportunistic Balancing Daemon, and its simple approach of taking into account the utilization and allocation of the different resource types, in order to run each workflow on the best-suited resource. In this contribution, we present the current status of integrating various HPC centers and cloud providers into the compute infrastructure at KIT as well as the experiences we have gained in a production environment.
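The balancing idea attributed to COBalD above can be illustrated with a minimal Python sketch. It assumes that, for each resource type, allocation is the fraction of provisioned resources assigned to jobs and utilisation is the fraction actually doing useful work; a controller then lowers the amount of resources requested (the demand) when utilisation is poor and raises it when almost everything is allocated. The classes `Pool` and `LinearBalancer`, their thresholds and the rate are hypothetical illustrations, not COBalD's actual API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical stand-in for one resource type (e.g. an HPC or cloud share)."""
    demand: float       # amount of resources the controller asks the provider for
    utilisation: float  # fraction of provisioned resources doing useful work, 0..1
    allocation: float   # fraction of provisioned resources assigned to jobs, 0..1


@dataclass
class LinearBalancer:
    """Illustrative linear controller; thresholds and rate are assumptions."""
    low_utilisation: float = 0.5
    high_allocation: float = 0.9
    rate: float = 1.0  # change in demand per second of regulation interval

    def regulate(self, pool: Pool, interval: float) -> None:
        if pool.utilisation < self.low_utilisation:
            # Resources are poorly used: release some, never going below zero.
            pool.demand = max(0.0, pool.demand - self.rate * interval)
        elif pool.allocation > self.high_allocation:
            # Nearly all resources are claimed by jobs: request more.
            pool.demand += self.rate * interval
        # Otherwise the current demand is kept unchanged.


if __name__ == "__main__":
    busy = Pool(demand=10, utilisation=0.95, allocation=0.97)
    LinearBalancer(rate=0.5).regulate(busy, interval=4)
    print(busy.demand)  # 12.0
```

Checking utilisation before allocation means that poorly used resources are released even when they are nominally allocated; this ordering is a design choice of the sketch rather than a documented property of COBalD.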

Highlights

  • Nowadays, computing in high energy physics (HEP) relies predominantly on homogeneous resources provided by the Worldwide LHC Computing Grid (WLCG) [1] based on a flat-budget funding model

  • In contrast to the homogeneous resources provided by the WLCG, utilising opportunistic resources results in a more heterogeneous computing environment that is not fully controlled by WLCG policies and poses huge challenges for the computing operation teams of the experiments

  • We have presented a general multi-experiment solution for integrating opportunistic resources into WLCG computing by associating them with nearby existing WLCG sites and utilising well-established Grid computing elements as the single point of entry for the experiments
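The site-association idea in the last highlight can be pictured with a small Python sketch: experiments submit only to a site's computing element (CE), and an overlay batch system behind it may place the job on a dedicated pool or on an associated opportunistic pool. All names (`Site`, `ResourcePool`, `submit_via_ce`) and the "most free slots" placement rule are hypothetical simplifications, not the behaviour of any particular CE or batch system.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical pool of job slots behind a site's overlay batch system."""
    name: str
    free_slots: int
    opportunistic: bool = False


@dataclass
class Site:
    """Hypothetical WLCG site whose CE is the single entry point for jobs."""
    name: str
    pools: list[ResourcePool] = field(default_factory=list)

    def associate(self, pool: ResourcePool) -> None:
        # Attach a nearby opportunistic (or dedicated) pool to this site.
        self.pools.append(pool)

    def submit_via_ce(self, job_id: str) -> str:
        # Experiments only talk to the CE; the overlay batch system decides
        # where the job runs, so the resource behind it stays transparent.
        candidates = [p for p in self.pools if p.free_slots > 0]
        if not candidates:
            raise RuntimeError(f"{job_id}: no free slots at {self.name}")
        # Simplified placement rule (assumption): pool with the most free slots.
        target = max(candidates, key=lambda p: p.free_slots)
        target.free_slots -= 1
        return target.name


if __name__ == "__main__":
    site = Site("example-site")
    site.associate(ResourcePool("site-batch", free_slots=1))
    site.associate(ResourcePool("hpc-backfill", free_slots=3, opportunistic=True))
    print(site.submit_via_ce("job-1"))  # hpc-backfill
```

From the experiment's point of view only `submit_via_ce` on the site is visible; whether the job lands on a dedicated or an opportunistic pool is decided behind that single entry point.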

Summary

Introduction

Nowadays, computing in high energy physics (HEP) relies predominantly on homogeneous resources provided by the Worldwide LHC Computing Grid (WLCG) [1] based on a flat-budget funding model. Recent studies by the HEP Software Foundation [2], reasonably assuming a continuation of the flat-budget funding model, show that the expected advances in technology will not be sufficient to meet the computing requirements of future HEP experiments. One promising approach to narrowing this gap is to supplement the WLCG with resources that are not permanently dedicated to, but only temporarily available for, HEP computing tasks. These so-called opportunistic resources are mainly provided by High Performance Computing (HPC) centres as well as by commercial and public cloud providers.

Opportunistic Resources and their Challenges
The TARDIS Resource Manager
November 2019
Dedicated Share at High Performance Computing Centres
Back-filling at High Performance Computing Centres
Back-filling of Tier-3 Resources
Integration of Cloud Resources
Findings
Conclusion and Outlook
