Abstract

The CMS Submission Infrastructure Global Pool, built on GlideinWMS and HTCondor, is a worldwide distributed dynamic pool responsible for the allocation of resources for all CMS computing workloads. Matching the continuously increasing demand for computing resources by CMS requires assessing its scalability limitations in advance. In addition, the Global Pool must be able to expand in a more heterogeneous environment, in terms of both resource provisioning (combining Grid, HPC and Cloud) and workload submission. A dedicated testbed has been set up to simulate such conditions with the purpose of finding potential bottlenecks in the software or its configuration. This report provides a thorough description of the various scalability dimensions, in size and complexity, that are being explored for the future Global Pool, along with the analysis of the limitations found and the solutions proposed with the support of the GlideinWMS and HTCondor developer teams.

Highlights

  • The CMS Global Pool is a dynamically sized and centrally managed HTCondor [1] pool, built by the submission of GlideinWMS [2] pilot jobs to the Computing Elements of the multiple resource providers supporting CMS distributed across the Worldwide LHC Computing Grid [3] (WLCG), and extended to additional resources [4] as well, such as opportunistic, Cloud, and allocations on HPC.

  • The Global Pool has demonstrated flexibility to integrate non-pledged resources. All of this is achieved while ensuring a high level of efficiency in the utilization of the allocated CPUs [8]. Considering how successful such an infrastructure and its operation has been for CMS during the LHC Run 2, and looking onwards to the coming years, the CMS Submission Infrastructure (SI) team has been performing tests in order to explore the potential limitations of our current model when increasing in size and complexity, which will be described

  • The CMS Submission Infrastructure team is analyzing the very successful Global Pool model in order to detect in advance potential bottlenecks arising from increasing scale or additional complexity.


Summary

Introduction

The CMS Global Pool is a dynamically sized and centrally managed HTCondor [1] pool, built by the submission of GlideinWMS [2] pilot jobs (glideins) to the Computing Elements of the multiple resource providers supporting CMS distributed across the Worldwide LHC Computing Grid [3] (WLCG), and extended to additional resources [4] as well, such as opportunistic (e.g. grid sites not pledging CPU to CMS), Cloud, and allocations on HPC. It allows CMS policies such as workload prioritization and fair-shares (e.g. production workflows vs analysis tasks) to be centrally managed. The Global Pool has demonstrated flexibility to integrate non-pledged resources (e.g. opportunistic usage of the CMS High Level Trigger farm [7]). All of this is achieved while ensuring a high level of efficiency in the utilization of the allocated CPUs [8]. Considering how successful this infrastructure and its operation have been for CMS during the LHC Run 2, and looking ahead to the coming years, the CMS Submission Infrastructure (SI) team has been performing tests in order to explore the potential limitations of our current model when increasing in size and complexity, which will be described

Motivation for scale testing
Scale testing of the Submission Infrastructure
Scaling tests results
Schedds
Conclusions and Outlook
