CMS multicore scheduling strategy

Antonio Pérez-Calero Yzquierdo, Krista Majewski, Jose Hernández, Alison McCrea, Burt Holzman

doi:10.1088/1742-6596/513/3/032074

Open Access

https://doi.org/10.1088/1742-6596/513/3/032074

Abstract

In the coming years, processor architectures based on much larger numbers of cores will most likely be the model for continuing "Moore's Law" style throughput gains. This not only results in many more jobs running the monolithic applications of the LHC Run 1 era in parallel, but the memory requirements of these processes also push worker-node architectures to the limit. One solution is to parallelize the application itself, through forking and memory sharing or through threaded frameworks. CMS is following all of these approaches and has a comprehensive strategy to schedule multicore jobs on the grid based on the glideinWMS submission infrastructure. This contribution describes the main component of the scheduling strategy: a pilot-based model with dynamic partitioning of resources, which allows the transition to multicore or whole-node scheduling without disallowing single-core jobs. It also presents the experience gained with the proposed multicore scheduling scheme and gives an outlook on further developments towards the restart of the LHC in 2015.

Summary

Introduction

Processor architectures based on much larger numbers of cores will most likely be the model for continuing "Moore's Law" style throughput gains. Adapting the computing of the LHC experiments to multicore CPUs requires software modifications at the level of the application itself and in the scheduling tools, both grid-wide and at the site level. Multicore pilots are required to run multicore applications, but using them to schedule single-core jobs is also advantageous, because the number of pilots needed to manage the whole experiment workload can be greatly reduced.
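
As an illustration of the two ideas in this paragraph, the toy model below sketches a multicore pilot whose cores are partitioned dynamically among single-core and multicore jobs, together with the pilot-count arithmetic. It is a minimal sketch under stated assumptions, not the glideinWMS or CMS implementation: the `Pilot` class, its methods, `pilots_needed`, and the core counts are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Pilot:
    """Hypothetical multicore pilot that owns a fixed number of cores."""
    total_cores: int
    running: dict = field(default_factory=dict)  # job name -> cores in use

    @property
    def free_cores(self) -> int:
        # Cores not currently assigned to any running job.
        return self.total_cores - sum(self.running.values())

    def try_start(self, job: str, cores: int) -> bool:
        """Carve a slot of `cores` cores out of the pilot if enough are free."""
        if cores <= 0 or job in self.running or cores > self.free_cores:
            return False
        self.running[job] = cores
        return True

    def finish(self, job: str) -> None:
        """Return the job's cores to the free pool."""
        self.running.pop(job, None)


def pilots_needed(total_cores: int, cores_per_pilot: int) -> int:
    """Number of pilots required to cover `total_cores` (ceiling division)."""
    return -(-total_cores // cores_per_pilot)


# Illustrative usage with made-up numbers: an 8-core pilot shared by
# a 4-core job and a single-core job.
pilot = Pilot(total_cores=8)
pilot.try_start("multicore-job", 4)
pilot.try_start("single-core-job", 1)
print(pilot.free_cores)        # cores still available for further jobs
print(pilots_needed(1000, 1))  # single-core pilots for 1000 cores
print(pilots_needed(1000, 8))  # 8-core pilots for the same 1000 cores
```

In this sketch a single pilot serves both job types at once, and covering the same number of cores with 8-core pilots takes one eighth as many pilots as with single-core pilots, which is the reduction the paragraph refers to.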
