Lightweight dynamic integration of opportunistic resources

Max Fischer, Eileen Kuehn, Manuel Giffels, Andreas Heiss, Andreas Petzold, Matthias Jochen Schnepf, C Doglioni, P Jackson, W Kamleh, L Silvestris, G.A. Stewart, D Kim

doi:10.1051/epjconf/202024507040

Open Access

Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2020
Citations: 6
License type: CC BY 4.0

Affiliation: Karlsruhe Institute of Technology

Abstract

To satisfy future computing demands of the Worldwide LHC Computing Grid (WLCG), opportunistic usage of third-party resources is a promising approach. While virtual machine and container technologies largely provide the means to make such resources compatible with WLCG requirements, strategies to acquire and disband many resources from many providers are still a focus of current research. Existing meta-schedulers that manage resources in the WLCG are hitting the limits of their design when tasked to manage heterogeneous resources from many diverse resource providers. To provide opportunistic resources to the WLCG as part of a regular WLCG site, we propose a new meta-scheduling approach suitable for opportunistic, heterogeneous resource provisioning. Instead of anticipating future resource requirements, our approach observes resource usage and promotes well-used resources. Following this approach, we have developed an inherently robust meta-scheduler, COBalD, for managing diverse, heterogeneous resources given unpredictable resource requirements. This paper explains the key concepts of our approach and discusses its benefits and limitations for dynamic resource provisioning compared to previous approaches.
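The idea of observing usage instead of predicting requirements can be illustrated with a small feedback loop. The following is a minimal sketch under stated assumptions, not COBalD's actual interface: the names `regulate` and `simulate`, the usage thresholds `low` and `high`, the `step` size and the `floor` are all hypothetical, and the simulation assumes that each job occupies one resource and that the provider fulfils each demand by the next cycle.

```python
from typing import List


def regulate(demand: float, supply: int, used: int,
             low: float = 0.5, high: float = 0.9,
             step: float = 1.0, floor: float = 1.0) -> float:
    """Adjust the demand of one (hypothetical) resource pool from observed usage.

    demand: resources currently requested from the provider
    supply: resources actually provisioned
    used:   provisioned resources observed running payloads
    """
    if supply > 0:
        usage = used / supply
        if usage > high:
            demand += step  # well used: promote this pool by requesting more
        elif usage < low:
            demand -= step  # mostly idle: disband some resources
    # Keep a small probe so that usage stays observable even without workload.
    return max(demand, floor)


def simulate(job_load: List[int], **kwargs) -> List[int]:
    """Run the loop against a synthetic workload; returns the supply per cycle.

    Assumes one job per resource and a provider that fulfils the requested
    demand by the next cycle.
    """
    demand, supply = 1.0, 0
    history = []
    for jobs in job_load:
        used = min(supply, jobs)  # only the observed usage is fed back
        demand = regulate(demand, supply, used, **kwargs)
        supply = int(demand)
        history.append(supply)
    return history


print(simulate([4] * 10 + [0] * 10))
# [1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1, 1, 1, 1, 1, 1, 1]
```

The loop never inspects the job queue; it only compares how much of the provisioned supply is doing work against two thresholds, and still follows the workload up and down. The `floor` is a deliberate choice: with zero supply there would be nothing left to observe, so the loop could never grow again. Running one such loop per provider would, under these assumptions, let well-used providers grow while idle ones shrink.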

Highlights

Dynamic resource provisioning in the Worldwide LHC Computing Grid (WLCG) [1] is commonly based on meta-scheduling and the pilot model [2]: A meta-scheduler pre-computes the ideal set of resources for a given set of workflows; so-called pilot jobs acquire and integrate these resources into an overlay batch system, which processes the initial workflows.
The GridKa Tier 1 centre has developed a new approach for dynamic provisioning that is suitable for the WLCG and beyond.
Even though job-to-resource-to-job meta-scheduling performs well for homogeneous resources and jobs, we have not been able to apply it to more complex, dynamic cases.

Summary

Introduction

Dynamic resource provisioning in the WLCG [1] is commonly based on meta-scheduling and the pilot model [2]: A meta-scheduler pre-computes the ideal set of resources for a given set of workflows; so-called pilot jobs acquire and integrate these resources into an overlay batch system, which processes the initial workflows. While this approach offers a high level of control and precision, we have found that the strong coupling between its components inherently limits scalability, flexibility and robustness. We have successfully used our approach to provision HPC and cloud resources to the WLCG, as well as to manage abstract resources in the form of multi-core and single-core allocations.
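For contrast, the predictive pilot model can be sketched as a meta-scheduler that sizes pilots directly from the current job queue. This is a hypothetical illustration, not code from any WLCG system: `plan_pilots`, `cores_per_pilot` and the rule of one pilot type per job size are assumptions chosen for brevity.

```python
import math
from collections import Counter
from typing import Dict, List


def plan_pilots(pending_job_cores: List[int], cores_per_pilot: int = 8) -> Dict[int, int]:
    """Predictively size pilots from the current queue (hypothetical helper).

    pending_job_cores: number of cores requested by each queued job
    Returns {cores per job: number of pilots to submit for that job size}.
    """
    plan: Dict[int, int] = {}
    for cores, count in Counter(pending_job_cores).items():
        if cores > cores_per_pilot:
            raise ValueError(f"job needs {cores} cores, pilots offer {cores_per_pilot}")
        jobs_per_pilot = cores_per_pilot // cores  # pack same-sized jobs into one pilot
        plan[cores] = math.ceil(count / jobs_per_pilot)
    return plan


queue = [1] * 10 + [8] * 3  # ten single-core and three 8-core jobs
print(plan_pilots(queue))  # {1: 2, 8: 3}
```

The plan is only right if the queue it was computed from still holds when the pilots start: jobs that are cancelled, finish elsewhere or change their requirements in the meantime leave pilots idle or mismatched. This dependence of every provisioning decision on detailed job information is one way to see the coupling between components that the paragraph above identifies as a limit to scalability, flexibility and robustness.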

Job to Resource to Job Meta-Scheduling

Feedback Control Loop Meta-Scheduling

The COBalD Pool Model

Orthogonality of Job and Meta-Scheduler

Towards Implicit Network Scheduling

Findings

Conclusions