Abstract
To satisfy the future computing demands of the Worldwide LHC Computing Grid (WLCG), opportunistic usage of third-party resources is a promising approach. While the means to make such resources compatible with WLCG requirements are largely provided by virtual machine and container technologies, strategies to acquire and disband many resources from many providers are still a focus of current research. Existing meta-schedulers that manage resources in the WLCG are hitting the limits of their design when tasked with managing heterogeneous resources from many diverse resource providers. To provide opportunistic resources to the WLCG as part of a regular WLCG site, we propose a new meta-scheduling approach suitable for opportunistic, heterogeneous resource provisioning. Instead of anticipating future resource requirements, our approach observes resource usage and promotes well-used resources. Following this approach, we have developed an inherently robust meta-scheduler, COBalD, for managing diverse, heterogeneous resources given unpredictable resource requirements. This paper explains the key concepts of our approach and discusses its benefits and limitations for dynamic resource provisioning compared to previous approaches.
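The idea of observing usage and promoting well-used resources can be illustrated with a minimal feedback rule. The sketch below is not COBalD's actual API: the `Pool` class, the `adjust_demand` function, the thresholds `low` and `high`, and the example pool names and numbers are all hypothetical and chosen only to show the principle.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical view of one class of opportunistic resources."""
    name: str
    demand: int          # number of resources currently requested from the provider
    utilisation: float   # observed fraction of acquired resources doing useful work


def adjust_demand(pool, low=0.5, high=0.9, step=1):
    """Assumed feedback rule: grow well-used pools, shrink poorly used ones.

    No forecast of future jobs is made; only the observed utilisation is used.
    """
    if pool.utilisation >= high:
        pool.demand += step                       # promote a well-used pool
    elif pool.utilisation < low:
        pool.demand = max(0, pool.demand - step)  # release poorly used resources
    return pool.demand


# Illustrative values only, not measurements from the paper.
pools = [Pool("hpc", 10, 0.95), Pool("cloud", 10, 0.30), Pool("idle", 0, 0.0)]
new_demands = {pool.name: adjust_demand(pool) for pool in pools}
print(new_demands)
```

Because the rule reacts only to what is observed on each pool, it needs no knowledge of the job queue or of the other providers, which is what makes this style of control loosely coupled.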
Highlights
Dynamic resource provisioning in the Worldwide LHC Computing Grid (WLCG) [1] is commonly based on meta-scheduling and the pilot model [2]: a meta-scheduler pre-computes the ideal set of resources for a given set of workflows; so-called pilot jobs acquire and integrate these resources into an overlay batch system, which processes the initial workflows.
The GridKa Tier 1 centre has developed a new approach for dynamic provisioning that is suitable for the WLCG and beyond.
Even though job-to-resource meta-scheduling performs well for homogeneous resources and jobs, we have not been able to apply it to more complex, dynamic cases.
Summary
Dynamic resource provisioning in the WLCG [1] is commonly based on meta-scheduling and the pilot model [2]: a meta-scheduler pre-computes the ideal set of resources for a given set of workflows; so-called pilot jobs acquire and integrate these resources into an overlay batch system, which processes the initial workflows. While this approach offers a high level of control and precision, we have found that the strong coupling between components inherently limits scalability, flexibility and robustness. We have successfully used our work for provisioning HPC and cloud resources to the WLCG, as well as for managing abstract resources in the form of multi-core and single-core allocations.