Abstract

The Worldwide LHC Computing Grid (WLCG) currently has about 170 sites. In order to support WLCG workloads, each site has to deploy and maintain a number of possibly complex grid services. Quite often, site managers require the assistance of WLCG experts, for example when new software versions need to be deployed. Modern configuration management (e.g. Puppet, Ansible), container orchestration (e.g. Docker Swarm, Kubernetes) and containerization technologies (e.g. Docker, Podman) can help make such activities more lightweight by packaging sensible configurations of grid services and providing simple mechanisms to distribute and deploy them across the infrastructure available at a site. This article describes the SIMPLE project: a Solution for Installation, Management and Provisioning of Lightweight Elements. The SIMPLE framework leverages modern infrastructure management tools to deploy containerized grid services, such as popular compute elements (e.g. HTCondor, ARC), batch systems (e.g. HTCondor, Slurm), worker nodes, etc. Its architecture follows principles of sustainability, scalability and extensibility. We describe how system administrators can use the framework, as well as the first results, featuring the migration of computing resources to HTCondor at two sites. We conclude with an outlook on further developments.

Highlights

  • The Worldwide LHC Computing Grid (WLCG)[1] project is a collaboration of institutes across the world to provide a distributed computing infrastructure for storing and processing the data collected by the 4 main experiments at the Large Hadron Collider (LHC) at CERN: ALICE, ATLAS, CMS and LHCb

  • Required services can be prepackaged into Docker containers[2] along with configuration parameters preset to the extent possible, while site-specific values can be supplied through a configuration management system[3] and the containers get deployed through an orchestration system

  • Centro Brasileiro de Pesquisas Físicas (CBPF), a WLCG Tier-2 site located in Rio de Janeiro, has been an early adopter of the SIMPLE framework and a major contributor to the project


Summary

Introduction

The Worldwide LHC Computing Grid (WLCG)[1] project is a collaboration of institutes across the world to provide a distributed computing infrastructure for storing and processing the data collected by the four main experiments at the Large Hadron Collider (LHC) at CERN: ALICE, ATLAS, CMS and LHCb. In order to support WLCG workloads, each site has to deploy and maintain a number of possibly complex grid services, often requiring significant assistance from WLCG experts. For a small site, the effort spent on such activities may outweigh the resources the site provides. Containers, together with suitable orchestration and configuration management tools, can reduce this effort. Required services can be prepackaged into Docker containers[2] with configuration parameters preset to the extent possible, while site-specific values are supplied through a configuration management system[3] and the containers are deployed through an orchestration system.
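The split between preset parameters and site-specific values can be illustrated with a minimal Python sketch. It is not the SIMPLE framework's implementation: the function `merge_config`, the dictionary keys and all values below are hypothetical and chosen only to show how site-supplied values might override defaults shipped with a prepackaged service.

```python
from copy import deepcopy

# Hypothetical defaults, standing in for parameters preset inside a
# prepackaged service container. Keys and values are illustrative only.
PRESET_DEFAULTS = {
    "batch_system": {"type": "htcondor", "port": 9618},
    "worker_node": {"slots": 8},
}

# Hypothetical site-specific values, standing in for what a site would
# supply through its configuration management system.
SITE_VALUES = {
    "site_name": "EXAMPLE-SITE",
    "worker_node": {"slots": 16},
}


def merge_config(defaults, site_values):
    """Return a new dict in which site-specific values override presets.

    Nested dictionaries are merged key by key; any other value supplied
    by the site replaces the preset value. Neither input is modified.
    """
    merged = deepcopy(defaults)
    for key, value in site_values.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


EFFECTIVE_CONFIG = merge_config(PRESET_DEFAULTS, SITE_VALUES)
```

In this sketch the site only states what differs from the presets (a site name and a slot count), and everything else is inherited, which is the kind of reduction in per-site effort the paragraph above describes.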

The SIMPLE Framework
Component Repositories
Site Level Configuration File and Schema
Site Level Defaults
YAML Compiler
Validation Engines
Central Configuration Manager
Features
SIMPLE - Flow of Configuration Data
Deployments and Use Cases
Conclusions
