Abstract
The Worldwide LHC Computing Grid (WLCG) currently has about 170 sites. In order to support WLCG workloads, each site has to deploy and maintain a number of possibly complex grid services. Quite often, site managers require the assistance of WLCG experts, for example when new software versions need to be deployed. Modern configuration management (e.g. Puppet, Ansible), container orchestration (e.g. Docker Swarm, Kubernetes) and containerization technologies (e.g. Docker, Podman) can help make such activities more lightweight by packaging sensible configurations of grid services and providing simple mechanisms to distribute and deploy them across the infrastructure available at a site. This article describes the SIMPLE project: a Solution for Installation, Management and Provisioning of Lightweight Elements. The SIMPLE framework leverages modern infrastructure management tools to deploy containerized grid services, such as popular compute elements (e.g. HTCondor, ARC), batch systems (e.g. HTCondor, Slurm) and worker nodes. Its architecture follows principles of sustainability, scalability and extensibility. We describe how system administrators can use the framework, as well as the first results, featuring the migration of computing resources to HTCondor at 2 sites. We conclude with an outlook on further developments.
Highlights
The Worldwide LHC Computing Grid (WLCG)[1] project is a collaboration of institutes across the world to provide a distributed computing infrastructure for storing and processing the data collected by the 4 main experiments at the Large Hadron Collider (LHC) at CERN: ALICE, ATLAS, CMS and LHCb.
Required services can be prepackaged into Docker containers[2] along with configuration parameters preset to the extent possible, while site-specific values can be supplied through a configuration management system[3] and the containers get deployed through an orchestration system.
Centro Brasileiro de Pesquisas Físicas (CBPF), a WLCG Tier-2 site located in Rio de Janeiro, has been an early adopter of the SIMPLE framework and a major contributor to the project.
Summary
The Worldwide LHC Computing Grid (WLCG)[1] project is a collaboration of institutes across the world to provide a distributed computing infrastructure for storing and processing the data collected by the 4 main experiments at the Large Hadron Collider (LHC) at CERN: ALICE, ATLAS, CMS and LHCb. In order to support WLCG workloads, each site has to deploy and maintain a number of possibly complex grid services, often requiring significant assistance from WLCG experts. For a small site, the effort spent on such activities may outweigh the resources the site provides. Containers, combined with suitable orchestration and configuration management tools, can make such activities more lightweight. Required services can be prepackaged into Docker containers[2] with configuration parameters preset to the extent possible, while site-specific values are supplied through a configuration management system[3] and the containers are deployed through an orchestration system.
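The split between preset parameters and site-specific values can be illustrated with a minimal Python sketch. It is not taken from the SIMPLE code base: the function `merge_config`, the dictionary keys, the site name and the host name are all hypothetical, chosen only to show how defaults shipped with a container image might be overridden by values a site administrator supplies.

```python
from copy import deepcopy

# Hypothetical defaults that would be baked into a container image.
PRESET_DEFAULTS = {
    "batch_system": "htcondor",
    "condor": {"collector_port": 9618, "use_shared_filesystem": False},
}


def merge_config(defaults, site_values):
    """Return a new dict in which site_values recursively override defaults."""
    merged = deepcopy(defaults)
    for key, value in site_values.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalar, list, or new key: the site value wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values that only the site can know.
site_values = {
    "site_name": "EXAMPLE-T2",
    "condor": {"central_manager": "cm.example.org"},
}

effective = merge_config(PRESET_DEFAULTS, site_values)
```

In this sketch the site only states what differs from the packaged defaults, and the defaults themselves are left unmodified, so the same image could be reused at another site with a different `site_values` mapping.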