Abstract

Containers are becoming ubiquitous within the WLCG, with CMS announcing a requirement for its sites to provide Singularity during 2018. The ubiquity of containers means it is now possible to reify the combination of an application and its configuration into a single easy-to-deploy unit, avoiding the need for a myriad of configuration-management tools such as Puppet, Ansible or Salt. It also allows industry-standard DevOps techniques, such as Continuous Integration (CI) and Continuous Deployment (CD), to be applied within the operations domain, which can lead to faster upgrades and greater system security. One interesting technique is the Autopilot pattern, which provides mechanisms for application life-cycle management that are accessible from within the container itself. Using modern service-discovery techniques, each container manages its own configuration, monitors its own health, and adapts to changing requirements through the use of event triggers. In this paper, we expand on previous work on creating and deploying resources at a WLCG Tier-2 via containers, and investigate the viability of using the Autopilot pattern at a WLCG site to deploy and manage computational resources.
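
To make the Autopilot pattern concrete, the following minimal sketch (Python, using the requests library against Consul's HTTP API) shows a container process registering itself with a local Consul agent and reporting its own health via a TTL check. The service name, service ID and the payload_is_healthy() helper are illustrative assumptions, not part of the paper's implementation.

    # Minimal Autopilot-style sketch: the container registers itself
    # with the local Consul agent and keeps a TTL health check alive.
    # The service name/ID and payload_is_healthy() are hypothetical.
    import os
    import time
    import requests

    CONSUL = os.environ.get("CONSUL_HTTP_ADDR", "http://127.0.0.1:8500")
    SERVICE_ID = "payload-worker-1"  # hypothetical identifier

    def payload_is_healthy():
        # Placeholder for a site-specific, application-level check
        # (e.g. is the pilot process alive, is scratch space free?).
        return True

    def register():
        # Register this container as a Consul service with a 30 s TTL
        # check; if heartbeats stop, Consul marks the instance critical
        # and eventually deregisters it.
        requests.put(f"{CONSUL}/v1/agent/service/register", json={
            "ID": SERVICE_ID,
            "Name": "payload-worker",
            "Check": {"TTL": "30s", "DeregisterCriticalServiceAfter": "5m"},
        }).raise_for_status()

    if __name__ == "__main__":
        register()
        while True:
            if payload_is_healthy():
                # Pass the TTL check: the container judges its own health.
                requests.put(f"{CONSUL}/v1/agent/check/pass/service:{SERVICE_ID}")
            time.sleep(10)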

Highlights

  • Throughout industry, containers have rapidly become the method of choice to encapsulate complex software projects, offering the benefits of simplified deployment and repeatable application builds

  • Large LHC experiments such as CMS and ATLAS have embraced containers for WLCG payloads; for example, CMS currently runs much of its production work in Singularity [18] containers, obtaining the images it requires via CVMFS [6]. Containers make it possible to reify configuration along with the applications being executed as a single easy-to-deploy unit, ensuring that all necessary dependencies are satisfied

  • As the number of containers deployed across a site increases, it becomes difficult to monitor, track and integrate them in an overarching system

  • Each container becomes responsible for configuring itself at startup, and for tearing down and tidying up on job completion; containers are responsible for performing any necessary health checks, scaling their resource usage according to the active workload, and recovering from any failures which may occur (a sketch of these start-up and tear-down hooks follows this list)
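
As a rough illustration of the start-up and tear-down responsibilities in the last bullet, the sketch below pulls configuration from Consul's key-value store at start-up and deregisters the service on shutdown. The KV key and service ID are assumptions for illustration, not the paper's actual layout.

    # Sketch of per-container life-cycle hooks: configure at start-up
    # from Consul's KV store, tidy up (deregister) on SIGTERM.
    # The KV key and SERVICE_ID below are hypothetical.
    import signal
    import sys
    import requests

    CONSUL = "http://127.0.0.1:8500"
    SERVICE_ID = "payload-worker-1"  # hypothetical

    def configure():
        # Start-up hook: fetch this worker's configuration from the KV
        # store instead of relying on external configuration management.
        r = requests.get(f"{CONSUL}/v1/kv/payload-worker/config",
                         params={"raw": "true"})
        r.raise_for_status()
        return r.text

    def teardown(signum, frame):
        # Tear-down hook: leave service discovery so no new work is
        # routed to this container, then exit cleanly.
        requests.put(f"{CONSUL}/v1/agent/service/deregister/{SERVICE_ID}")
        sys.exit(0)

    signal.signal(signal.SIGTERM, teardown)
    config = configure()  # ...then launch the payload with this config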

Introduction

Throughout industry, containers have rapidly become the method of choice to encapsulate complex software projects, offering the benefits of simplified deployment and repeatable application builds. Large LHC experiments such as CMS and ATLAS have embraced containers for WLCG payloads; for example, CMS currently runs much of its production work in Singularity [18] containers, obtaining the images it requires via CVMFS [6]. Containers make it possible to reify configuration along with the applications being executed as a single easy-to-deploy unit, ensuring that all necessary dependencies are satisfied. However, as the number of containers deployed across a site increases, it becomes difficult to monitor, track and integrate them in an overarching system, a problem which is exacerbated if the containers are short-lived. We apply the Autopilot pattern to containers running LHC experiment application payloads; these containers were developed in our previous work [1].
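
One way an overarching system can track short-lived containers is to consume the service catalogue that the containers themselves maintain. As a hedged sketch of this idea, the loop below long-polls Consul's health endpoint for passing instances of a hypothetical payload-worker service; the blocking query returns whenever membership or health changes, acting as an event trigger.

    # Sketch: track healthy container instances via Consul's health API.
    # A blocking query (long poll) wakes up when the view changes.
    # The service name "payload-worker" is an assumption.
    import requests

    CONSUL = "http://127.0.0.1:8500"

    def watch_healthy(service="payload-worker"):
        index = None
        while True:
            params = {"passing": "true"}
            if index is not None:
                # Block until the health view changes or 60 s elapses.
                params.update({"index": index, "wait": "60s"})
            r = requests.get(f"{CONSUL}/v1/health/service/{service}",
                             params=params)
            r.raise_for_status()
            index = r.headers["X-Consul-Index"]
            addresses = [entry["Node"]["Address"] for entry in r.json()]
            print(f"{len(addresses)} healthy instance(s): {addresses}")

    watch_healthy()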

Container Components
Container Life-cycle
Consul
Health and Network Latency
System Overview
Conclusions