Integrated automation for configuration management and operations in the ATLAS online computing farm

Artem Amirkhanov, Haydn Du Plessis, Arturo Sanchez Pineda, Matthew Shaun Twomey, Diana Alessandra Scannicchio, Konstantinos Mitrogeorgos, Franco Brasolin, Sergio Ballestrero, Christopher Jon Lee, Marco Pernigotti

doi:10.1051/epjconf/201921408022

Abstract

The online farm of the ATLAS experiment at the LHC, consisting of nearly 4000 PCs with various characteristics, provides configuration and control of the detector and performs the collection, processing, selection, and conveyance of event data from the front-end electronics to mass storage. Different aspects of farm management are already accessible via several tools. The status and health of each node are monitored by a system based on Icinga 2 and Ganglia. PuppetDB centrally gathers all status information from Puppet, the configuration management tool used to ensure configuration consistency on every node. The in-house Configuration Database (ConfDB) controls DHCP and PXE, while also integrating external information sources. In these proceedings we present our roadmap for integrating these and other data sources and systems, and for building a higher level of abstraction on top of this foundation. An automation and orchestration tool will be able to use these systems and replace lengthy manual procedures, some of which also require interactions with other systems and teams, e.g. for the repair of a faulty node. Finally, an inventory and tracking system will complement the available data sources, keep track of node history, and improve the evaluation of long-term lifecycle management and purchase strategies.

Highlights

The online farm of the ATLAS [1] experiment at the LHC consists of nearly 4000 nodes with various characteristics.
The Configuration Database (ConfDB) manages the status and function of each node (TDAQ, Sim@P1 [8], etc.).
OKS [6] [12] is a library that supports a simple, active, persistent in-memory object manager. It serves as the framework of the configuration database, providing the overall description of the Data Acquisition (DAQ) system and of the trigger and detector software and hardware.

Summary

Introduction

The online farm of the ATLAS [1] experiment at the LHC consists of nearly 4000 nodes with various characteristics. Due to the large scale of the farm and the variety of the systems, appropriate tools addressing various requirements are needed to effectively manage [2] and monitor these nodes [3]. Working through these tools by hand is a time-consuming process, and the expert must remember to update all the tools in the correct order (as per the defined procedures). A procedure may also require the expert to constantly monitor the status of the node to determine when it is ready for an intervention, which results in an inefficient workflow.
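The manual workflow described above (wait until a node is ready, then update each tool in a fixed order) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the step names, and the `is_ready` callback are all hypothetical, and no real Icinga 2, Puppet, or ConfDB API is called.

```python
import time
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[str], None]]  # (step name, action taking the node name)


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 60.0,
                     poll_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready() until it returns True or timeout_s of polling has elapsed."""
    waited = 0.0
    while True:
        if is_ready():
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


def run_procedure(node: str,
                  steps: Sequence[Step],
                  is_ready: Callable[[], bool],
                  **wait_kwargs) -> List[str]:
    """Wait for the node, then run every step in the given order.

    Returns the names of the completed steps. The steps are hypothetical
    stand-ins for actions such as scheduling a downtime in the monitoring
    system or changing the node status in a configuration database.
    """
    if not wait_until_ready(is_ready, **wait_kwargs):
        raise TimeoutError(f"{node} did not become ready for intervention")
    completed: List[str] = []
    for name, action in steps:
        action(node)
        completed.append(name)
    return completed
```

The point of the sketch is only that the ordering of the steps and the readiness polling live in code rather than in the expert's memory; the real orchestration tool may be structured quite differently.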

Tools overview

Configuration Database

Monitoring

OKS - Object Kernel Support

Implementation

Schedule Downtime

Results

Inventory and tracking system

Conclusion


Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2019
License type: CC BY 4.0
