Abstract

For the last 10 years, the ATLAS Distributed Computing project has based its monitoring infrastructure on a set of custom-designed dashboards provided by CERN. This system functioned very well for LHC Runs 1 and 2, but its maintenance has become progressively more difficult, and the conditions for Run 3, starting in 2021, will be even more demanding; hence a more standard code base and more automated operations are needed. A new infrastructure has been provided by CERN, based on InfluxDB as the data store and Grafana as the display environment. ATLAS has adapted and further developed its monitoring tools to use this infrastructure for data and workflow management monitoring and accounting dashboards, expanding the range of previous possibilities with the aim of achieving a single, simpler environment for all monitoring applications. This document describes these tools and the data flows for monitoring and accounting.

Highlights

  • ATLAS [1] Distributed Computing (ADC) uses two core systems to run jobs on the grid and manage the data: the workflow management system PanDA [2] and the distributed data management system Rucio [3].

  • During LHC Run 1 and Run 2, the monitoring and accounting systems were based on custom frameworks developed by CERN IT and ADC, which had been in use for 10 years.

  • The data collection, processing and display presented in this document are based on the Unified Monitoring Infrastructure (UMA).

Introduction

ATLAS [1] Distributed Computing (ADC) uses two core systems to run jobs on the grid and manage the data: the workflow management system PanDA [2] and the distributed data management system Rucio [3]. During LHC Run 1 and Run 2, the monitoring and accounting systems were based on custom frameworks developed by CERN IT and ADC, which had been in use for 10 years. These systems served well during that time, but they started to show their age in several areas; in particular, the original developers had long left, and with them much of the in-depth knowledge needed to further optimise the system and the ability to quickly add new features to the monitoring. In 2016 the CERN MonIT group started to build a new Unified Monitoring Infrastructure (UMA) based on open source technology [4]. The data collection, processing and display presented in this document are based on UMA.

The CERN MonIT Infrastructure
Processing
Backends
Dashboards
Transfer Dashboard
Site Accounting