Abstract

The ALICE Experiment was designed to study the physics of strongly interacting matter with heavy-ion collisions at the CERN LHC. A major upgrade of the detector and computing model (O2, Online-Offline) is currently ongoing. The ALICE O2 farm will consist of almost 1000 nodes able to read out and process on the fly about 27 Tb/s of raw data. To operate the experiment and the O2 facility efficiently, a new monitoring system was developed. It will provide a complete overview of the overall health of the facility and detect performance degradation and component failures by collecting, processing, storing and visualising data from hardware and software sensors and probes. The core of the system is based on Apache Kafka, which provides high throughput and fault tolerance; metric aggregation and processing are handled with the help of Kafka Streams. In addition, Telegraf provides operating system sensors, InfluxDB is used as a time-series database, and Grafana serves as the visualisation tool. This tool selection evolved from an initial version in which collectD was used instead of Telegraf, and Apache Flume together with Apache Spark instead of Apache Kafka.
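The aggregation step mentioned above can be illustrated with a minimal sketch. It is a plain-Python stand-in, not the Kafka Streams API and not the ALICE implementation: it groups metric samples into fixed (tumbling) time windows, averages them, and formats the result in InfluxDB line protocol. The function names, the tuple layout, the `host` tag, the `value` field and the 10-second window are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(points, window_s=10):
    """Average metric samples per (name, host) within tumbling windows.

    `points` is an iterable of (name, host, timestamp_s, value) tuples;
    this layout is hypothetical, chosen only for the example.
    """
    buckets = defaultdict(list)
    for name, host, ts, value in points:
        # Align each timestamp to the start of its window.
        window_start = ts - ts % window_s
        buckets[(name, host, window_start)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def to_line_protocol(name, host, window_start_s, value):
    """Format one aggregated point as InfluxDB line protocol.

    Layout: measurement,tag=value field=value timestamp (nanoseconds).
    """
    return f"{name},host={host} value={value} {window_start_s * 1_000_000_000}"


if __name__ == "__main__":
    samples = [
        ("cpu", "node1", 100, 0.25),
        ("cpu", "node1", 105, 0.75),
        ("cpu", "node1", 112, 1.0),
    ]
    for (name, host, start), avg in aggregate(samples).items():
        print(to_line_protocol(name, host, start, avg))
```

In a deployment like the one described, the windowing would be done by the stream-processing layer and the formatted points written to the time-series database by a dedicated consumer; the sketch only shows the shape of the computation.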

Summary

The ALICE Experiment

ALICE (A Large Ion Collider Experiment) [1] is a detector designed to study the physics of strongly interacting matter (the Quark–Gluon Plasma), produced in heavy-ion collisions at the CERN Large Hadron Collider (LHC). ALICE consists of a central barrel and a forward muon spectrometer, allowing for a comprehensive study of hadrons, electrons, muons and photons produced in the collisions of heavy ions. The ALICE collaboration also has an ambitious physics program for proton–proton and proton–ion collisions. After the successful Run 1 (2010-2013) and Run 2 (2015-2018) data-taking periods, the LHC entered a consolidation phase (Long Shutdown 2) and ALICE started its upgrade to fully exploit the increase in luminosity expected in Run 3. The upgrade foresees a complete replacement of the computing systems (Data Acquisition, High-Level Trigger and Offline) by a single, common O2 (Online-Offline) system.

ALICE O2
Monitoring subsystem
Telegraf
Metric processor
InfluxDB consumer
Alarms and notifications
Performance
InfluxDB