Abstract

The CMS experiment at the CERN LHC has a dedicated infrastructure to handle alignment and calibration data. This infrastructure is composed of several services, which take on the data management tasks required for the consumption of non-event data (also called condition data) in the experiment's activities. The criticality of these tasks imposes tight requirements on the availability and reliability of the services executing them. To meet these requirements, a comprehensive monitoring and alarm-generating system has been developed. The system is built on Nagios, the open-source industry standard for monitoring and alerting, and monitors the database back-end, the hosting nodes, and key heart-beat functionalities for all the services involved. This paper describes the design, implementation, and operational experience with the monitoring system developed and deployed at CMS in 2016.

Highlights

  • The Compact Muon Solenoid (CMS) experiment[1] at the CERN Large Hadron Collider in Geneva, Switzerland[2], produces two classes of sizable datasets, both essential to the goals of its physics program: recorded collision events and simulated events comparable to them in number and size

  • Distributing condition data for consumption in the data processing and analysis workflows. Such services are crucial to the main data processing workflows of the CMS experiment, such as the event selection at the High Level Trigger (HLT), the reconstruction of the recorded collisions, and the production of simulated events

  • The Alignment, Calibration and Database team in CMS (AlCa/DB) developed a comprehensive monitoring and alarm-generating system, described in this paper. The system is built on Nagios, the open-source industry standard for monitoring and alerting, and assesses the status of the database back-end, of the hosting nodes, and of key heart-beat functionalities for all the services involved



Introduction

The Compact Muon Solenoid (CMS) experiment[1] at the CERN Large Hadron Collider in Geneva, Switzerland[2], produces two classes of sizable datasets, both essential to the goals of its physics program. First, it acquires events at 1 kHz (1 MB each) for a few months a year, and each raw event is processed to enable data analysis by the worldwide collaboration. Second, it generates simulated events comparable in number and size to the datasets collected with the experimental apparatus. To ensure the availability and optimal exploitation of these datasets, CMS has a complex infrastructure to handle the non-event data, which describe the alignment and calibration of all its sensitive elements. The Alignment, Calibration and Database team in CMS (AlCa/DB) developed a comprehensive monitoring and alarm-generating system, described in this paper. The system is built on Nagios, the open-source industry standard for monitoring and alerting, and assesses the status of the database back-end, of the hosting nodes, and of key heart-beat functionalities for all the services involved.
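To make the notion of a heart-beat check concrete, the following is a minimal sketch of how such a check could be written as a Nagios-style plugin. It is an illustration only, not the AlCa/DB implementation: the function names, the service name, the command-line usage, and the thresholds are all hypothetical. The only convention taken from Nagios itself is the plugin exit-code scheme (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and the single-line status output.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_heartbeat(age_seconds, warn_after=600.0, crit_after=1800.0):
    """Map the age of a service's last heart-beat to a Nagios status code.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if age_seconds is None or age_seconds < 0:
        return UNKNOWN  # missing or inconsistent timestamp
    if age_seconds >= crit_after:
        return CRITICAL
    if age_seconds >= warn_after:
        return WARNING
    return OK


def format_status(service, status, age_seconds):
    """Build the single status line that Nagios displays for a check."""
    if status == UNKNOWN:
        return f"HEARTBEAT UNKNOWN - {service}: no valid heart-beat timestamp"
    return (
        f"HEARTBEAT {LABELS[status]} - {service}: "
        f"last heart-beat {age_seconds:.0f}s ago"
    )


def main(argv):
    """Hypothetical usage: check_heartbeat.py <service-name> <last-heartbeat-epoch>."""
    if len(argv) != 3:
        print("HEARTBEAT UNKNOWN - usage: check_heartbeat.py <service> <epoch>")
        return UNKNOWN
    service = argv[1]
    try:
        age = time.time() - float(argv[2])
    except ValueError:
        age = None  # unparsable timestamp is reported as UNKNOWN
    status = classify_heartbeat(age)
    print(format_status(service, status, age))
    return status
```

When deployed as a plugin, the script would end by calling `sys.exit(main(sys.argv))` so that Nagios receives the status as the process exit code. Keeping the classification in a separate pure function makes the thresholds easy to test without a running service.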
