Abstract

ALICE (A Large Ion Collider Experiment) is the heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). The DAQ (Data Acquisition System) facilities handle the data flow from the detectors electronics up to the mass storage. The DAQ system is based on a large farm of commodity hardware consisting of more than 600 devices (Linux PCs, storage, network switches), and controls hundreds of distributed hardware and software components interacting together. This paper presents Orthos, the alarm system used to detect, log, report, and follow-up abnormal situations on the DAQ machines at the experimental area. The main objective of this package is to integrate alarm detection and notification mechanisms with a full-featured issues tracker, in order to prioritize, assign, and fix system failures optimally. This tool relies on a database repository with a logic engine, SQL interfaces to inject or query metrics, and dynamic web pages for user interaction. We describe the system architecture, the technologies used for the implementation, and the integration with existing monitoring tools.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.