Abstract

The ATLAS experiment at the Large Hadron Collider (LHC) at CERN relies on a complex and highly distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data obtained at unprecedented energy and rates. The Run Control (RC) system is the component that steers data acquisition by starting and stopping processes and by carrying all data-taking elements through well-defined states in a coherent way. Taking into account the lessons learnt during the LHC's Run 1, the RC was completely redesigned and reimplemented during the LHC Long Shutdown 1 (LS1). In the new design, the RC is assisted by the Central Hint and Information Processor (CHIP) service, which can be considered its “brain”. CHIP is an intelligent system able to supervise the ATLAS data taking, take operational decisions and handle abnormal conditions. This paper describes the design, implementation and performance of the RC/CHIP system. Particular emphasis is put on the way the RC and CHIP cooperate and on the benefits brought by the Complex Event Processing engine. Additionally, some error recovery scenarios are analysed for which the intervention of human experts is no longer necessary.

Highlights

  • The Trigger and Data Acquisition (TDAQ) system [1] of the ATLAS detector [2] at the Large Hadron Collider (LHC) at CERN is composed of a large number of distributed hardware and software components which provide, in a coordinated manner, the data-taking functionality of the overall system. The Run Control (RC) and the Central Hint and Information Processor (CHIP) are key components of the Online Software framework that encompasses the software to configure, control and monitor the TDAQ system

  • During LHC Run 1, the detection and handling of problems was based on an embedded rule-based forward-chaining expert system (CLIPS [3]), which was deeply integrated with the RC system

  • Even though the system performed well, it had major disadvantages: new rules could not be tested without reproducing the error conditions in the production environment and monitoring of system resources used by specific rules was not possible

Summary

Introduction

The Trigger and Data Acquisition (TDAQ) system [1] of the ATLAS detector [2] at the Large Hadron Collider (LHC) at CERN is composed of a large number of distributed hardware and software components (about 2000 machines and more than 15000 concurrent processes at the end of the LHC's Run 1) which provide, in a coordinated manner, the data-taking functionality of the overall system. During Run 1, problems were detected and handled by an embedded rule-based expert system (CLIPS [3]) that was deeply integrated with the RC. Even though that system performed well, it had major disadvantages: new rules could not be tested without reproducing the error conditions in the production environment, and the system resources used by specific rules could not be monitored. This made the development and debugging of new rules difficult. CHIP is an intelligent application with a global view of the TDAQ system. It supervises the ATLAS data taking, takes operational decisions and handles abnormal conditions.
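The Run Control's task of carrying all data-taking elements through well-defined states in a coherent way can be illustrated with a minimal sketch. This is not the ATLAS implementation: the state names, the `Element` and `Controller` classes and the single-step transition rule are all assumptions chosen for illustration. The sketch only shows the idea that a controller reports a new state once every element it steers has reached it.

```python
from enum import IntEnum


class State(IntEnum):
    # Hypothetical state names; the real RC state set is not given here.
    INITIAL = 0
    CONFIGURED = 1
    RUNNING = 2


def check_step(current, target):
    """Allow only moves between neighbouring states (an assumed rule)."""
    if abs(int(target) - int(current)) != 1:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")


class Element:
    """A single data-taking element with its own state."""

    def __init__(self, name):
        self.name = name
        self.state = State.INITIAL

    def move_to(self, target):
        check_step(self.state, target)
        self.state = target


class Controller:
    """Steers a group of elements so that they change state together."""

    def __init__(self, elements):
        self.elements = list(elements)
        self.state = State.INITIAL

    def transition(self, target):
        # Validate first, so an illegal request leaves every element untouched.
        check_step(self.state, target)
        for element in self.elements:
            element.move_to(target)
        # The controller's own state changes only after all elements moved.
        self.state = target


elements = [Element("readout"), Element("trigger")]
rc = Controller(elements)
rc.transition(State.CONFIGURED)
rc.transition(State.RUNNING)
```

Validating the request before touching any element is a design choice of this sketch: a rejected transition then cannot leave the group in a mixed state.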

The Run Control system
The CHIP
Conclusions