Abstract

The LHCb High Level Trigger (HLT) is split into two stages. HLT1 is synchronous with the collisions delivered by the LHC and writes its output to a local disk buffer, which is processed asynchronously by HLT2. Efficient monitoring of the data being processed by the application is crucial to promptly diagnose detector or software problems. HLT2 consists of approximately 50000 processes, each of which produces 4000 histograms. This results in 200 million histograms that need to be aggregated for each of up to a hundred data-taking intervals that are processed simultaneously. This paper presents the multi-level hierarchical architecture of the monitoring infrastructure put in place to achieve this. Network bandwidth is minimised by sending histogram increments and exchanging metadata only when necessary, using a custom lightweight protocol based on boost::serialize. The transport layer is implemented with ZeroMQ, which supports IPC and TCP communication, queue handling, asynchronous request/response and multipart messages. The persistent storage to ROOT is parallelized in order to cope with data arriving from up to a hundred data-taking intervals being processed simultaneously by HLT2. The performance and the scalability of the current system are presented. We demonstrate the feasibility of such an approach for the HLT1 use case, where real-time feedback and reliability of the infrastructure are crucial. In addition, a prototype of a high-level transport layer based on the stream-processing platform Apache Kafka is shown, which has several advantages over the lower-level ZeroMQ solution.
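
The idea of sending histogram increments and exchanging metadata only when necessary can be illustrated with a minimal sketch. It is not the LHCb implementation: it uses plain Python objects in place of the boost::serialize protocol and the ZeroMQ transport, and the class names `HistogramSource` and `Adder`, as well as the histogram name `track_chi2`, are hypothetical. Each source publishes only the change in its bin contents since its last publication, and the adder asks for the static metadata the first time it encounters a histogram name.

```python
class HistogramSource:
    """One monitored process holding the cumulative contents of one histogram."""

    def __init__(self, name, title, n_bins):
        self.name = name
        self.title = title
        self.bins = [0] * n_bins
        self._published = [0] * n_bins  # bin contents at the last publication

    def fill(self, bin_index, weight=1):
        self.bins[bin_index] += weight

    def metadata(self):
        # Static description; only sent when an adder asks for it.
        return {"name": self.name, "title": self.title, "n_bins": len(self.bins)}

    def increment(self):
        # Change in each bin since the previous publication.
        delta = [now - before for now, before in zip(self.bins, self._published)]
        self._published = list(self.bins)
        return delta


class Adder:
    """Sums increments from many sources, keyed by histogram name."""

    def __init__(self):
        self.metadata = {}
        self.totals = {}
        self.metadata_requests = 0

    def on_increment(self, source):
        name = source.name
        if name not in self.metadata:
            # First sighting of this histogram: fetch its metadata once.
            self.metadata[name] = source.metadata()
            self.metadata_requests += 1
            self.totals[name] = [0] * self.metadata[name]["n_bins"]
        delta = source.increment()
        self.totals[name] = [t + d for t, d in zip(self.totals[name], delta)]


# Three processes fill the same (hypothetical) histogram over two publish rounds.
processes = [HistogramSource("track_chi2", "Track fit chi2", 4) for _ in range(3)]
adder = Adder()
for round_number in range(2):
    for index, process in enumerate(processes):
        process.fill(index)  # each process fills a different bin
        adder.on_increment(process)

print(adder.totals["track_chi2"])  # aggregated bin contents
print(adder.metadata_requests)     # number of metadata exchanges
```

In this sketch the metadata is requested once even though six increments are received. That is the property that keeps traffic low when, as quoted above, roughly 50000 processes each publish 4000 histograms.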

Highlights

  • Starting from Run 3 in Q2 2021, the LHCb experiment will run in a trigger-less configuration in which the detector is read out by the Event Filter Farm (EFF) at 30 MHz

  • Some new features were introduced with respect to the HLT1 system: the messages were split into increments and metadata, and the EFF was grouped into sub-farms and top-level adders

  • The HLT1 and HLT2 monitoring systems running in Run 2 were presented, as well as a prototype that uses commercial software to handle the distribution of the monitoring information
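
The grouping into sub-farms and top-level adders mentioned in the highlights can be sketched as a two-level sum. This is an illustration under assumptions rather than the actual adder code: the function names, the sub-farm names and the bin contents are all hypothetical. Each sub-farm adder sums the increments of its own nodes, and the top-level adder then receives one message per sub-farm instead of one per node.

```python
def add_bins(left, right):
    """Bin-by-bin sum of two histograms with the same binning."""
    return [a + b for a, b in zip(left, right)]


def aggregate(increments, n_bins):
    """Sum any number of increments into a single histogram."""
    total = [0] * n_bins
    for increment in increments:
        total = add_bins(total, increment)
    return total


def aggregate_farm(farm, n_bins):
    """First level: one sum per sub-farm. Second level: sum of the sub-farm sums."""
    subfarm_sums = {name: aggregate(nodes, n_bins) for name, nodes in farm.items()}
    top_level = aggregate(subfarm_sums.values(), n_bins)
    return subfarm_sums, top_level


# Hypothetical farm: two sub-farms with two and three nodes, three bins each.
farm = {
    "subfarm_a": [[1, 0, 2], [0, 1, 0]],
    "subfarm_b": [[3, 0, 0], [0, 0, 1], [1, 1, 1]],
}
subfarm_sums, top_level = aggregate_farm(farm, n_bins=3)

print(subfarm_sums)
print(top_level)
print(len(subfarm_sums), "messages reach the top level instead of",
      sum(len(nodes) for nodes in farm.values()))
```

Because bin-wise addition is associative, the top-level result is the same as summing every node directly. The hierarchy only changes how many messages each adder has to handle.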


Summary

Introduction

Starting from Run 3 in Q2 2021, the LHCb experiment will run in a trigger-less configuration in which the detector is read out by the Event Filter Farm (EFF) at 30 MHz. The Run 3 HLT will have the same two-level configuration as in Run 2: HLT1 will be synchronous with the collisions delivered by the LHC and will write its output to a local disk buffer, which will be processed asynchronously by HLT2. With this trigger-less configuration, efficient monitoring of the data processed by the application will be crucial to promptly diagnose detector or software problems. For this purpose, the scalability and performance of the Run 2 HLT monitoring infrastructure are presented.

The LHCb HLT monitoring infrastructure
Prototyping a monitoring system using Kafka
Infrastructure and configuration
Requirements
Conclusion
