Abstract

The certification of the CMS experiment data as usable for physics analysis is a crucial task to ensure the quality of all physics results published by the collaboration. Currently, the certification is conducted by human experts; it is labor intensive and based on the scrutiny of distributions integrated over several hours of data taking. This contribution focuses on the design and prototype of an automated certification system that assesses data quality for each luminosity section (i.e., 23 seconds of data taking). Anomalies caused by detector malfunctions or sub-optimal reconstruction are difficult to enumerate a priori and occur rarely, which makes it difficult to use classical supervised classification methods such as feedforward neural networks. We base our prototype on a semi-supervised approach that employs deep autoencoders. This approach has been successfully validated on CMS data collected during the 2016 LHC run: we demonstrate its ability to detect anomalies with high accuracy and a low false positive rate when compared against the outcome of the manual certification by experts. A key advantage of this approach over other machine learning technologies is the high interpretability of its results, which can be further used to ascribe the origin of problems in the data to a specific sub-detector or physics object.
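
The idea of flagging anomalies by reconstruction error, and of tracing them back to a specific input, can be illustrated with a minimal sketch. It is not the authors' implementation: a linear autoencoder (a truncated SVD) stands in for the deep autoencoders, the data are synthetic rather than CMS data, and all names, sizes, and the 99th-percentile threshold are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-luminosity-section features (hypothetical):
# 8 features driven by 2 latent factors plus small noise.
mixing = np.array([
    [1.0,  1.0, 1.0,  1.0, 1.0,  1.0, 1.0,  1.0],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
])
n_latent, n_features = mixing.shape

def make_good(n):
    """Generate n samples that follow the normal (good) pattern."""
    latent = rng.normal(size=(n, n_latent))
    return latent @ mixing + 0.05 * rng.normal(size=(n, n_features))

# Semi-supervised setting: fit only on samples labelled as good.
train = make_good(2000)

# Linear autoencoder via truncated SVD (a simplification of a deep
# autoencoder): encode by projecting onto the top components, decode
# by projecting back.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:n_latent]  # shape (n_latent, n_features)

def reconstruction_error(x):
    """Per-feature squared reconstruction error for each row of x."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return (centered - reconstructed) ** 2

# Threshold on total error, taken from the good training data
# (assumed choice: 99th percentile).
threshold = np.percentile(reconstruction_error(train).sum(axis=1), 99)

# One good sample and one sample with feature 5 corrupted.
test = make_good(2)
test[1, 5] += 3.0

errors = reconstruction_error(test)
is_anomalous = errors.sum(axis=1) > threshold
most_suspect_feature = errors.argmax(axis=1)

print("anomalous:", is_anomalous)
print("feature with largest error:", most_suspect_feature)
```

In this sketch the per-feature error plays the role of the interpretable output: the feature with the largest error points to the likely origin of the problem, in the same spirit as ascribing an anomaly to a sub-detector or physics object.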

Highlights

  • The data certification (DC) process is the final step in the CMS Data Quality Monitoring (DQM) procedure

  • Current decisions by human experts are labor intensive, and the histograms are integrated on a per-acquisition-run basis

  • Machine learning (ML) methods open up the possibility of providing an additional quality indicator in the current CMS DC procedure, as the decision function can be learned directly from the copious archives of past monitoring data and the corresponding labels provided by experts


Summary

Introduction

The data certification (DC) process is the final step in the CMS Data Quality Monitoring (DQM) procedure. Current decisions by human experts are labor intensive, and the histograms are integrated on a per-acquisition-run basis. An acquisition run corresponds to a given configuration of both the CMS detector and the LHC accelerator. A single run can span a relatively long data-taking interval.

