Abstract

Anomaly detection techniques are growing in importance at the Large Hadron Collider (LHC), motivated by the increasing need to search for new physics in a model-agnostic way. In this work, we provide a detailed comparative study between a well-studied unsupervised method called the autoencoder (AE) and a weakly-supervised approach based on the Classification Without Labels (CWoLa) technique. We examine the ability of the two methods to identify a new physics signal at different cross sections in a fully hadronic resonance search. By construction, the AE classification performance is independent of the amount of injected signal. In contrast, the CWoLa performance improves with increasing signal abundance. When integrating these approaches with a complete background estimate, we find that the two methods have complementary sensitivity. In particular, CWoLa is effective at finding diverse and moderately rare signals while the AE can provide sensitivity to very rare signals, but only with certain topologies. We therefore demonstrate that both techniques are complementary and can be used together for anomaly detection at the LHC.

Highlights

  • For each signal-to-background ratio (S/B) benchmark, the performance of Classification Without Labels (CWoLa) Hunting is evaluated across ten independent runs, each using a random subset of signal events, to reduce the statistical error

  • After exploring a large range of cross sections, we decided to examine this range in S/B because it is sufficient to observe an intersection in the performance of the two methods

  • The key difference between these two methods is that the weak labels of CWoLa Hunting allow it to utilize the specific features of the signal overdensity, making it ideal in the limit of large signal rate, while the unsupervised AE does not rely on any information about the signal and is robust to small signal rates
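The data preparation implied by these highlights can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the mass-window edges, the sideband width, the event counts, and the seed are all assumptions introduced here. It shows how weak labels can be assigned from the dijet mass alone (signal region versus sidebands), and how ten independent random signal subsets might be drawn for a given S/B benchmark.

```python
import random
import statistics


def weak_labels(mjj_values, signal_region=(3300.0, 3700.0), sideband_width=200.0):
    """Assign CWoLa-style weak labels from the dijet mass (GeV) alone.

    Returns (event_index, label) pairs: 1 for the signal region, 0 for the
    sidebands. Events outside both are dropped. The window edges are
    hypothetical values chosen for illustration.
    """
    lo, hi = signal_region
    labeled = []
    for index, mjj in enumerate(mjj_values):
        if lo <= mjj < hi:
            labeled.append((index, 1))
        elif lo - sideband_width <= mjj < lo or hi <= mjj < hi + sideband_width:
            labeled.append((index, 0))
    return labeled


def sample_signal_subsets(signal_pool, s_over_b, n_background, n_runs=10, seed=0):
    """Draw one random signal subset per run for a target S/B.

    Sampling is without replacement within a run; different runs are drawn
    independently. Raises ValueError if the pool is smaller than required.
    """
    rng = random.Random(seed)
    n_signal = round(s_over_b * n_background)
    return [rng.sample(signal_pool, n_signal) for _ in range(n_runs)]


def mean_and_standard_error(scores):
    """Summarize a per-run metric by its mean and standard error."""
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, standard_error


# Toy usage with placeholder numbers (not taken from the paper).
toy_signal_pool = list(range(1000))  # indices of available signal events
toy_subsets = sample_signal_subsets(toy_signal_pool, s_over_b=0.004, n_background=50000)
toy_labels = weak_labels([3000.0, 3200.0, 3500.0, 3800.0, 4000.0])
```

In a real study, each subset would be mixed with the background, a classifier would be trained on the weak labels, and `mean_and_standard_error` would be applied to the chosen metric over the ten runs. The autoencoder, by contrast, would be trained without any labels at all.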


Summary

Simulation

To investigate the performance of CWoLa Hunting and AEs in a generic hadronic resonance search, we consider a benchmark new physics signal $pp \to Z' \to XY$, with $X \to jjj$ and $Y \to jjj$. The mass of the new heavy particle is set to $m_{Z'} = 3.5$ TeV, and we consider two scenarios for the masses of the new lighter particles: $m_X, m_Y = 500$ GeV and $m_X, m_Y = 300$ GeV. These signals typically produce a pair of large-radius jets $J$ with invariant mass $m_{JJ} \simeq 3.5$ TeV, jet masses of $m_J = 500$ GeV or $300$ GeV, and a three-prong substructure. These signals are generated in the LHC Olympics framework [60].
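As a quick kinematic check of these numbers, the sketch below builds the two jet four-vectors for an idealized decay and recomputes the masses. It assumes the heavy resonance decays at rest, that both daughters have the same mass, and that each daughter is fully captured in a single large-radius jet. The function names are hypothetical and exist only for this illustration.

```python
import math


def two_body_decay_at_rest(m_parent, m_daughter):
    """Four-vectors (E, px, py, pz) in GeV for a parent at rest decaying
    into two equal-mass daughters, emitted back to back along the z axis."""
    energy = m_parent / 2.0
    momentum = math.sqrt(energy**2 - m_daughter**2)
    return (energy, 0.0, 0.0, momentum), (energy, 0.0, 0.0, -momentum)


def invariant_mass(*four_vectors):
    """Invariant mass of the summed four-vectors: sqrt(E^2 - |p|^2)."""
    energy = sum(v[0] for v in four_vectors)
    px = sum(v[1] for v in four_vectors)
    py = sum(v[2] for v in four_vectors)
    pz = sum(v[3] for v in four_vectors)
    mass_squared = energy**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(mass_squared, 0.0))  # guard against rounding below zero


# Both benchmark scenarios: daughter masses of 500 GeV and 300 GeV.
for daughter_mass in (500.0, 300.0):
    jet_1, jet_2 = two_body_decay_at_rest(3500.0, daughter_mass)
    m_jj = invariant_mass(jet_1, jet_2)  # dijet mass, recovers the parent mass
    m_j = invariant_mass(jet_1)          # single-jet mass, recovers the daughter mass
```

In this idealized picture the dijet mass reproduces the resonance mass and each jet mass reproduces the daughter mass. In simulated events, detector effects and radiation outside the jets smear both quantities, which is why the text describes the dijet mass as only approximately 3.5 TeV.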

Machine learning setup
Autoencoder
Signal benchmarks
Supervised metrics
Sideband fit and p-values
What did the machine learn?
Conclusions
