Abstract

We apply techniques from Bayesian generative statistical modeling to uncover hidden features in jet substructure observables that discriminate between different, a priori unknown, underlying short-distance physical processes in multi-jet events. In particular, we use a mixed membership model known as Latent Dirichlet Allocation to build a data-driven unsupervised top-quark tagger and $t\bar t$ event classifier. We compare our proposal to existing traditional and machine learning approaches to top jet tagging. Finally, employing a toy vector-scalar boson model as a benchmark, we demonstrate the potential for discovering New Physics signatures in multi-jet events in a model-independent and unsupervised way.
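For orientation, the textbook Latent Dirichlet Allocation generative process is sketched below. Reading each event as a document $d$ and each discretized substructure measurement as a word $w$ is an illustrative assumption here, and the symbols $K$, $\alpha$, $\eta$, $\boldsymbol\theta_d$ and $\boldsymbol\beta_k$ are introduced only for this sketch. "Mixed membership" means that every event carries its own proportions $\boldsymbol\theta_d$ over $K$ latent themes, so a single event can draw some observations from a signal-like theme and others from a background-like one.

$$
\begin{aligned}
\boldsymbol\beta_k &\sim \mathrm{Dir}(\eta), && k = 1,\dots,K,\\
\boldsymbol\theta_d &\sim \mathrm{Dir}(\alpha), && \text{for each event } d,\\
z_{dn} \mid \boldsymbol\theta_d &\sim \mathrm{Cat}(\boldsymbol\theta_d), && n = 1,\dots,N_d,\\
w_{dn} \mid z_{dn} &\sim \mathrm{Cat}(\boldsymbol\beta_{z_{dn}}).
\end{aligned}
$$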

Highlights

  • The use of jet substructure techniques in studying large area jets has played an important role in identifying hadronic decays of Higgs and electroweak gauge bosons in runs 1 and 2 of the LHC [1,2,3,4]

  • We have demonstrated a new unsupervised machine learning (ML) technique for disentangling signal and background events in mixed samples by identifying features in jet substructure observables that differentiate between the two

  • To do so, we have mapped jet substructure distributions onto a Latent Dirichlet Allocation (LDA) model, a generative probabilistic model widely used in Bayesian approaches to unsupervised ML
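A minimal sketch of how an LDA model of this kind can be fit, in Python using only the standard library. It uses collapsed Gibbs sampling, which is one common LDA inference method; this summary does not say which method the authors used. The toy encoding (each event as a bag of discretized substructure tokens, with tokens 0-3 standing in for QCD-like values and 4-7 for top-like values) and the names `make_toy_events` and `lda_gibbs` are hypothetical illustrations, not the paper's features or code.

```python
import random


# Hypothetical toy encoding (an assumption, not the paper's actual features):
# each event is a "document", i.e. a bag of integer tokens, where every token
# is a discretized jet-substructure measurement. Tokens 0-3 stand in for
# QCD-like values and tokens 4-7 for top-like values.
def make_toy_events(n_per_class=20, tokens_per_event=20, seed=1):
    rng = random.Random(seed)
    background = [[rng.randrange(0, 4) for _ in range(tokens_per_event)]
                  for _ in range(n_per_class)]
    signal = [[rng.randrange(4, 8) for _ in range(tokens_per_event)]
              for _ in range(n_per_class)]
    return background + signal


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, eta=0.1, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, phi): theta[d][k] is the proportion of topic k in
    document d; phi[k][w] is the probability of token w under topic k.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-token counts
    n_k = [0] * n_topics                                # tokens per topic

    # Start from a uniformly random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional: p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + eta) / (n_j + V * eta).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + eta)
                           / (n_k[j] + vocab_size * eta) for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final Gibbs sample.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + eta) / (n_k[k] + vocab_size * eta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi


if __name__ == "__main__":
    events = make_toy_events()
    theta, phi = lda_gibbs(events, n_topics=2, vocab_size=8)
    for k, row in enumerate(phi):
        print(f"topic {k}: probability mass on tokens 0-3 = {sum(row[:4]):.2f}")
    # Per-event topic proportions can serve as an unsupervised event score.
    print("first event:", [round(p, 2) for p in theta[0]])
```

With two topics on this toy sample, one would expect the fitted topics to align with the two token groups, so the per-event proportions in `theta` behave like an unsupervised signal-versus-background score. On real events, the number of topics, the priors `alpha` and `eta`, and the choice and binning of substructure observables are all assumptions that would need dedicated study.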


Summary

INTRODUCTION

The use of jet substructure techniques in studying large area jets has played an important role in identifying hadronic decays of Higgs and electroweak gauge bosons in runs 1 and 2 of the LHC [1,2,3,4]. In the last few years, machine learning (ML) tools have extended the application of jet substructure in tagging jets at the LHC [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] through the use of neural networks (NNs) to process and "learn" from vast amounts of training data. Since these approaches rely on theoretical predictions for pure signal and background training data sets [typically through Monte Carlo (MC) generators], they (a) are exposed to MC mismodeling of realistic events as reconstructed from real data and detectors, and (b) require exact knowledge of the models underlying both the expected signal and the backgrounds. We compare our approach to existing conventional and ML approaches and outline possible further improvements and future directions.

GENERATIVE BAYESIAN MODELS OF JET SUBSTRUCTURE
UNSUPERVISED TOP TAGGER
UNSUPERVISED NP SEARCH
CONCLUSIONS
