Abstract

We deploy an advanced Machine Learning (ML) environment, leveraging a multi-scale cross-attention encoder for event classification, towards the identification of the gg → H → hh → bb̄bb̄ process at the High Luminosity Large Hadron Collider (HL-LHC), where h is the discovered Standard Model (SM)-like Higgs boson and H a heavier version of it (with mH > 2mh). In the ensuing boosted Higgs regime, the final state consists of two fat jets. Our multi-modal network can extract information from the jet substructure and the kinematics of the final-state particles through self-attention transformer layers. The diverse learned information is subsequently integrated, via an additional transformer encoder with cross-attention heads, to improve classification performance. We show that our approach outperforms current alternative methods used to establish sensitivity to this process, whether these are based solely on kinematic analysis or combine it with mainstream ML approaches. We then employ various interpretive methods to evaluate the network results, including attention-map analysis and visual representations of Gradient-weighted Class Activation Mapping (Grad-CAM). Finally, we note that the proposed network is generic and can be applied to analyse any process carrying information at different scales. Our code is publicly available for generic use (https://github.com/AHamamd150/Multi-Scale-Transformer-Encoder).
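To make the fusion strategy concrete, the following is a minimal sketch (in PyTorch; this is not the authors' published code, for which see the linked repository) of a two-modality encoder in the spirit described above: each input stream is processed by its own self-attention stack, and a cross-attention head then lets the substructure tokens attend to the kinematic tokens before classification. All layer sizes, token counts, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Per-modality self-attention followed by cross-attention fusion (sketch)."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Independent self-attention stacks, one per input scale.
        self.substructure_enc = nn.TransformerEncoder(layer(), n_layers)
        self.kinematics_enc = nn.TransformerEncoder(layer(), n_layers)
        # Cross-attention: substructure tokens query the kinematic tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, 2)  # signal vs. background logits

    def forward(self, substructure, kinematics):
        s = self.substructure_enc(substructure)    # (B, N_s, d_model)
        k = self.kinematics_enc(kinematics)        # (B, N_k, d_model)
        fused, _ = self.cross_attn(query=s, key=k, value=k)
        return self.classifier(fused.mean(dim=1))  # pool tokens, then classify

# Toy usage: 8 events, 50 jet-constituent tokens, 6 kinematic tokens.
model = CrossAttentionFusion()
logits = model(torch.randn(8, 50, 64), torch.randn(8, 6, 64))
```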
