Abstract
We introduce an efficient framework for computing the distance between collider events using the tools of Linearized Optimal Transport (LOT). This preserves many of the advantages of the recently-introduced Energy Mover's Distance, which quantifies the "work" required to rearrange one event into another, while significantly reducing the computational cost. It also furnishes a Euclidean embedding amenable to simple machine learning algorithms and visualization techniques, which we demonstrate in a variety of jet tagging examples. The LOT approximation lowers the threshold for diverse applications of the theory of optimal transport to collider physics.
Highlights
What is the distance between collider events? This question, simple to pose, is notoriously difficult to answer.
To the extent that the 2-Wasserstein distance has a pseudo-Riemannian structure, the Linearized Optimal Transport (LOT) approximation amounts to projecting onto the 2-Wasserstein tangent plane at a chosen reference event and computing simpler $\ell_2$ distances on that plane. We make this point of view rigorous in the Appendix, where we prove that, as the reference event in the LOT approximation is refined, LOT converges to the distance between events on the tangent plane, which provides a well-defined metric on the space of events.
IV, where we explore the performance of linear discriminant analysis (LDA), k-nearest neighbors, support vector machine (SVM), and k-medoids clustering algorithms in the pairwise classification of boosted QCD, W, t, Higgs, and beyond-Standard Model (BSM) jets.
Summary
What is the distance between collider events? This question, simple to pose, is notoriously difficult to answer. One of the major practical challenges to the use of the Energy Mover's Distance (EMD) in analyzing collider events is the computational cost: for a dataset containing $N_{\mathrm{evt}}$ events, computing the pairwise distance between all events is $\mathcal{O}(N_{\mathrm{evt}}^2)$. We define an efficient framework for computing the distance between collider events by applying the tools of Linearized Optimal Transport (LOT), preserving the many advantages of the EMD while significantly reducing the computational cost and furnishing a Euclidean embedding suitable for use in a wide range of ML algorithms. A proof of the convergence of the LOT approximation to a true metric in the continuum limit is reserved for the Appendix.
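To make the embedding concrete, the following is a minimal sketch of one common way to build a LOT embedding, not the authors' implementation. It assumes each event is a set of particles with positions in a two-dimensional plane (for example rapidity and azimuth, with azimuthal periodicity ignored) and weights normalized to unit total (for example transverse momentum fractions). The function names `ot_plan` and `lot_embed` and the toy events are hypothetical, and SciPy's general-purpose `linprog` stands in for a dedicated optimal transport solver. Each event requires a single transport problem against the reference, after which all pairwise distances are plain Euclidean norms.

```python
import numpy as np
from scipy.optimize import linprog


def ot_plan(q, z, p, x):
    """Optimal transport plan from reference (q, z) to event (p, x).

    Uses squared Euclidean ground cost; q and p must have equal total weight.
    Returns a (K, n) matrix whose rows sum to q and columns sum to p.
    """
    K, n = len(q), len(p)
    cost = ((z[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)  # shape (K, n)
    # Marginal constraints on the flattened (row-major) plan.
    A_eq = np.zeros((K + n, K * n))
    for k in range(K):
        A_eq[k, k * n:(k + 1) * n] = 1.0  # row k sums to q[k]
    for i in range(n):
        A_eq[K + i, i::n] = 1.0  # column i sums to p[i]
    b_eq = np.concatenate([q, p])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(K, n)


def lot_embed(q, z, p, x):
    """Embed event (p, x) as a vector in the tangent plane at reference (q, z)."""
    plan = ot_plan(q, z, p, x)
    # Barycentric projection: where the mass at each reference point goes on average.
    T = plan @ x / q[:, None]
    # Weighted displacement, so that Euclidean norms approximate transport distances.
    return (np.sqrt(q)[:, None] * (T - z)).ravel()


# Toy reference event (hypothetical): three particles with unit total weight.
q_ref = np.array([0.5, 0.3, 0.2])
z_ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Two toy events: rigid translations of the reference.
shift_a = np.array([0.3, -0.4])
shift_b = np.array([-0.1, 0.2])
emb_a = lot_embed(q_ref, z_ref, q_ref, z_ref + shift_a)
emb_b = lot_embed(q_ref, z_ref, q_ref, z_ref + shift_b)

# The LOT distance is the Euclidean distance between embeddings.
d_ab = np.linalg.norm(emb_a - emb_b)
```

For rigid translations of the reference, the optimal map is the translation itself, so `d_ab` equals the length of `shift_a - shift_b`. For a dataset of N events, this approach solves N transport problems rather than one per pair, and the resulting vectors can be passed directly to standard algorithms that expect Euclidean inputs.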