Linearized optimal transport for collider events

Katy Craig, Tianji Cai, Junyi Cheng, Nathaniel Craig

doi:10.1103/physrevd.102.116019


Open Access

https://doi.org/10.1103/physrevd.102.116019


Journal: Physical Review D	Publication Date: Dec 29, 2020
Citations: 64	License type: CC BY 4.0

Affiliation: University of California, Santa Barbara

Abstract

We introduce an efficient framework for computing the distance between collider events using the tools of Linearized Optimal Transport (LOT). This preserves many of the advantages of the recently-introduced Energy Mover's Distance, which quantifies the "work" required to rearrange one event into another, while significantly reducing the computational cost. It also furnishes a Euclidean embedding amenable to simple machine learning algorithms and visualization techniques, which we demonstrate in a variety of jet tagging examples. The LOT approximation lowers the threshold for diverse applications of the theory of optimal transport to collider physics.

Highlights

What is the distance between collider events? This question, simple to pose, is notoriously difficult to answer.
To the extent that the 2-Wasserstein distance has a pseudo-Riemannian structure, the Linearized Optimal Transport (LOT) approximation amounts to projecting onto the 2-Wasserstein tangent plane at a chosen reference event and computing simpler $\ell^2$ distances on that plane. We make this point of view rigorous in the Appendix, where we prove that, as the reference event in the LOT approximation is refined, LOT converges to the distance between events on the tangent plane, which provides a well-defined metric on the space of events.
In Sec. IV, we explore the performance of linear discriminant analysis (LDA), k-nearest neighbor, support vector machine (SVM), and k-medoids clustering algorithms in the pairwise classification of boosted QCD, W, t, Higgs, and beyond-Standard Model (BSM) jets.
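
The tangent-plane picture in the highlights can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the authors' implementation: the reference and every event are assumed to have the same number of particles with equal weights, so the optimal plan is a permutation that can be found by brute force. The function names (`optimal_map`, `lot_embedding`, `lot_distance`) and the sample coordinates are hypothetical. Events with unequal energy weights would need a general optimal transport solver instead.

```python
import itertools
import math


def optimal_map(reference, event):
    """Brute-force the optimal assignment between two equal-size,
    uniformly weighted 2D point clouds under squared Euclidean cost.
    Returns the image T(r_i) of each reference point r_i.
    Assumption: toy sizes only, since the search is factorial in n."""
    n = len(reference)
    if len(event) != n:
        raise ValueError("toy sketch assumes equal particle counts")
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        cost = sum(
            (reference[i][0] - event[j][0]) ** 2
            + (reference[i][1] - event[j][1]) ** 2
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [event[j] for j in best_perm]


def lot_embedding(reference, event):
    """Tangent-plane coordinates at the reference: the displacements
    T(r_i) - r_i, scaled by sqrt(weight) so that plain Euclidean
    distance between embeddings is a weighted l2 distance."""
    scale = math.sqrt(1.0 / len(reference))
    coords = []
    for (rx, ry), (tx, ty) in zip(reference, optimal_map(reference, event)):
        coords.extend([scale * (tx - rx), scale * (ty - ry)])
    return coords


def lot_distance(emb_a, emb_b):
    """Euclidean distance between two LOT embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


# Hypothetical two-particle "events" in a 2D plane.
reference = [(0.0, 0.0), (1.0, 0.0)]
event_a = [(0.0, 1.0), (1.0, 1.0)]
event_b = [(0.0, 2.0), (1.0, 2.0)]

emb_a = lot_embedding(reference, event_a)
emb_b = lot_embedding(reference, event_b)
print(lot_distance(emb_a, emb_b))
```

In this example the two events differ by a rigid shift of one unit, so the printed LOT distance is approximately 1, which matches the exact 2-Wasserstein distance between them. Because each event becomes a fixed-length vector, the embeddings can be passed directly to standard Euclidean classifiers.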

Summary

INTRODUCTION

What is the distance between collider events? This question, simple to pose, is notoriously difficult to answer. One of the major practical challenges to the use of the Energy Mover's Distance (EMD) in analyzing collider events is the computational cost; for a dataset containing $N_{\mathrm{evt}}$ events, computing the pairwise distance between all events scales as $O(N_{\mathrm{evt}}^2)$. We define an efficient framework for computing the distance between collider events by applying the tools of Linearized Optimal Transport (LOT), preserving the many advantages of the EMD while significantly reducing the computational cost and furnishing a Euclidean embedding suitable for use in a wide range of ML algorithms. A proof of the convergence of the LOT approximation to a true metric in the continuum limit is reserved for the Appendix.
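
The scaling argument above can be made concrete with a short count of optimal transport solves. The sketch assumes the usual LOT construction, in which each event is transported once to a single fixed reference and all later comparisons are plain Euclidean distances between embeddings; the function names are hypothetical.

```python
def pairwise_ot_solves(n_events):
    """Optimal transport problems needed for all distinct event pairs."""
    return n_events * (n_events - 1) // 2


def lot_ot_solves(n_events):
    """Assumption: one transport problem per event, against one reference."""
    return n_events


for n in (100, 10_000):
    print(n, pairwise_ot_solves(n), lot_ot_solves(n))
```

For 10,000 events this gives 49,995,000 pairwise solves versus 10,000 solves against the reference. The remaining pairwise comparisons are still quadratic in number, but each is a cheap vector distance instead of an optimization problem.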

LINEARIZED OPTIMAL TRANSPORT

OBJECT CLASSIFICATION WITH LOT

MACHINE LEARNING WITH LOT

CONCLUSION


Findings
