Sparse autoregressive models for scalable generation of sparse images in particle physics

Yadong Lu, Pierre Baldi, Julian Collado, Daniel Whiteson

doi:10.1103/physrevd.103.036012

Abstract

Generation of simulated data is essential for data analysis in particle physics, but current Monte Carlo methods are computationally expensive. Deep-learning-based generative models have successfully generated simulated data at lower cost, but struggle when the data are very sparse. We introduce a novel deep sparse autoregressive model (SARM) that explicitly learns the sparseness of the data with a tractable likelihood, making it more stable and interpretable than Generative Adversarial Networks (GANs) and other methods. In two case studies, we compare SARM to a GAN model and a non-sparse autoregressive model. As a quantitative measure of performance, we compute the Wasserstein distance ($W_p$) between the distributions of physical quantities calculated on the generated images and on the training images. In the first study, featuring images of jets in which 90% of the pixels are zero-valued, SARM produces images with $W_p$ scores that are 24-52% better than the scores obtained with other state-of-the-art generative models. In the second study, on calorimeter images in the vicinity of muons where 98% of the pixels are zero-valued, SARM produces images with $W_p$ scores that are 66-68% better. Similar observations made with other metrics confirm the usefulness of SARM for sparse data in particle physics. Original data and software will be made available, upon acceptance of the manuscript, from the UCI Machine Learning in Physics web portal at: http://mlphysics.ics.uci.edu/.
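
To illustrate the kind of comparison described above, the following is a minimal sketch of an empirical Wasserstein distance between the distributions of one per-image quantity, computed on two sets of sparse images. It assumes that $W_p$ denotes the usual order-$p$ Wasserstein distance and that both samples have the same size, in which case the one-dimensional distance reduces to comparing sorted values. The helper names (`wasserstein_p`, `sparse_image`, `total_intensity`), the toy pixel model, and the choice of total intensity as the "physical quantity" are hypothetical and are not taken from the paper.

```python
import random


def wasserstein_p(xs, ys, p=1):
    """Empirical order-p Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    # In one dimension the optimal transport plan pairs the sorted values.
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(a - b) ** p for a, b in pairs) / len(xs)) ** (1.0 / p)


def sparse_image(rng, n_pixels=100, sparsity=0.9):
    """Hypothetical toy image: each pixel is zero with probability `sparsity`."""
    return [
        0.0 if rng.random() < sparsity else rng.expovariate(1.0)
        for _ in range(n_pixels)
    ]


def total_intensity(image):
    """Stand-in for a physical quantity computed on each image (assumption)."""
    return sum(image)


rng = random.Random(0)
# "Training" images at 90% sparsity; "generated" images deliberately mismatched.
training = [total_intensity(sparse_image(rng)) for _ in range(500)]
generated = [total_intensity(sparse_image(rng, sparsity=0.8)) for _ in range(500)]

print(wasserstein_p(training, generated, p=1))
```

A smaller value indicates that the generated distribution of the chosen quantity is closer to the training distribution. For samples of unequal size or weighted samples, a library routine such as `scipy.stats.wasserstein_distance` (order 1 only) would be a more appropriate choice than this sketch.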

Full Text