Abstract

The rising adoption of machine learning in high energy physics and lattice field theory necessitates the re-evaluation of common methods that are widely used in computer vision, which, when applied to problems in physics, can lead to significant drawbacks in terms of performance and generalizability. One particular example of this is the use of neural network architectures that do not reflect the underlying symmetries of the given physical problem. In this work, we focus on complex scalar field theory on a two-dimensional lattice and investigate the benefits of using group equivariant convolutional neural network architectures based on the translation group. For a meaningful comparison, we conduct a systematic search for equivariant and non-equivariant neural network architectures and apply them to various regression and classification tasks. We demonstrate that in most of these tasks our best equivariant architectures can perform and generalize significantly better than their non-equivariant counterparts, which applies not only to physical parameters beyond those represented in the training set, but also to different lattice sizes.

Highlights

  • Machine learning has become an increasingly popular tool for a diverse range of applications in physics

  • One major disadvantage of the flattening (FL) architecture prevents it from predicting on lattice sizes other than the one it was trained on: it requires a fixed input size (see the sketch after this list)

  • Even though the strided (ST) architecture retains its inferior performance from the 60 × 4 lattice, its generalization ability to different lattice sizes is comparable to that of the equivariant (EQ) architecture, with the exception of the 100 × 5 lattice for the ST architecture

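The input-size restriction of the FL architecture can be illustrated with a short sketch. The snippet below is a hypothetical, minimal PyTorch example and not the authors' code: the lattice sizes 60 × 4 and 100 × 5 are taken from the highlights, while the channel counts, layer sizes, and the use of two input channels for the real and imaginary parts of the complex scalar field are assumptions made purely for illustration. A flattening head ties the subsequent dense layer to one fixed lattice size, whereas a head that averages over the lattice accepts any size.

```python
import torch
import torch.nn as nn

# Hypothetical minimal sketch (not the authors' code): why a flattening (FL) head
# is tied to one lattice size, while a head that averages over the lattice is not.
# Channel counts and layer sizes are placeholders; the two input channels stand
# for the real and imaginary parts of the complex scalar field.

conv = nn.Conv2d(2, 8, kernel_size=3, padding=1, padding_mode="circular")

# FL-style head: the dense layer's input dimension is fixed by the 60 x 4 lattice.
fl_head = nn.Sequential(nn.Flatten(), nn.Linear(8 * 60 * 4, 1))

# Size-independent head: averaging over both lattice directions removes the
# dependence on the lattice extent before the dense layer.
fc = nn.Linear(8, 1)
def pooled_prediction(x):
    return fc(conv(x).mean(dim=(2, 3)))

x_train = torch.randn(1, 2, 60, 4)     # lattice size seen during training
x_new = torch.randn(1, 2, 100, 5)      # a different lattice size

print(fl_head(conv(x_train)).shape)    # torch.Size([1, 1]); works only for 60 x 4
print(pooled_prediction(x_new).shape)  # torch.Size([1, 1]); works for any lattice size
# fl_head(conv(x_new))                 # would fail with a shape-mismatch error
```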

Introduction

Machine learning has become an increasingly popular tool for a diverse range of applications in physics. Modern CNN architectures are based on the idea that a network’s prediction should not change when the input is shifted. They rely on two key ingredients that were already introduced by the neocognitron [1] over 40 years ago: convolutional layers (S cells) and pooling (subsampling, downsampling) layers (C cells). This incorporation of translational symmetry was an essential advantage over its predecessor, the cognitron [2]. However, equivariance under translations is not guaranteed in a generic CNN, even though it is the idea behind the weight sharing in convolutional layers.
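The following short numerical check is a hedged sketch (an assumed setup, not the paper's implementation): with identical weights, a stride-one convolution whose padding matches the periodic boundary conditions of the lattice commutes with lattice translations, while the same convolution with ordinary zero padding, as found in many generic CNNs, does not. The lattice size, channel counts, and kernel size are arbitrary choices.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed setup, not the paper's implementation): on a periodic
# lattice, a stride-one convolution with circular padding commutes with lattice
# translations, whereas the same convolution with zero padding does not.

torch.manual_seed(0)
kwargs = dict(kernel_size=3, padding=1, stride=1, bias=False)
conv_circ = nn.Conv2d(1, 4, padding_mode="circular", **kwargs)
conv_zero = nn.Conv2d(1, 4, padding_mode="zeros", **kwargs)
conv_zero.load_state_dict(conv_circ.state_dict())   # identical weights

x = torch.randn(1, 1, 8, 8)                          # a small periodic lattice
shift = (2, 3)                                       # translate by (2, 3) sites
x_shifted = torch.roll(x, shifts=shift, dims=(2, 3))

def equivariant(conv):
    # Equivariance: convolving the shifted input equals shifting the convolved output.
    return torch.allclose(conv(x_shifted),
                          torch.roll(conv(x), shifts=shift, dims=(2, 3)), atol=1e-6)

print(equivariant(conv_circ))   # True: respects the periodic lattice
print(equivariant(conv_zero))   # False: zero padding breaks the symmetry
```

With identical weights, only the padding that matches the lattice's periodic boundary conditions preserves exact equivariance under cyclic shifts.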
