When entering the phase of big data processing and statistical inferences in experimental physics, the efficient use of machine learning methods may require optimal data preprocessing methods and, in particular, optimal balance between details and noise. In experimental studies of strong-field quantum electrodynamics with intense lasers, this balance concerns data binning for the observed distributions of particles and photons. Here we analyze the aspect of binning with respect to different machine learning methods (Support Vector Machine (SVM), Gradient Boosting Trees (GBT), Fully-Connected Neural Network (FCNN), Convolutional Neural Network (CNN)) using numerical simulations that mimic expected properties of upcoming experiments. We see that binning can crucially affect the performance of SVM and GBT, and, to a less extent, FCNN and CNN. This can be interpreted as the latter methods being able to effectively learn the optimal binning, discarding unnecessary information. Nevertheless, given limited training sets, the results indicate that the efficiency can be increased by optimizing the binning scale along with other hyperparameters. We present specific measurements of accuracy that can be useful for planning of experiments in the specified research area.
Read full abstract