Abstract
Machine learning (ML) tools such as encoder-decoder convolutional neural networks (CNNs) can represent incredibly complex nonlinear functions which map between combinations of images and scalars. For example, CNNs can be used to map combinations of accelerator parameters and images which are 2D projections of the 6D phase space distributions of charged particle beams as they are transported between various particle accelerator locations. Despite their strengths, applying ML to time-varying systems, or systems with shifting distributions, is an open problem, especially for large systems for which collecting new data for re-training is impractical or interrupts operations. Particle accelerators are one example of large time-varying systems for which collecting detailed training data requires lengthy dedicated beam measurements which may no longer be available during regular operations. We present a novel method of adaptive ML for time-varying systems. Our approach is to map very high (N ≈ 100k) dimensional inputs (a combination of scalar parameters and images) into a low dimensional (N ≈ 2) latent space at the output of the encoder section of an encoder-decoder CNN. We then actively tune this low dimensional latent-space representation of complex system dynamics by adding an adaptively tuned feedback vector directly before the decoder section builds back up to our image-based high-dimensional phase space density representations. This method allows us to learn correlations within incredibly large parameter-space systems, to quickly tune their characteristics, and to track their evolution in real time based on feedback, without massive new data sets for re-training.
We demonstrate that our method can accurately predict and track the phase space of charged particle beams at various locations in a particle accelerator by adaptively adjusting in real time while the unknown input beam distribution of the accelerator is changing in shape, charge, and offset, and while the RF system of the accelerator itself is also changing in an unpredictable way. For FACET-II, we demonstrate that such an approach has the potential to use transverse deflecting cavity and energy spread spectrum beam measurements to accurately predict 2D projections of the 6D phase space of the electron beam at the plasma wakefield acceleration interaction point, where such diagnostics are unavailable.
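The core idea above can be illustrated with a minimal numerical sketch. Here the "encoder" output and "decoder" are stand-in linear maps rather than the paper's CNN, and the adaptive update is a simple finite-difference feedback loop, one of several model-free schemes one could use to tune the latent feedback vector; all names and parameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 2   # low-dimensional latent space (N ~ 2 in the paper)
out_dim = 16     # stands in for a flattened phase-space image

# Frozen stand-in "decoder": maps the latent vector to a measurement.
W_dec = rng.normal(size=(out_dim, latent_dim))

def decode(z):
    return W_dec @ z

# Latent state of the drifted system, unknown to the trained model.
z_true = np.array([0.7, -1.2])
target = decode(z_true)          # measured diagnostic "image"

# Stale encoder output for the current input (system has drifted).
z_model = np.array([0.0, 0.0])

def cost(dz):
    # Mismatch between the model's prediction (with latent feedback
    # correction dz) and the live measurement.
    r = decode(z_model + dz) - target
    return 0.5 * float(r @ r)

dz = np.zeros(latent_dim)        # adaptively tuned feedback vector
eps, lr = 1e-4, 0.02
for _ in range(500):
    # Model-free finite-difference gradient estimate of the cost.
    grad = np.array([
        (cost(dz + eps * e) - cost(dz - eps * e)) / (2 * eps)
        for e in np.eye(latent_dim)
    ])
    dz -= lr * grad

print(cost(dz))  # near zero: the latent correction tracks the drift
```

Because the adaptation acts only on the two latent components rather than the ~100k-dimensional input, the feedback loop needs very few measurements per update, which is what makes real-time tracking of a drifting system feasible.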
Highlights
Encoder-decoder convolutional neural networks (CNNs) can represent incredibly complex nonlinear functions which map between combinations of images and scalars
Machine learning (ML) methods have been utilized to develop surrogate models/virtual diagnostics [6,7,8], neural networks are being used to represent cost functions or optimal policies in reinforcement learning [9], powerful polynomial chaos expansion-based surrogate models have been used for uncertainty quantification [10], an interesting analysis of the latent space of neural network-based surrogate models has been used for uncertainty quantification [11], convolutional neural networks have been used for time-series classification and forecasting in accelerators [12], neural network-based surrogate models have been used to speed up optimization [13,14], Bayesian Gaussian processes utilize learned correlations in data/physics-informed kernels [15,16,17,18,19,20,21,22], and various ML methods have been used for beam dynamics studies at CERN [23,24,25]
Efforts have begun to combine the robustness of adaptive feedback with the global representations that can be learned with ML methods to develop adaptive machine learning (AML) for time-varying systems
Summary
Machine learning (ML) tools are being developed that can learn representations of complex accelerator dynamics directly from data. ML methods have been utilized to develop surrogate models/virtual diagnostics [6,7,8], neural networks are being used to represent cost functions or optimal policies in reinforcement learning [9], powerful polynomial chaos expansion-based surrogate models have been used for uncertainty quantification [10], an interesting analysis of the latent space of neural network-based surrogate models has been used for uncertainty quantification [11], convolutional neural networks have been used for time-series classification and forecasting in accelerators [12], neural network-based surrogate models have been used to speed up optimization [13,14], Bayesian Gaussian processes utilize learned correlations in data/physics-informed kernels [15,16,17,18,19,20,21,22], and various ML methods have been used for beam dynamics studies at CERN [23,24,25]. Distribution drift is a challenge for all ML methods, including neural networks for surrogate models, the use of neural networks to represent cost functions or optimal policies in reinforcement learning, and even methods such as Gaussian processes which utilize learned correlations in their kernels.