We review five types of physics-informed machine-learning (PIML) algorithms for inversion and modeling of geophysical data. Such algorithms combine a data-driven machine-learning (ML) method with the equations of physics to model or invert geophysical data (or both). By incorporating the constraints of physics, PIML algorithms can effectively reduce the size of the solution space for ML models, enabling them to be trained on smaller data sets. This is especially advantageous when data are limited or expensive to obtain. In this review, we restrict the physics to the governing wave equation, used either as a constraint that must be satisfied or through numerical solutions of the wave equation for modeling and inversion. This approach ensures that the resulting models adhere to physical principles while leveraging the power of ML to analyze and interpret complex geophysical data. There are several potential benefits of PIML compared to standard numerical modeling or inversion of seismic data computed by, for example, finite-difference solutions to the wave equation. 1) Empirical tests suggest that PIML algorithms constrained by the physics of wave propagation are sometimes less prone to getting stuck in local minima than standard full-waveform inversion (FWI). 2) Once the weights of the neural network are found by training, the forward and inverse operations by PIML can be several orders of magnitude more efficient than FWI. However, the computational cost for general training can be enormous. 3) If the ML inversion operator [Formula: see text] is locally trained on a small portion of the recorded data [Formula: see text], then there is sometimes no need for millions of training examples that aim for global generalization of [Formula: see text]. 
The benefit is that the locally trained [Formula: see text] can be used to economically invert the remaining test data [Formula: see text] for the true velocity [Formula: see text], where [Formula: see text] can comprise more than 90% of the recorded data.
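To make concrete the idea of using the wave equation as a constraint, the following is a minimal sketch and not any of the algorithms reviewed here. It assumes a 1D constant-velocity acoustic wave equation, u_tt = c^2 u_xx, and uses an analytic traveling wave as a stand-in for a trained network. The names `wave_residual`, `piml_loss`, and `make_wave`, the finite-difference step, and the sample points are all hypothetical choices made for illustration.

```python
import math

def wave_residual(u, c, x, t, h=1e-3):
    """Residual u_tt - c^2 * u_xx of the 1D constant-velocity wave equation
    at (x, t), estimated with second-order central differences of step h."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - c**2 * u_xx

def piml_loss(u, c, observations, collocation, weight=1.0):
    """Mean-squared data misfit plus a weighted mean-squared physics residual.
    `u` stands in for a trainable model (assumption: any callable u(x, t))."""
    data = sum((u(x, t) - d) ** 2 for x, t, d in observations) / len(observations)
    physics = sum(wave_residual(u, c, x, t) ** 2 for x, t in collocation) / len(collocation)
    return data + weight * physics

def make_wave(c):
    """Analytic right-going wave sin(x - c t), an exact solution for velocity c."""
    return lambda x, t: math.sin(x - c * t)

true_c = 2.0
u_true = make_wave(true_c)

# Hypothetical recorded samples (x, t, amplitude) and physics collocation points.
observations = [(0.1 * i, 0.05 * i, u_true(0.1 * i, 0.05 * i)) for i in range(10)]
collocation = [(0.3 * i, 0.2 * j) for i in range(5) for j in range(5)]

# The same wavefield scored against the correct and an incorrect velocity.
loss_consistent = piml_loss(u_true, true_c, observations, collocation)
loss_inconsistent = piml_loss(u_true, 1.0, observations, collocation)
print(loss_consistent, loss_inconsistent)
```

In this sketch the data term is identical in both cases, so only the physics term separates the consistent velocity from the inconsistent one. That is the sense in which a wave-equation constraint can shrink the set of acceptable solutions; in practice the callable would be a neural network and the derivatives would typically come from automatic differentiation rather than finite differences.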