Predictive modeling typically relies on Bayesian model calibration to provide uncertainty quantification. In variational inference, fully independent (“mean-field”) Gaussian distributions are often used as approximate probability density functions. This simplification is attractive because the number of variational parameters grows only linearly with the number of unknown model parameters. However, the resulting diagonal covariance structure and unimodal behavior can be too restrictive to provide useful approximations of intractable Bayesian posteriors that exhibit highly non-Gaussian behavior, including multimodality. High-fidelity surrogate posteriors for these problems can be obtained by considering the family of Gaussian mixtures. Gaussian mixtures can capture multiple modes and approximate any distribution to an arbitrary degree of accuracy while maintaining some analytical tractability. Unfortunately, variational inference using Gaussian mixtures with full-covariance structures suffers from quadratic growth in the number of variational parameters with the number of model parameters. The multiple local minima that arise from the strongly nonconvex loss functions often associated with variational inference present additional complications. These challenges motivate the need for robust initialization procedures to improve the performance and computational scalability of variational inference with mixture models.

In this work, we propose a method for constructing an initial Gaussian mixture model approximation that can be used to warm-start the iterative solvers for variational inference. The procedure begins with a global optimization stage in model parameter space. In this step, local gradient-based optimization, globalized through multistart, is used to determine a set of local maxima, which we take to approximate the mixture component centers. Around each of these modes, a local Gaussian approximation is constructed via the Laplace approximation. Finally, the mixture weights are determined through constrained least squares regression. The robustness and scalability of the proposed methodology are demonstrated through application to an ensemble of synthetic tests using high-dimensional, multimodal probability density functions. The practical aspects of the approach are demonstrated with inversion problems in structural dynamics.
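
The three stages described above (multistart mode search, Laplace approximation at each mode, constrained least squares for the weights) can be illustrated with a minimal sketch. It is not the implementation used in this work: the two-dimensional bimodal target `log_target`, the choice of BFGS, the finite-difference Hessian, and the use of nonnegative least squares followed by normalization as the "constrained" regression are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize, nnls
from scipy.stats import multivariate_normal


def log_target(x):
    """Hypothetical bimodal log-density in 2D, used only for illustration."""
    a = multivariate_normal.logpdf(x, mean=[-2.0, 0.0], cov=0.5 * np.eye(2))
    b = multivariate_normal.logpdf(x, mean=[2.0, 1.0], cov=0.3 * np.eye(2))
    return np.logaddexp(np.log(0.3) + a, np.log(0.7) + b)


def neg_log_target(x):
    return -log_target(x)


def find_modes(n_starts=30, half_width=4.0, tol=1e-2, seed=0):
    """Stage 1 (assumed form): multistart BFGS, keeping distinct converged maxima."""
    rng = np.random.default_rng(seed)
    modes = []
    for x0 in rng.uniform(-half_width, half_width, size=(n_starts, 2)):
        res = minimize(neg_log_target, x0, method="BFGS")
        converged = np.linalg.norm(res.jac) < 1e-3  # small final gradient
        is_new = all(np.linalg.norm(res.x - m) > tol for m in modes)
        if converged and is_new:
            modes.append(res.x)
    return sorted(modes, key=lambda m: m[0])  # deterministic ordering


def fd_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at x (symmetrized)."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d)
            ej = np.zeros(d)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return 0.5 * (H + H.T)


def laplace_covariance(mode):
    """Stage 2: covariance = inverse Hessian of the negative log-density at the mode."""
    S = np.linalg.inv(fd_hessian(neg_log_target, mode))
    return 0.5 * (S + S.T)


def fit_weights(means, covs, n_per_comp=200, seed=1):
    """Stage 3 (assumed form): nonnegative least squares fit, then normalization."""
    rng = np.random.default_rng(seed)
    # Regression points are drawn from the Laplace components themselves.
    pts = np.vstack([rng.multivariate_normal(m, S, size=n_per_comp)
                     for m, S in zip(means, covs)])
    A = np.column_stack([multivariate_normal.pdf(pts, mean=m, cov=S)
                         for m, S in zip(means, covs)])
    b = np.exp(log_target(pts))
    w, _ = nnls(A, b)
    return w / w.sum()


def warm_start_mixture():
    """Assemble the initial Gaussian mixture: means, covariances, weights."""
    means = find_modes()
    covs = [laplace_covariance(m) for m in means]
    weights = fit_weights(means, covs)
    return means, covs, weights
```

Calling `warm_start_mixture()` returns the component means, covariances, and weights that would then seed a variational solver. Because the toy target is itself a two-component Gaussian mixture, the recovered parameters should closely match the ones hard-coded in `log_target`; a genuinely non-Gaussian posterior would give only an approximate starting point.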