Abstract

When sampling for Bayesian inference, a popular computational approach is Hamiltonian Monte Carlo (HMC), and in particular the No-U-Turn Sampler (NUTS), which automatically selects the end time of the Hamiltonian trajectory. However, HMC and NUTS can require numerous numerical gradients of the target density and can therefore be slow in practice when they rely on computationally expensive forward models. We propose latent Hamiltonian neural networks (L-HNNs) within HMC and NUTS for solving Bayesian inference problems. Once trained, L-HNNs do not require numerical gradients of the target density during sampling and hence avoid repeated evaluations of the forward computational model. Moreover, L-HNNs satisfy important properties such as perfect time reversibility and Hamiltonian conservation, making them well suited for use within HMC and NUTS because stationarity of the resulting Markov chain can be shown. We also propose integrating L-HNNs with an online error monitoring scheme, in which numerical gradients of the target density are used for a few samples whenever the L-HNN prediction errors become large. This online error monitoring scheme prevents sample degeneracy in regions of low probability density and ensures robust uncertainty quantification. We demonstrate L-HNNs in NUTS with online error monitoring on several analytical examples involving complex, heavy-tailed, and high-local-curvature probability densities. We then demonstrate the applicability of L-HNNs in NUTS to two computational case studies, namely the Allen-Cahn stochastic partial differential equation and an elliptic partial differential equation, with 25 and 50 inference parameters, respectively. Overall, L-HNNs in NUTS with online error monitoring satisfactorily inferred these probability densities. Compared with traditional NUTS, L-HNNs in NUTS with online error monitoring required 1–2 orders of magnitude fewer numerical gradients of the target density and improved the effective sample size (ESS) per gradient (a measure of both sampling quality and computational expense) by an order of magnitude.
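To make the idea concrete, the following is a minimal Python sketch, not the paper's implementation, of a single HMC transition in which a trained L-HNN supplies the gradients of the potential energy and an error monitor, here based on the Hamiltonian change along the trajectory, falls back to numerical gradients when the surrogate appears unreliable. The names `potential`, `num_grad`, `lhnn_grad`, and the threshold `dH_max` are assumed interfaces for illustration only; the paper's scheme operates inside NUTS and reverts to numerical gradients for several subsequent samples rather than a single transition.

```python
import numpy as np


def hmc_step_lhnn(z0, potential, num_grad, lhnn_grad,
                  step_size=0.05, n_leapfrog=20, dH_max=1.0, rng=None):
    """One HMC transition with L-HNN surrogate gradients and error monitoring.

    Assumed (hypothetical) interfaces:
      potential(z) -> U(z) = -log target density (expensive forward model)
      num_grad(z)  -> numerical gradient of U (expensive)
      lhnn_grad(z) -> gradient of U predicted by a trained L-HNN (cheap)
    """
    rng = rng or np.random.default_rng()
    r0 = rng.standard_normal(np.shape(z0))        # sample auxiliary momentum
    H0 = potential(z0) + 0.5 * np.dot(r0, r0)     # initial Hamiltonian

    def leapfrog(z, r, grad_U):
        # Standard leapfrog: half kick, alternating drifts/kicks, half kick.
        r = r - 0.5 * step_size * grad_U(z)
        for i in range(n_leapfrog):
            z = z + step_size * r
            if i < n_leapfrog - 1:
                r = r - step_size * grad_U(z)
        r = r - 0.5 * step_size * grad_U(z)
        return z, r

    # Integrate first with the cheap L-HNN surrogate gradients.
    z, r = leapfrog(np.array(z0, dtype=float), r0, lhnn_grad)
    H1 = potential(z) + 0.5 * np.dot(r, r)

    # Online error monitor: if the Hamiltonian error of the surrogate
    # trajectory is too large, recompute it with numerical gradients.
    if abs(H1 - H0) > dH_max:
        z, r = leapfrog(np.array(z0, dtype=float), r0, num_grad)
        H1 = potential(z) + 0.5 * np.dot(r, r)

    # Standard Metropolis accept/reject step.
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return z
    return np.array(z0, dtype=float)
```

The point of the sketch is the control flow: the expensive gradient routine is only invoked when the monitored error exceeds the threshold, which is what allows the reported reduction in numerical gradient evaluations.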
