Abstract

In this paper, we propose a novel adaptive importance sampling (AIS) algorithm for probabilistic inference. The sampler learns a proposal distribution adaptation strategy by framing AIS as a reinforcement learning problem. Under this structure, the proposal distribution of the sampler is treated as an agent whose state is controlled using a parameterized policy. At each iteration of the algorithm, the agent earns a reward that is related to its contribution to the variance of the AIS estimator of the normalization constant of the target distribution. Policy gradient methods are employed to learn a locally optimal policy that maximizes the expected value of the sum of all rewards. Numerical simulations on two different examples demonstrate promising results for the future application of the proposed method to complex Bayesian models.
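The abstract describes the method only at a high level, so the following is a hypothetical, minimal sketch of the general idea rather than the paper's actual algorithm: a one-dimensional Gaussian proposal whose mean is adapted by a REINFORCE-style policy-gradient update, with the normalised effective sample size used as a bounded stand-in for the weight-variance-based reward mentioned in the abstract. The toy target, hyperparameters, and all names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setting (not from the paper): unnormalised 1-D Gaussian target
# exp(-0.5 * (x - 2)^2), whose true normalising constant is sqrt(2 * pi).
def log_target(x):
    return -0.5 * (x - 2.0) ** 2

rng = np.random.default_rng(0)
n_iters, n_samples = 300, 200
sigma_prop = 1.0        # fixed proposal std; the policy only adapts the proposal mean
theta = 0.0             # policy parameter (the agent's current proposal-mean "state")
policy_scale = 0.3      # exploration noise of the stochastic policy
lr, baseline = 0.3, 0.0
z_estimates = []

for t in range(n_iters):
    # Action: the policy proposes the next proposal mean.
    mu = theta + policy_scale * rng.standard_normal()

    # Importance-sampling step with proposal q = N(mu, sigma_prop^2).
    x = mu + sigma_prop * rng.standard_normal(n_samples)
    log_q = -0.5 * ((x - mu) / sigma_prop) ** 2 - np.log(sigma_prop * np.sqrt(2.0 * np.pi))
    w = np.exp(log_target(x) - log_q)
    z_estimates.append(w.mean())     # unbiased estimate of the normalising constant

    # Reward: normalised effective sample size, used here as a bounded stand-in
    # for the (negative) weight-variance contribution described in the abstract.
    reward = w.sum() ** 2 / (n_samples * (w ** 2).sum())

    # REINFORCE-style policy-gradient update with a running baseline.
    grad_log_pi = (mu - theta) / policy_scale ** 2
    theta += lr * (reward - baseline) * grad_log_pi
    baseline += 0.05 * (reward - baseline)

print(f"estimated Z ~ {np.mean(z_estimates[-100:]):.3f} (true Z = {np.sqrt(2.0 * np.pi):.3f})")
print(f"learned proposal mean ~ {theta:.3f} (target mean = 2.0)")
```

Under these assumptions the policy parameter drifts toward the target mean and the running estimate of the normalising constant approaches sqrt(2*pi) ≈ 2.507; the paper itself ties the reward to each iteration's contribution to the variance of the AIS estimator and applies policy-gradient methods to a general parameterized proposal, which this sketch does not reproduce.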
