In any Bayesian computations, the first step is to derive the joint distribution of all the unknown variables given the observed data. Then, we have to do the computations. There are four general methods for performing computations: Joint MAP optimization; Posterior expectation computations that require integration methods; Sampling-based methods, such as MCMC, slice sampling, nested sampling, etc., for generating samples and numerically computing expectations; and finally, Variational Bayesian Approximation (VBA). In this last method, which is the focus of this paper, the objective is to search for an approximation for the joint posterior with a simpler one that allows for analytical computations. The main tool in VBA is to use the Kullback-Leibler Divergence (KLD) as a criterion to obtain that approximation. Even if, theoretically, this can be conducted formally, for practical reasons, we consider the case where the joint distribution is in the exponential family, and so is its approximation. In this case, the KLD becomes a function of the usual parameters or the natural parameters of the exponential family, where the problem becomes parametric optimization. Thus, we compare four optimization algorithms: general alternate functional optimization; parametric gradient-based with the normal and natural parameters; and the natural gradient algorithm. We then study their relative performances on three examples to demonstrate the implementation of each algorithm and their efficiency performance.
Read full abstract