Abstract

This paper analyzes the minimization of α-divergences in the context of multi-class Gaussian process classification. For this task, several methods are explored, including memory- and computationally efficient variants of the Power Expectation Propagation algorithm, which allow for efficient training using stochastic gradients and mini-batches. This makes it possible to train on very large datasets with several million instances. The proposed methods are also very general, as they can interpolate between other popular approaches for approximate inference based on Expectation Propagation (EP) (α → 1) and Variational Bayes (VB) (α → 0) simply by varying the α parameter. An exhaustive empirical evaluation analyzes the generalization properties of each of the proposed methods for different values of the α parameter. The results obtained show that one can do better than EP and VB by considering intermediate values of α.
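For readers' reference (this equation is a standard formulation, e.g. Minka's, and is not reproduced from the paper itself), the α-divergence underlying the interpolation described above can be written as

$$
D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx \right),
$$

which recovers KL(p ∥ q), the divergence locally minimized by EP, in the limit α → 1, and KL(q ∥ p), the objective of VB, in the limit α → 0; intermediate values of α interpolate between these two behaviours.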
