Abstract

This paper analyzes the minimization of α-divergences in the context of multi-class Gaussian process classification. For this task, several methods are explored, including memory- and computationally efficient variants of the Power Expectation Propagation algorithm, which allow for efficient training using stochastic gradients and mini-batches. This makes it possible to train on very large datasets with several million instances. The proposed methods are also very general, as they can interpolate between other popular approaches for approximate inference based on Expectation Propagation (EP) (α → 1) and Variational Bayes (VB) (α → 0) simply by varying the α parameter. An exhaustive empirical evaluation analyzes the generalization properties of each of the proposed methods for different values of the α parameter. The results obtained show that one can do better than EP and VB by considering intermediate values of α.
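For readers' reference (this equation is a standard formulation, e.g. Minka's, and is not reproduced from the paper itself), the α-divergence underlying the interpolation described above can be written as

$$
D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx \right),
$$

which recovers KL(p ∥ q), the divergence locally minimized by EP, in the limit α → 1, and KL(q ∥ p), the objective of VB, in the limit α → 0; intermediate values of α interpolate between these two behaviours.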
