Abstract

Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors by means of optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion, the Evidence Lower Bound (ELBO). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results with applications to Gaussian and multinomial mixtures.
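
For reference, the ELBO mentioned above can be written, in generic notation (which may differ from the paper's), as the log marginal likelihood minus the Kullback-Leibler divergence between the variational candidate q and the posterior, so that maximizing it over a tractable family of distributions is equivalent to minimizing that divergence:

    \mathrm{ELBO}(q) \;=\; \mathbb{E}_{q}\big[\log p(X \mid \theta)\big] \;-\; \mathrm{KL}\big(q \,\|\, \pi\big)
                    \;=\; \log p(X) \;-\; \mathrm{KL}\big(q \,\|\, \pi(\cdot \mid X)\big),

where \pi is the prior, p(X \mid \theta) the likelihood and \pi(\cdot \mid X) the posterior.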

Highlights

  • This paper studies the statistical properties of variational inference as a tool to tackle two problems of interest: estimation and model selection in mixture models

  • The Bayesian paradigm has attracted great interest among researchers and practitioners, especially through the Variational Bayes (VB) framework, which aims at maximizing a quantity referred to as the Evidence Lower Bound (ELBO) on the marginal likelihood

  • The main contribution of this paper is to prove that VB is consistent for estimation in mixture models, and that the ELBO maximization strategy used in practice is consistent for model selection (an illustrative sketch of this strategy is given below)
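
As an informal illustration of the ELBO maximization strategy for model selection, the sketch below fits variational Gaussian mixtures for several candidate numbers of components and keeps the candidate with the largest fitted lower bound. It uses scikit-learn's BayesianGaussianMixture purely as a stand-in for the VB procedures analyzed in the paper; its lower_bound_ attribute is the library's own variational objective and not necessarily the exact criterion studied here.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    # Toy data: a two-component Gaussian mixture in one dimension.
    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2.0, 1.0, size=(200, 1)),
                        rng.normal(3.0, 1.0, size=(200, 1))])

    # Fit a variational Gaussian mixture for each candidate number of components
    # and record the variational lower bound reached at convergence.
    elbo = {}
    for k in range(1, 6):
        vb = BayesianGaussianMixture(
            n_components=k,
            weight_concentration_prior_type="dirichlet_distribution",
            max_iter=500,
            random_state=0,
        )
        vb.fit(X)
        elbo[k] = vb.lower_bound_

    # Model selection: keep the number of components maximizing the lower bound.
    k_hat = max(elbo, key=elbo.get)
    print("lower bounds:", elbo)
    print("selected number of components:", k_hat)

On such well-separated toy data one expects the selected value to be close to the true number of components (two); this is the kind of behaviour that the paper's consistency results formalize.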

Summary

Introduction

This paper studies the statistical properties of variational inference as a tool to tackle two problems of interest: estimation and model selection in mixture models. Alternative approaches were developed to study VB: [43] established Bernstein-von Mises type theorems on the variational approximation of the posterior. They provide very interesting results for parametric models, but it is unclear whether these results can be extended to model selection or to the misspecified case. [48] succeeded in adapting the classical results of [22] to Variational Bayes and showed that a slight modification of the three classical "prior mass and testing conditions" leads to the convergence of their variational approximations, again under the assumption that the model is true. With respect to these works, our contribution is a complete study of the consistency of VB for mixtures of general distributions.
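
For context, the classical "prior mass and testing conditions" of [22] alluded to above can be stated informally, in generic notation, for a prior \Pi, a target rate \varepsilon_n and a sieve \Theta_n (the precise neighborhoods, metrics and constants used in the paper and in [48] differ):

    \Pi\big(\{\theta : \mathrm{KL}(p_{\theta_0}, p_\theta) \le \varepsilon_n^2\}\big) \ \ge\ e^{-c_1 n \varepsilon_n^2}  \quad \text{(prior mass)}
    \Pi\big(\Theta_n^{c}\big) \ \le\ e^{-c_2 n \varepsilon_n^2}  \quad \text{(remaining mass)}
    \log N(\varepsilon_n, \Theta_n, d) \ \le\ c_3 n \varepsilon_n^2  \quad \text{(entropy / testing)}

where \theta_0 denotes the true parameter, N(\varepsilon_n, \Theta_n, d) a covering number of the sieve with respect to a suitable metric d, and c_1, c_2, c_3 constants.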

Background and notations
A PAC-Bayesian inequality
Application to multinomial mixture models
Application to Gaussian mixture models
Extension to the misspecified case
Variational Bayes model selection
Conclusion
An upper bound on the Kullback-Leibler divergence between two mixtures
Normal-Inverse-Gamma prior
Factorized prior
Algorithm 1