Abstract
Mixture models are widely used in Bayesian statistics and machine learning, notably in computational biology and natural language processing. Variational inference, a technique for approximating intractable posteriors via optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is twofold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. Second, we tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion, the Evidence Lower Bound (ELBO). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results with applications to Gaussian and multinomial mixtures.
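To make the selection strategy concrete, here is a minimal sketch assuming scikit-learn's variational Gaussian mixture (BayesianGaussianMixture), whose fitted ELBO value is exposed as lower_bound_; the synthetic data and the candidate range of components are illustrative choices, not the paper's experiments.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: a two-component Gaussian mixture in one dimension.
X = np.concatenate([rng.normal(-2.0, 1.0, 300),
                    rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

# Fit a variational approximation for each candidate number of
# components K and record the final value of the ELBO reached by
# the coordinate-ascent algorithm.
elbo = {}
for k in range(1, 6):
    vb = BayesianGaussianMixture(n_components=k, max_iter=500,
                                 random_state=0).fit(X)
    elbo[k] = vb.lower_bound_

# Model selection: keep the number of components maximizing the ELBO.
k_hat = max(elbo, key=elbo.get)
print("selected number of components:", k_hat)

Note that scikit-learn's default places a Dirichlet-process-type prior on the mixture weights, which can prune superfluous components on its own; the comparison of fitted ELBO values across candidates nevertheless mirrors the ELBO-maximization criterion analyzed in the paper.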
Highlights
This paper studies the statistical properties of variational inference as a tool to tackle two problems of interest: estimation and model selection in mixture models.
The Bayesian paradigm has raised great interest among researchers and practitioners, especially through the Variational Bayes (VB) framework, which maximizes the Evidence Lower Bound (ELBO), a tractable lower bound on the marginal likelihood (a standard form is recalled after these highlights).
The main contribution of this paper is to prove that VB is consistent for estimation in mixture models, and that the ELBO-maximization strategy used in practice is consistent for model selection.
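For reference, the ELBO mentioned above is the standard variational objective; in generic notation (ours, not necessarily the paper's), with prior \pi(\theta), data X and variational family \mathcal{F},

\mathrm{ELBO}(q) = \mathbb{E}_{q(\theta)}\bigl[\log p(X \mid \theta)\bigr] - \mathrm{KL}\bigl(q(\theta) \,\|\, \pi(\theta)\bigr) = \log p(X) - \mathrm{KL}\bigl(q(\theta) \,\|\, \pi(\theta \mid X)\bigr),

so maximizing the ELBO over q \in \mathcal{F} is equivalent to minimizing the Kullback-Leibler divergence from q to the posterior, and the ELBO is indeed a lower bound on the log evidence \log p(X).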
Summary
This paper studies the statistical properties of variational inference as a tool to tackle two problems of interest: estimation and model selection in mixture models. Alternative approaches were developed to study VB: [43] established Bernstein–von Mises type theorems on the variational approximation of the posterior. They provide very interesting results for parametric models, but it is unclear whether these results can be extended to model selection or to the misspecified case. [48] succeeded in adapting the classical results of [22] to Variational Bayes and showed that a slight modification of the three classical "prior mass and testing" conditions leads to the convergence of their variational approximations, again under the assumption that the model is true. With respect to these works, our contribution is a complete study of the consistency of VB for mixtures of general distributions.
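For context, a standard textbook form of the "prior mass and testing" conditions of [22] (our notation; the paper's exact statements may differ) requires, for a target rate \varepsilon_n, that the prior puts sufficient mass on Kullback-Leibler neighborhoods of the true density p_0, e.g. \pi\bigl(\theta : \mathrm{KL}(p_0, p_\theta) \le \varepsilon_n^2\bigr) \ge e^{-C n \varepsilon_n^2}, and that there exist sieves and tests separating p_0 from alternatives at distance greater than \varepsilon_n with exponentially small errors.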