Statistical inference for normal mixtures with unknown number of components

Mian Huang, Weixin Yao, Shiyi Tang

doi:10.1214/22-ejs2061

Abstract

Statistical inference for normal mixture models with an unknown number of components has long been challenging because of nonidentifiability, a degenerate Fisher information matrix, and parameters on the boundary of the parameter space. In this paper, a penalized likelihood estimation procedure is proposed for mixtures of normals with an unknown number of components that achieves both consistent order selection and a root-n convergence rate for the estimators of the component parameters. We show that the proposed estimator can avoid being trapped in certain degenerate regions of the nonidentifiable subset of the parameter space of over-fitted normal mixture models, so that a regular asymptotic quadratic Taylor expansion of the mixture log-likelihood can be derived. With a suitable penalty function on the mixing proportions, the new estimator is proved to be consistent in order selection and asymptotically normal. The derived sparsity conditions also reveal some surprising but interesting differences among commonly used penalty functions, and they explain why some popular penalties, such as the Lasso and SCAD, give unsatisfactory results in order selection. Extensive simulations and a real data analysis demonstrate the effectiveness of the proposed estimator.
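The abstract describes the estimator only at a high level. As a rough illustration of how a penalty on the mixing proportions can carry out order selection, the Python sketch below runs EM for a univariate normal mixture started from a deliberately over-fitted number of components `k_max` and drops any component whose proportion is driven to zero. The penalty here is an assumption chosen only because it gives a closed-form update: subtracting c·Σ_k log π_k from the expected complete-data log-likelihood turns the proportion step into maximizing Σ_k (n_k − c) log π_k, whose solution is π_k ∝ max(0, n_k − c), where n_k is the effective number of observations assigned to component k. This is not the penalty proposed in the paper, and the names `penalized_em`, `k_max`, `c` and the fallback rule are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def penalized_em(data, k_max=5, c=5.0, iters=100, min_var=1e-3, seed=0):
    """Hypothetical sketch: EM for a 1-D normal mixture started with k_max
    components, using the illustrative penalty -c * sum_k log(pi_k) on the
    mixing proportions. The update pi_k ~ max(0, n_k - c) sets pi_k = 0 for
    components with n_k <= c, and those components are dropped.
    This is NOT the penalty proposed by Huang, Yao and Tang."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    var0 = max(sum((x - mean) ** 2 for x in data) / n, min_var)
    mus = rng.sample(data, k_max)  # start the means at random data points
    vars_ = [var0] * k_max
    pis = [1.0 / k_max] * k_max
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability
        gamma = []
        for x in data:
            lw = [math.log(p) + log_normal_pdf(x, m, v)
                  for p, m, v in zip(pis, mus, vars_)]
            top = max(lw)
            w = [math.exp(lv - top) for lv in lw]
            s = sum(w)
            gamma.append([wk / s for wk in w])
        nk = [sum(g[k] for g in gamma) for k in range(len(pis))]
        # Penalized M-step for the proportions: keep components with n_k > c
        keep = [k for k in range(len(nk)) if nk[k] > c]
        if keep:
            total = sum(nk[k] - c for k in keep)
            new_pis = [(nk[k] - c) / total for k in keep]
        else:
            # Fallback (illustrative choice): keep only the largest component
            keep = [max(range(len(nk)), key=nk.__getitem__)]
            new_pis = [1.0]
        # Ordinary weighted updates for the surviving means and variances
        new_mus, new_vars = [], []
        for k in keep:
            mu_k = sum(g[k] * x for g, x in zip(gamma, data)) / nk[k]
            var_k = sum(g[k] * (x - mu_k) ** 2
                        for g, x in zip(gamma, data)) / nk[k]
            new_mus.append(mu_k)
            new_vars.append(max(var_k, min_var))
        pis, mus, vars_ = new_pis, new_mus, new_vars
    return pis, mus, vars_


if __name__ == "__main__":
    # Simulated data from two well-separated normals (illustrative only)
    rng = random.Random(1)
    data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
            + [rng.gauss(6.0, 1.0) for _ in range(200)])
    pis, mus, vars_ = penalized_em(data, k_max=5, c=5.0)
    print("estimated order:", len(pis))
    for p, m, v in zip(pis, mus, vars_):
        print(f"pi={p:.3f}  mu={m:.3f}  var={v:.3f}")
```

The tuning constant `c` controls how aggressively components are pruned. Because the n_k sum to n, at most one component can satisfy n_k > c once c ≥ n/2, so a very large `c` collapses the fit to a single normal, while a small `c` may leave redundant components alive. This sensitivity is one reason the form of the penalty on the mixing proportions matters for order selection.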
