Multivariate‐bounded Gaussian mixture model with minimum message length criterion for model selection

Nizar Bouguila, Muhammad Azam

doi:10.1111/exsy.12688

Abstract

The bounded support Gaussian mixture model (BGMM) has been proposed as an alternative to unbounded support mixture models for cases where the data lie in a bounded support. In this paper, we apply the multivariate BGMM to data clustering to enable a more insightful analysis of the model. We also propose the minimum message length (MML) criterion for model selection in data clustering with the multivariate BGMM. The presented model is applied to data clustering on several speech databases (TSP and Spoken Digits) and image databases (MNIST and Fashion MNIST). We further propose applying the BGMM to code‐book generation in the feature extraction phase. Inspired by the success of the bag of visual words approach in computer vision, we also introduce this approach to speech data representation and validate it through the experiments presented in this paper. To validate the model selection criterion, MML is applied to different medical, speech and image datasets, and the model selection results obtained with MML are compared with those of seven other model selection criteria. The results demonstrate the effectiveness of the BGMM for clustering speech and image databases, for code‐book generation through clustering for feature representation, and for model selection.

Full Text