On Clustering by Mixture Models

G J McLachlan, D Peel, S K Ng

doi:10.1007/978-3-642-55721-7_16

Abstract

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster data sets; see, for example, McLachlan and Peel (2000a). We consider the use of normal mixture models to cluster data sets of continuous multivariate data, concentrating on some of the associated computational issues. A robust version of this approach to clustering is obtained by modelling the data by a mixture of t distributions (Peel and McLachlan, 2000). The normal and t mixture models can be fitted by maximum likelihood via the EM algorithm, as implemented in the EMMIX software of the authors. We report some recent results of (2000) on speeding up the fitting process by an incremental version of the EM algorithm. The problem of clustering high-dimensional data by use of the mixture of factor analyzers model (McLachlan and Peel, 2000b) is also considered. This approach enables a normal mixture model to be fitted to data that have high dimension relative to the number of data points to be clustered.
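
As a rough illustration of the maximum likelihood fitting mentioned above, the following is a minimal sketch of the standard (batch) EM iteration for a normal mixture. It is not the EMMIX implementation, and it is not the incremental variant. It assumes univariate data for brevity, whereas the paper treats multivariate data. The function names, the quantile-based starting values, and the variance floor are all illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_normal_mixture(data, g, n_iter=200, tol=1e-8, var_floor=1e-6):
    """Fit a g-component univariate normal mixture by batch EM.

    Hypothetical helper for illustration only. Assumes no data point
    has numerically zero density under every component.
    """
    n = len(data)
    xs = sorted(data)
    # Assumed starting values: means at evenly spaced sample quantiles,
    # a common variance equal to the overall sample variance, and
    # equal mixing proportions.
    means = [xs[int((i + 0.5) * n / g)] for i in range(g)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [max(overall_var, var_floor)] * g
    props = [1.0 / g] * g
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership,
        # plus the log likelihood at the current parameter values.
        tau = []
        loglik = 0.0
        for x in data:
            w = [props[i] * normal_pdf(x, means[i], variances[i]) for i in range(g)]
            total = sum(w)
            tau.append([wi / total for wi in w])
            loglik += math.log(total)
        # M-step: weighted updates of proportions, means, and variances.
        for i in range(g):
            ni = sum(t[i] for t in tau)
            props[i] = ni / n
            means[i] = sum(t[i] * x for t, x in zip(tau, data)) / ni
            weighted_ss = sum(t[i] * (x - means[i]) ** 2 for t, x in zip(tau, data))
            variances[i] = max(weighted_ss / ni, var_floor)
        # Stop when the log likelihood gain falls below the tolerance.
        if loglik - prev < tol:
            break
        prev = loglik
    # Cluster each point by its largest posterior probability.
    labels = [max(range(g), key=lambda i: t[i]) for t in tau]
    return props, means, variances, labels, loglik
```

The returned log likelihood is the value computed in the final E-step, that is, before the last parameter update. In practice several starting values would normally be tried, since the EM algorithm can converge to a local maximum.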
