Modelling high-dimensional data by mixtures of factor analyzers

G.J. McLachlan, D. Peel, R.W. Bean

doi:10.1016/s0167-9473(02)00183-4

Open Access

Abstract

We consider mixtures of factor analyzers as a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. Working in this reduced space yields a model for each component-covariance matrix whose complexity lies between that of the isotropic and full covariance structure models. We illustrate the use of mixtures of factor analyzers with a practical example: the clustering of cell lines on the basis of gene expressions from microarray experiments.
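
To make the point about parameter control concrete, the following minimal sketch counts the free parameters in a single component-covariance matrix under three structures. It assumes the standard factor-analytic form Sigma = B B^T + D, with B a p x q loading matrix and D diagonal; the symbols q, B and D and the function name are illustrative assumptions, not taken from the abstract.

```python
def covariance_param_counts(p: int, q: int) -> dict:
    """Free parameters in one p x p component-covariance matrix.

    Assumes the usual factor-analytic form Sigma = B B^T + D, where B is a
    p x q loading matrix and D is diagonal (an assumption for illustration).
    """
    if not 0 <= q < p:
        raise ValueError("need 0 <= q < p")
    return {
        # sigma^2 * I: a single variance parameter
        "isotropic": 1,
        # p*q loadings + p diagonal entries, minus q(q-1)/2 for the
        # rotational indeterminacy of B
        "factor_analytic": p * q + p - q * (q - 1) // 2,
        # unrestricted symmetric matrix
        "full": p * (p + 1) // 2,
    }


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration
    for name, count in covariance_param_counts(p=1000, q=3).items():
        print(f"{name}: {count}")
```

With these hypothetical sizes (p = 1000, q = 3), the factor-analytic structure needs 3997 parameters per component against 500500 for a full covariance matrix, which is why the approach remains feasible when p is large relative to n.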

Full Text