Abstract

Factor-analytic Gaussian mixture models are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be specified in advance of model fitting, and remain fixed. The pair which optimises some model selection criterion is then chosen. For computational reasons, models in which the numbers of latent factors differ across clusters are rarely considered. Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Pitman-Yor process prior to facilitate automatic inference of the number of clusters using the stick-breaking construction and a slice sampler. Furthermore, IMIFA employs multiplicative gamma process shrinkage priors to allow cluster-specific numbers of factors, automatically inferred via an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixture models, providing flexible approaches to clustering high-dimensional data. Applications to a benchmark data set, metabolomic spectral data, and a manifold learning handwritten digit example illustrate the IMIFA model and its advantageous features. These include obviating the need for model selection criteria, reducing the computational burden associated with the search of the model space, improving clustering performance by allowing cluster-specific numbers of factors, and quantifying uncertainty in the numbers of clusters and cluster-specific factors.
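
The stick-breaking construction referred to in the abstract can be illustrated with a short sketch. It assumes the standard Pitman-Yor representation with discount d and concentration alpha, in which stick proportions v_g ~ Beta(1 - d, alpha + g d) are converted to mixing weights pi_g = v_g prod_{l<g} (1 - v_l). The function name, the fixed truncation level, and the parameter values are illustrative assumptions; in particular, the slice sampler described in the abstract is not implemented here.

```python
import random


def pyp_stick_breaking(alpha, d, n_sticks, rng):
    """Draw the first n_sticks Pitman-Yor stick-breaking weights.

    Hypothetical helper for illustration only: the truncation at
    n_sticks is an assumption of this sketch, not part of the model.
    """
    if not 0.0 <= d < 1.0 or alpha <= -d:
        raise ValueError("need 0 <= d < 1 and alpha > -d")
    weights = []
    remaining = 1.0  # length of the stick not yet allocated
    for g in range(1, n_sticks + 1):
        # v_g ~ Beta(1 - d, alpha + g * d)
        v = rng.betavariate(1.0 - d, alpha + g * d)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(1)
weights = pyp_stick_breaking(alpha=1.0, d=0.25, n_sticks=20, rng=rng)
```

Setting d = 0 recovers the Dirichlet process stick-breaking weights; the weights of a truncated draw sum to slightly less than one, the shortfall being the mass of the components left out.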

Highlights

  • In cases where the number of variables p is comparable to or greater than the number of observations N, many clustering techniques tend to perform poorly or be intractable

  • A software implementation of the infinite mixture of infinite factor analysers (IMIFA) model and its family of sub-models is provided by the associated R package IMIFA (Murphy et al., 2019b), which is freely available from www.r-project.org (R Core Team, 2019) and with which all results were generated

  • For the OMIFA model, the adaptive Gibbs sampler (AGS) is modified to handle empty components: the multiplicative gamma process (MGP)-related parameters are simulated from the relevant priors and each corresponding Λg matrix is restricted to having q factors, i.e. the same number of columns currently in the matrix of factor scores η, either by truncation or by padding with zeros, as required
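
The truncation-or-padding step in the last highlight can be sketched as follows. This is a minimal illustration, assuming a loadings matrix stored as a list of p rows; the function name `match_factor_count` is hypothetical and is not taken from the IMIFA package.

```python
def match_factor_count(loadings, q):
    """Return a copy of a p x q_g loadings matrix with exactly q columns.

    Surplus columns are dropped from the right; missing columns are
    filled with zeros, so the result conforms with a scores matrix
    that currently has q columns.
    """
    matched = []
    for row in loadings:
        kept = list(row[:q])                  # truncate surplus columns
        kept.extend([0.0] * (q - len(kept)))  # pad missing columns with zeros
        matched.append(kept)
    return matched


# A 2 x 3 matrix truncated to q = 2 columns
print(match_factor_count([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], 2))
# prints [[1.0, 2.0], [4.0, 5.0]]

# A 2 x 1 matrix padded to q = 3 columns
print(match_factor_count([[1.0], [4.0]], 3))
# prints [[1.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
```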


Summary

Introduction

In cases where the number of variables p is comparable to or greater than the number of observations N, many clustering techniques tend to perform poorly or be intractable. By allowing infinitely many factors within each cluster, IMIFA addresses the difficulty in choosing the optimal number of factors. This facilitates fitting factor-analytic models which are more flexible, in the sense that the number of factors may be cluster-specific, thereby potentially improving clustering performance. This is achieved by assuming multiplicative gamma process (MGP) shrinkage priors (Bhattacharya and Dunson, 2011; Durante, 2017) on the cluster-specific factor loading matrices, generalising the MGP prior to the mixture setting. The IMIFA model, with its PYP-MGP prior combining the Pitman-Yor process (PYP) and MGP priors, offers a single-pass and computationally efficient approach to clustering high-dimensional data. It can be viewed as the most flexible model at the head of a family of Bayesian factor-analytic mixture models. A software implementation of IMIFA and its family of sub-models is provided by the associated R package IMIFA (Murphy et al., 2019b), which is freely available from www.r-project.org (R Core Team, 2019) and with which all results were generated.
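
To make the shrinkage idea concrete, the sketch below draws one loadings matrix from an MGP prior in the single-population form of Bhattacharya and Dunson (2011): column precisions tau_h = delta_1 x ... x delta_h with delta_1 ~ Gamma(a1, 1) and delta_l ~ Gamma(a2, 1) for l >= 2, local precisions phi_jh ~ Gamma(nu/2, nu/2), and loadings lambda_jh ~ N(0, 1/(phi_jh tau_h)). The function name and the hyperparameter values are illustrative assumptions, and the cluster-specific generalisation used by IMIFA may differ in its details; applying this draw separately to each cluster is only an approximation of that idea.

```python
import math
import random


def sample_mgp_loadings(p, q, rng, a1=2.0, a2=3.0, nu=3.0):
    """Draw a p x q loadings matrix from an MGP shrinkage prior.

    Hypothetical helper; a1, a2 and nu are illustrative values only.
    Returns the loadings and the column-wise global precisions tau_h.
    """
    loadings = [[0.0] * q for _ in range(p)]
    taus = []
    tau = 1.0
    for h in range(q):
        # tau_h is the cumulative product of the delta terms
        shape = a1 if h == 0 else a2
        tau *= rng.gammavariate(shape, 1.0)
        taus.append(tau)
        for j in range(p):
            # gammavariate takes (shape, scale), so rate nu/2 is scale 2/nu
            phi = rng.gammavariate(nu / 2.0, 2.0 / nu)
            loadings[j][h] = rng.gauss(0.0, 1.0 / math.sqrt(phi * tau))
    return loadings, taus


rng = random.Random(7)
loadings, taus = sample_mgp_loadings(p=10, q=6, rng=rng)
```

When a2 > 1 the precisions tau_h tend to grow with the column index h, so later columns are shrunk increasingly towards zero. This is what allows a sampler to discard columns that carry negligible loadings and hence to adapt the number of factors.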

The IMIFA Model Family
Mixtures of Factor Analysers
Mixtures of Infinite Factor Analysers
Illustrative Applications
Benchmark Data
Spectral Metabolomic Data
Handwritten Digit Data
Findings
Discussion