Abstract
Cluster analysis is an important unsupervised learning technique in multivariate statistics and machine learning. In this paper, we propose a set of new mixture models called CLEMM (short for Clustering with Envelope Mixture Models) that is based on the widely used Gaussian mixture model assumptions and the nascent research area of envelope methodology. Formulated mostly for regression models, envelope methodology aims for simultaneous dimension reduction and efficient parameter estimation, and includes a very recent formulation of the envelope discriminant subspace for classification and discriminant analysis. Motivated by the envelope discriminant subspace pursuit in classification, we consider parsimonious probabilistic mixture models in which cluster analysis can be improved by projecting the data onto a latent lower-dimensional subspace. The proposed CLEMM framework and the associated envelope-EM algorithms thus provide foundations for envelope methods in unsupervised and semi-supervised learning problems. Numerical studies on simulated data and two benchmark data sets show significant improvement of our proposed methods over classical methods such as Gaussian mixture models, K-means, and hierarchical clustering algorithms. An R package is available at https://github.com/kusakehan/CLEMM.
Highlights
Cluster analysis is a cornerstone of multivariate statistics and unsupervised learning
We focus on Gaussian mixture models (GMMs) because of their popularity and effectiveness in approximating multimodal distributions
To make our envelope-EM algorithm easier to comprehend, we present it in a way that is parallel to the classical EM algorithm for fitting GMM
Summary
Cluster analysis (or clustering) is a cornerstone of multivariate statistics and unsupervised learning. Our method is built on the envelope principle, which is more flexible and general: it assumes that there exists a subspace such that observations projected onto this subspace share a common structure that is invariant as the underlying cluster varies. The envelope, in contrast to generic dimension reduction, is a more targeted dimension-reduction subspace whose goal is to improve efficiency in Gaussian mixture model parameter estimation and thereby obtain better clustering results. Under this shared covariance assumption, the envelope-EM algorithm can be even faster than the standard EM estimation in Gaussian mixture models, which does not require subspace estimation.