An experimental comparison of model-based clustering methods

Marina Meilă, David Heckerman

doi:10.1023/a:1007648401407

Abstract

We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a winner-take-all version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although these initialization methods are substantially different, they lead to learned models that are strikingly similar in quality.
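To make the setup in the abstract concrete, here is a minimal sketch of EM for a naive-Bayes model with a hidden cluster variable, together with its winner-take-all variant. It is an illustration, not the authors' implementation. Several things are assumptions made for brevity: the variables are binary (the paper uses general discrete variables), the initialization is a uniform perturbation of the marginals of magnitude 0.1, a pseudo-count `alpha` smooths the M-step, and the names `posterior`, `fit`, and `assign` are hypothetical.

```python
import math
import random


def posterior(x, pi, theta, hard=False):
    """Return (cluster weights for x, log p(x)) under a binary naive-Bayes mixture."""
    logs = []
    for p, th in zip(pi, theta):
        s = math.log(p)
        for xj, t in zip(x, th):
            s += math.log(t if xj else 1.0 - t)
        logs.append(s)
    m = max(logs)
    w = [math.exp(v - m) for v in logs]  # shift by the max for numerical stability
    z = sum(w)
    if hard:
        # Winner-take-all: all weight goes to the most probable cluster.
        best = logs.index(m)
        r = [1.0 if c == best else 0.0 for c in range(len(logs))]
    else:
        r = [v / z for v in w]
    return r, m + math.log(z)


def fit(data, k, n_iter=50, hard=False, alpha=1.0, seed=0):
    """EM (or its winner-take-all variant) for a k-cluster naive-Bayes model."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Initialization (assumed form): random perturbation of the per-variable marginals.
    marg = [sum(row[j] for row in data) / n for j in range(d)]
    theta = [[min(0.95, max(0.05, mj + rng.uniform(-0.1, 0.1))) for mj in marg]
             for _ in range(k)]
    pi = [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        counts = [0.0] * k
        ones = [[0.0] * d for _ in range(k)]
        loglik = 0.0
        for x in data:  # E-step: accumulate expected sufficient statistics
            r, ll = posterior(x, pi, theta, hard)
            loglik += ll
            for c in range(k):
                counts[c] += r[c]
                for j in range(d):
                    if x[j]:
                        ones[c][j] += r[c]
        # M-step with pseudo-count alpha, which keeps parameters off 0 and 1.
        pi = [(counts[c] + alpha) / (n + k * alpha) for c in range(k)]
        theta = [[(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(d)]
                 for c in range(k)]
        history.append(loglik)  # log-likelihood under the parameters before this update
    return pi, theta, history


def assign(x, pi, theta):
    """Index of the most probable cluster for x."""
    r, _ = posterior(x, pi, theta, hard=True)
    return r.index(1.0)
```

Calling `fit(data, k)` runs standard EM, and `fit(data, k, hard=True)` runs the winner-take-all variant; the only difference is whether each record contributes fractional or all-or-nothing counts in the E-step. The other two initializations named in the abstract (sampling from an uninformative prior, and starting from the output of hierarchical agglomerative clustering) would replace only the lines that set the initial `theta` and `pi`.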

Journal: Machine Learning
Publication Date: Feb 1, 1998
Citations: 262

Similar Papers

Comparisons of Two Methods for Haplotype Reconstruction and Haplotype Frequency Estimation from Population Data
Shuanglin Zhang ... Hongyu Zhao
The American Journal of Human Genetics | VOL. 69
01 Oct 2001

Robust EM algorithm for model-based curve clustering
Faicel Chamroukhi
01 Aug 2013

Normalized EM algorithm for tumor clustering using gene expression data
Nguyen Minh Phuong ... Nguyen Xuan Vinh
01 Oct 2008

What are Clusters in High Dimensions and are they Difficult to Find?
Frank Klawonn ... Frank Höppner
01 Jan 2015
