Hierarchical Model-Based Clustering for Large Datasets

Christian Posse

doi:10.1198/106186001317115072

Abstract

In recent years, hierarchical model-based clustering has provided promising results in a variety of applications. However, its use with large datasets has been hindered by time and memory complexities that are at least quadratic in the number of observations. To overcome this difficulty, this article proposes to start the hierarchical agglomeration from an efficient classification of the data into many classes rather than from the usual set of singleton clusters. This initial partition is derived from a subgraph of the minimum spanning tree associated with the data. To this end, we develop graphical tools that assess the presence of clusters in the data and uncover observations that are difficult to classify. We use this approach to analyze two large, real datasets: a multiband MRI image of the human brain and data on global precipitation climatology. We use the real datasets to discuss ways of integrating the spatial information in the clustering analysis. We focus on two-stage methods, in which a second stage of processing using established methods is applied to the output from the algorithm presented in this article, viewed as a first stage.
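The idea of deriving an initial partition from a subgraph of the minimum spanning tree can be illustrated with a small sketch. The abstract does not say which subgraph the article uses, so the rule below (drop the longest tree edges and take the connected components) is an assumption made purely for illustration, and the names `mst_edges` and `initial_partition` are hypothetical. The sketch builds the tree by brute force over all pairs, which is itself quadratic, so it shows the concept only and not an efficient implementation. The model-based agglomeration that would start from these classes is not shown.

```python
import math
from itertools import combinations


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_edges(points):
    """Kruskal's algorithm on the complete Euclidean graph.

    Returns the n - 1 tree edges as (length, i, j) tuples.
    Brute force over all pairs: for illustration only.
    """
    n = len(points)
    parent = list(range(n))
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for length, i, j in candidates:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # edge joins two components, so it belongs to the tree
            parent[ri] = rj
            tree.append((length, i, j))
            if len(tree) == n - 1:
                break
    return tree


def initial_partition(points, n_classes):
    """Assumed rule: drop the n_classes - 1 longest MST edges,
    then label each connected component of what remains."""
    n = len(points)
    tree = sorted(mst_edges(points))
    kept = tree[: max(0, len(tree) - (n_classes - 1))]
    parent = list(range(n))
    for _, i, j in kept:
        parent[_find(parent, i)] = _find(parent, j)
    roots = {}
    return [roots.setdefault(_find(parent, i), len(roots)) for i in range(n)]


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = initial_partition(points, n_classes=2)
```

In a two-stage use, `labels` would replace the usual singleton clusters as the starting point for the hierarchical agglomeration, with `n_classes` chosen much larger than the number of clusters finally sought.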

Journal: Journal of Computational and Graphical Statistics
Publication Date: Sep 1, 2001
Citations: 50
