Hierarchical Clustering of Massive, High Dimensional Data Sets by Exploiting Ultrametric Embedding

Fionn Murtagh, Pedro Contreras, Geoff Downs

doi:10.1137/060676532

Abstract

The coding of data, usually done upstream of data analysis, has crucial implications for the results of that analysis. By modifying the data coding, through the use of less than full precision in data values, we can appreciably improve the effectiveness and efficiency of hierarchical clustering. In our first application, this is used to reduce the quantity of data to be hierarchically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second application, we derive a hierarchical clustering from relationships between sets of observations, rather than from relationships between the observations themselves, as is traditional. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of $O(n \log n)$ complexity, to k-means.
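To make the longest common prefix idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each observation has already been coded as a fixed-length string of decimal digits, and that the distance between two codes is the base raised to the negative length of their shared prefix; the choice of base 10 and the function names `common_prefix_length`, `baire_distance`, and `prefix_buckets` are illustrative assumptions. Truncating codes to a given number of digits (less than full precision) then groups observations into buckets, and increasing the number of digits refines those buckets, which yields a hierarchy.

```python
from collections import defaultdict


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared by the two digit strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(a: str, b: str, base: int = 10) -> float:
    """Longest common prefix distance (base is an assumption of this sketch).

    Identical codes are at distance 0; otherwise the distance shrinks as
    the shared prefix grows.
    """
    if a == b:
        return 0.0
    return float(base) ** -common_prefix_length(a, b)


def prefix_buckets(codes, depth: int) -> dict:
    """Group codes by their first `depth` digits (reduced precision).

    Each bucket is one cluster at this level; a larger `depth` splits
    buckets further, so successive depths form a hierarchy.
    """
    buckets = defaultdict(list)
    for code in codes:
        buckets[code[:depth]].append(code)
    return dict(buckets)
```

For example, with the hypothetical codes `["4254", "4257", "4311", "9120"]`, `prefix_buckets(codes, 1)` places the first three codes together under `"4"`, while `prefix_buckets(codes, 2)` separates `"4311"` from the other two. Each level needs only a single pass over the codes, with no pairwise distance computation.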


Journal: SIAM Journal on Scientific Computing
Publication Date: Jan 1, 2008
Citations: 76

