Clustering Categorical Data via Ensembling Dissimilarity Matrices

Saeid Amiri, Bertrand S Clarke, Jennifer L Clarke

doi:10.1080/10618600.2017.1305278

Abstract

We present a technique for clustering categorical data by generating many dissimilarity matrices and combining them. We begin by demonstrating our technique on low-dimensional categorical data and comparing it to several other techniques that have been proposed. We show through simulations and examples that our method is both more accurate and more stable. Then we give conditions under which our method should yield good results in general. Our method extends to high-dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context, we compare our method with two other methods. Finally, we extend our method to high-dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give an example to show that our method continues to provide useful results, in particular, providing a comparison with phylogenetic trees. Supplementary material for this article is available online.
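As a rough illustration of the general idea of combining many dissimilarity matrices, the following minimal sketch averages Hamming dissimilarity matrices computed on random subsets of the variables and then applies a naive average-linkage clustering to the result. Everything here is an assumption made for illustration: the abstract does not specify the dissimilarity, the way matrices are generated, or the combination rule, so the function names (`hamming_matrix`, `ensemble_dissimilarity`, `average_linkage`), the averaging step, and the toy data are hypothetical and should not be read as the authors' algorithm.

```python
import random
from itertools import combinations


def hamming_matrix(rows, cols):
    """Pairwise fraction of the selected columns on which two rows disagree."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        mismatches = sum(rows[i][c] != rows[j][c] for c in cols)
        d[i][j] = d[j][i] = mismatches / len(cols)
    return d


def ensemble_dissimilarity(rows, n_matrices=50, subset_size=None, seed=0):
    """Average Hamming matrices built on random variable subsets (assumed rule)."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    k = subset_size or max(1, p // 2)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_matrices):
        cols = rng.sample(range(p), k)  # one random choice of variables
        d = hamming_matrix(rows, cols)
        for i in range(n):
            for j in range(n):
                total[i][j] += d[i][j] / n_matrices
    return total


def average_linkage(d, n_clusters):
    """Naive agglomerative clustering on a precomputed dissimilarity matrix."""
    clusters = [[i] for i in range(len(d))]

    def linkage(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average dissimilarity.
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    labels = [0] * len(d)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Toy categorical data: two groups that disagree on every variable.
data = [
    ("a", "x", "p", "u"),
    ("a", "x", "p", "v"),
    ("a", "x", "q", "u"),
    ("b", "y", "r", "w"),
    ("b", "y", "r", "z"),
    ("b", "y", "s", "w"),
]
D = ensemble_dissimilarity(data, n_matrices=50, subset_size=2, seed=0)
labels = average_linkage(D, n_clusters=2)
print(labels)
```

Averaging over variable subsets is only one plausible way to combine matrices; it is used here because the abstract mentions ensembling over many choices of explanatory variables. For real data, a library routine for hierarchical clustering would normally replace the naive `average_linkage` loop.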

Journal: Journal of Computational and Graphical Statistics
Publication Date: Jul 11, 2017
Citations: 29
License type: open-access
