Abstract

Hierarchical clustering has been used extensively in practice, since clusters can be assigned and analyzed simultaneously, which is especially useful when estimating the number of clusters is challenging. However, owing to the conventional proximity measures employed in these algorithms, they can detect only mass-shaped clusters and encounter problems in identifying complex data structures. Here, we introduce two bottom-up hierarchical approaches that exploit an information theoretic proximity measure to explore the nonlinear boundaries between clusters and to extract data structures beyond second-order statistics. Experimental results on both artificial and real datasets demonstrate the superiority of the proposed algorithms over conventional and information theoretic clustering algorithms reported in the literature, especially in detecting the true number of clusters.

Highlights

  • Clustering is an unsupervised approach for segregating data into its natural groups, such that the samples in each group have the highest similarity with each other and the highest dissimilarity with samples of the other groups

  • For the split-and-merge clustering, we report the mean quadratic mutual information estimated at each hierarchy in Figures 4b, 4d, and 4f, in which the error bars show the standard deviation over 10 repetitions of the clustering, each starting from a different initial clustering

  • Two hierarchical approaches are proposed for maximizing the quadratic mutual information between the samples of the input space and the clusters, namely the agglomerative and the split-and-merge clustering

Introduction

Clustering is an unsupervised approach for segregating data into its natural groups, such that the samples in each group have the highest similarity with each other and the highest dissimilarity with samples of the other groups. We cast clustering as maximizing the mutual information between the samples of the input space and the cluster labels, which requires an estimate of the underlying distribution. This distribution is estimated using a Parzen window estimator with Gaussian kernels centered on each sample and with a constant covariance. Such an estimate may seem simplistic and computationally expensive, but by exploiting Rényi's entropy estimator [13] in a quadratic form as the proximity measure, the mutual information can be estimated from pairwise distances; this quantity is referred to as the quadratic mutual information [14]. This proximity measure has been used in an iterative clustering scheme to optimize a clustering evaluation function that finds the nonlinear boundaries between clusters [15]. We propose two algorithms for the hierarchical optimization: the agglomerative and the split-and-merge clustering. In the former, at each hierarchy, the two clusters that maximize the mutual information are combined into one cluster.
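
To make this concrete, here is a minimal Python sketch, not the paper's implementation: it assumes hard cluster labels and a fixed, isotropic kernel bandwidth sigma, computes the Euclidean-distance form of the quadratic mutual information from the pairwise Gaussian Gram matrix, and runs a greedy agglomerative pass that merges the pair of clusters maximizing that quantity at each hierarchy. All function names and the default bandwidth are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_gram(X, sigma):
    # Pairwise terms G(x_i - x_j, 2*sigma^2*I): the integral of the product
    # of two Gaussian Parzen kernels of variance sigma^2 is a Gaussian of
    # variance 2*sigma^2 evaluated at the distance between their centers.
    d = X.shape[1]
    var = 2.0 * sigma ** 2
    sq_dists = cdist(X, X, "sqeuclidean")
    return np.exp(-sq_dists / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)

def quadratic_mi(V, labels):
    # Euclidean-distance form of the quadratic mutual information between
    # the samples and a hard cluster assignment,
    #   sum_c INT (p(x, c) - p(x) P(c))^2 dx,
    # which reduces to sums over blocks of the pairwise Gram matrix V.
    n = V.shape[0]
    within = cross = prior_sq = 0.0
    for c in np.unique(labels):
        idx = labels == c
        p_c = idx.sum() / n
        within += V[np.ix_(idx, idx)].sum()   # sum_c INT p(x,c)^2 dx
        cross += p_c * V[idx, :].sum()        # sum_c P(c) INT p(x,c) p(x) dx
        prior_sq += p_c ** 2                  # sum_c P(c)^2
    return (within - 2.0 * cross + prior_sq * V.sum()) / n ** 2

def agglomerate(X, init_labels, n_clusters, sigma=0.1):
    # Greedy agglomerative pass: at each hierarchy, merge the pair of
    # clusters whose union yields the largest quadratic mutual information.
    V = gaussian_gram(X, sigma)
    labels = init_labels.copy()
    while len(np.unique(labels)) > n_clusters:
        best_qmi, best_pair = -np.inf, None
        ids = np.unique(labels)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                trial = np.where(labels == ids[j], ids[i], labels)
                qmi = quadratic_mi(V, trial)
                if qmi > best_qmi:
                    best_qmi, best_pair = qmi, (ids[i], ids[j])
        labels = np.where(labels == best_pair[1], best_pair[0], labels)
    return labels
```

As a usage sketch, one could seed agglomerate with a deliberately over-segmented initial clustering (for instance, k-means with many clusters) and merge downward; in the same spirit as Figures 4b, 4d, and 4f, tracking the estimated mutual information at each hierarchy gives a signal for judging the number of clusters.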

Distortion-Rate Theory
Quadratic Mutual Information
Parzen Window Estimator with Gaussian Kernels
Hierarchical Optimization
Agglomerative Clustering
Split and Merge Clustering
Experimental
Conclusions