Circular Silhouette and a Fast Algorithm.

Yinong Chen, Tathagata Debnath, Mingzhou Song, Andrew Cai

doi:10.1109/tpami.2023.3310495

Abstract

Circular data clustering has recently been solved exactly in sub-quadratic time. However, the solution requires the number of clusters to be given, and methods for choosing this number on linear data are inapplicable to circular data. To fill this gap, we introduce the circular silhouette to measure cluster quality and a fast algorithm to calculate the average silhouette width. On sorted data, the algorithm runs in time linear in the number of points, instead of the quadratic time required by the silhouette definition. Empirically, it is over 3000 times faster than computing by the silhouette definition on 1,000,000 circular data points in five clusters. On simulated datasets, the algorithm returned the correct numbers of clusters. We identified clusters on round genomes of human mitochondria and bacteria. On sunspot activity data, we found changed solar-cycle patterns over the past two centuries. Using the circular silhouette not only eliminates the subjective selection of the number of clusters, but also scales to the big circular and periodic data abundant in science, engineering, and medicine.
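To make the quadratic baseline mentioned in the abstract concrete, the following is a minimal sketch of an average silhouette width computed directly from the standard silhouette definition, with the shortest arc length standing in as the distance between two points on a circle. It is not the paper's linear-time algorithm, and the paper's exact circular silhouette definition may differ. The function names, the `circumference` parameter, and the convention that singleton clusters score zero are assumptions made for illustration.

```python
import math


def circular_distance(a, b, circumference=2 * math.pi):
    """Shortest arc length between two positions on a circle (assumed metric)."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def average_silhouette_width(points, labels, circumference=2 * math.pi):
    """Average silhouette width straight from the definition: O(n^2) distances."""
    n = len(points)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    sizes = {c: labels.count(c) for c in clusters}

    total = 0.0
    for i in range(n):
        # Accumulate the distance from point i to every other point, per cluster.
        sums = {c: 0.0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += circular_distance(
                    points[i], points[j], circumference
                )
        own = labels[i]
        if sizes[own] == 1:
            continue  # assumed convention: a singleton contributes 0
        a = sums[own] / (sizes[own] - 1)  # mean distance within own cluster
        b = min(sums[c] / sizes[c] for c in clusters if c != own)  # nearest other
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


if __name__ == "__main__":
    # Angles in radians; 6.2 lies just before 2*pi, so it is close to 0.
    angles = [0.0, 0.1, 6.2, 3.0, 3.1, 3.2]
    wrapped = [0, 0, 0, 1, 1, 1]  # groups 6.2 with the points near 0
    linear = [0, 0, 1, 1, 1, 1]   # treats 6.2 as far from 0
    print(average_silhouette_width(angles, wrapped))
    print(average_silhouette_width(angles, linear))
```

In this toy input the labeling that respects the wrap-around scores higher than the one that treats the data as linear, which illustrates why a circular measure is needed. Choosing the number of clusters would then amount to evaluating this score for each candidate clustering and keeping the best, a step whose cost the paper's fast algorithm is designed to reduce.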

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 1, 2023
Citations: 1

Similar Papers

How many clusters exist? Answer via maximum clustering similarity implemented in R
Ahmed N Albatineh ... Magdalena Niewiadomska-Bugaj
Biostatistics & Epidemiology | VOL. 3
01 Jan 2019

Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters
Christian Hennig ... Chien-Ju Lin
Statistics and Computing | VOL. 25
11 Jun 2015

A Learning-Based EM Clustering for Circular Data with Unknown Number of Clusters
Shou-Jen Chang-Chien ... Miin-Shen Yang
Proceedings of Engineering and Technology Innovation | VOL. 15
27 Apr 2020

Fast Optimal Circular Clustering and Applications on Round Genomes.
Tathagata Debnath ... Mingzhou Song
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 18
04 May 2021