Order Selection in Unsupervised Learning and Clustering for Arbitrary and Non-Arbitrary Shaped Data

Mahdi Shahbaba

doi:10.32920/ryerson.14668125.v1

Abstract

<p>This thesis focuses on clustering for the purpose of unsupervised learning. One topic of interest is estimating the correct number of clusters (CNC). In conventional clustering approaches, such as X-means, G-means, PG-means and Dip-means, estimating the CNC is a preprocessing step prior to finding the centers and clusters. In other words, the first step estimates the CNC and the second step finds the clusters, and each step has a different objective function to minimize. Here, we propose minimum averaged central error (MACE)-means clustering, which uses one objective function to simultaneously estimate the CNC and provide the cluster centers. We show the superiority of MACE-means over the conventional methods in terms of estimating the CNC with comparable complexity. In addition, on average MACE-means results in better values for the adjusted Rand index (ARI) and variation of information (VI). The next topic of interest is the order selection step of the conventional methods, which is usually a statistical testing method such as the Kolmogorov-Smirnov test, the Anderson-Darling test, or Hartigan's Dip test. We propose a new statistical test denoted by Sigtest (signature testing). The conventional statistical testing approaches rely on a particular assumption about the probability distribution of each cluster. Sigtest, on the other hand, can be used with any prior distribution assumption on the clusters. By replacing the statistical testing of the mentioned conventional approaches with Sigtest, we show that the clustering methods improve in terms of a more accurate CNC as well as ARI and VI. Conventional clustering approaches fail in arbitrary shaped clustering, which is the subject of the last contribution of the thesis. The proposed method, denoted by minimum Pathways is Arbitrary Shaped (minPAS) clustering, is based on a unique minimum spanning tree structure of the data. Our simulation results show the advantage of minPAS over state-of-the-art arbitrary shaped clustering methods such as DBSCAN and Affinity Propagation in terms of accuracy, ARI and VI.</p>
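The two partition-comparison scores named in the abstract, ARI and VI, can both be computed from the co-occurrence counts of two labelings of the same samples. The following is a minimal pure-Python sketch of the standard definitions, not code from the thesis; the function names `adjusted_rand_index` and `variation_of_information` and the toy label lists are assumptions made for illustration. VI is computed here with natural logarithms.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # degenerate case: no pairs to compare
    # Count pairs of samples that share a cluster in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / comb(n, 2)  # agreement expected by chance
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical
    return (both - expected) / (maximum - expected)


def variation_of_information(labels_a, labels_b):
    """Variation of information H(A|B) + H(B|A), in nats."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    vi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        vi -= p_ab * (log(p_ab / (count_a[a] / n)) + log(p_ab / (count_b[b] / n)))
    return vi


# Toy example (hypothetical labels): the same partition with renamed clusters.
truth = [0, 0, 1, 1, 2, 2]
renamed = [1, 1, 0, 0, 2, 2]
ari_same = adjusted_rand_index(truth, renamed)
vi_same = variation_of_information(truth, renamed)

# A partition that cuts across the reference partition.
crossed_a = [0, 0, 1, 1]
crossed_b = [0, 1, 0, 1]
ari_crossed = adjusted_rand_index(crossed_a, crossed_b)
vi_crossed = variation_of_information(crossed_a, crossed_b)
```

A higher ARI and a lower VI both indicate closer agreement with the reference partition, and neither score depends on how the cluster labels are named.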

Highlights

Clustering has a wide range of applications in different disciplines of science and engineering, such as bioinformatics, genetics, image segmentation [1], voice recognition, document classification and weather classification [2–4].
Note that the dependency of minimum averaged central error (MACE)-means on the assumption that all clusters have the same variance is a disadvantage of the method, which should be addressed in future work.
Be used with other clustering methods. Another potential direction for future work is extending the MACE fundamentals to clustering methods with a wider range of assumptions beyond the spherical Gaussian.

Summary

Introduction

Clustering has a wide range of applications in different disciplines of science and engineering, such as bioinformatics, genetics, image segmentation [1], voice recognition, document classification and weather classification [2–4]. The goal of a clustering algorithm is to subjectively group observed data samples based on their similarity and dissimilarity [14]. In this chapter, we briefly discuss some of the widely used clustering methods and their requirements. A more recently proposed method for the purpose of statistical testing in clustering is Hartigan's Dip test. This method generalizes the Gaussian assumption of the two above methods to a unimodal distribution. In general, spectral clustering methods partition the data using the eigenvectors that correspond to the K largest eigenvalues of the Laplacian of the affinity matrix.
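To make the role of a Gaussian-based statistical test in order selection more concrete, the sketch below computes a Kolmogorov-Smirnov statistic between a one-dimensional sample and a Gaussian fitted to it. It is an illustration only: it is neither Sigtest nor Hartigan's Dip test, the function name `ks_statistic_vs_gaussian` and the synthetic samples are assumptions, and no p-value is computed, because the usual Kolmogorov-Smirnov critical values do not apply directly when the mean and standard deviation are estimated from the same data.

```python
from statistics import NormalDist, fmean, pstdev


def ks_statistic_vs_gaussian(sample):
    """Largest gap between the empirical CDF and a Gaussian fitted to the sample.

    Assumes a one-dimensional sample with at least two distinct values.
    """
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(fmean(xs), pstdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Deterministic synthetic data (hypothetical): evenly spaced Gaussian quantiles.
n = 200
base = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

one_cluster = base                                            # a single Gaussian group
two_clusters = [x - 3 for x in base] + [x + 3 for x in base]  # two separated groups

d_one = ks_statistic_vs_gaussian(one_cluster)
d_two = ks_statistic_vs_gaussian(two_clusters)
```

On this data the statistic is much larger for the two-group sample than for the single group, which is the kind of evidence a test-based method can use to decide that a candidate cluster should be split further.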


24 May 2021