Abstract

The evaluation of clustering results plays an important role in clustering analysis. However, existing validity indices are each tied in practice to a specific clustering algorithm, clustering parameter, or assumption. In this paper, we propose a novel validity index that addresses these limitations using two complementary measures: boundary points matching and interior points connectivity. First, after any clustering algorithm has been run on a dataset, we extract the boundary points of the dataset and of each partitioned cluster using a nonparametric metric, and compute the measure of boundary points matching. Second, we measure the interior points connectivity of both the dataset and all the partitioned clusters. The proposed validity index can compare clustering results obtained on the same dataset by different clustering algorithms, a comparison that existing validity indices cannot make. Experimental results demonstrate that the proposed validity index can evaluate clustering results obtained with an arbitrary clustering algorithm and can find the optimal clustering parameters.
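The abstract does not spell out the nonparametric metric or the matching formula, so the following is only an illustrative sketch of the boundary points matching idea, not the paper's method. It assumes a hypothetical boundary test (a point is flagged when its `m` nearest neighbours lie mostly on one side of it) and a hypothetical matching score (the fraction of cluster boundary points that are also boundary points of the whole dataset). The names `boundary_points` and `boundary_matching`, and the values `m=8` and `threshold=0.3`, are assumptions made for the example.

```python
import math

def boundary_points(points, m=8, threshold=0.3):
    """Indices of points whose m nearest neighbours lie mostly on one side.

    Assumed heuristic: the mean offset to the neighbours is near zero for a
    surrounded point and large for a point on the edge of a group.
    """
    flagged = set()
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        nearest = sorted(others, key=lambda q: math.dist(p, q))[:m]
        if not nearest:
            continue
        mean_offset = [sum(q[d] - p[d] for q in nearest) / len(nearest)
                       for d in range(len(p))]
        mean_dist = sum(math.dist(p, q) for q in nearest) / len(nearest)
        # Dividing by the mean neighbour distance makes the test scale-free.
        if mean_dist > 0 and math.hypot(*mean_offset) / mean_dist > threshold:
            flagged.add(i)
    return flagged

def boundary_matching(points, labels, m=8, threshold=0.3):
    """Fraction of cluster boundary points that are also dataset boundary points."""
    data_boundary = boundary_points(points, m, threshold)
    cluster_boundary = set()
    for label in set(labels):
        members = [i for i, l in enumerate(labels) if l == label]
        local = boundary_points([points[i] for i in members], m, threshold)
        cluster_boundary.update(members[k] for k in local)
    if not cluster_boundary:
        return 0.0
    return len(cluster_boundary & data_boundary) / len(cluster_boundary)

# Toy data: two well-separated 5x5 grids of points.
grid = [(x, y) for x in range(5) for y in range(5)]
points = grid + [(x + 10, y) for x, y in grid]
good = [0] * 25 + [1] * 25                     # one label per grid
bad = [0 if x <= 2 else 1 for x, y in points]  # cut runs through the first grid

print(boundary_matching(points, good))
print(boundary_matching(points, bad))
```

On this toy layout the partition that follows the two grids scores 1.0, because every cluster boundary point is also on the edge of the dataset. The partition that cuts through the first grid scores lower, since the cut creates cluster boundary points that sit in the interior of the dataset.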

Highlights

  • Clustering analysis is an unsupervised technique that can be used for finding the structure in a dataset [1,2,3]

  • The idea has been proposed to deal with the clustering evaluation under the condition of big data [14,15], while very little work is available in the literature that discusses validity indices for big data

  • In view of the different characteristics of the investigated datasets, the clustering results are obtained using the C-means, density peak-point-based clustering (DPC), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms, respectively, where the number of neighbors m in the experiments is fixed at the integer part of 2Dπ

Summary

Introduction

Clustering analysis is an unsupervised technique that can be used for finding the structure in a dataset [1,2,3]. A large number of validity indices have been proposed to evaluate clustering results and to determine the optimal number of clusters, which is an essential characteristic of a dataset. A novel two-step cluster enumeration algorithm has been proposed by combining the cluster analysis problem […]. This new BIC method contains information about the dataset in both the data-fidelity and penalty terms. Compared with the existing BIC-based cluster enumeration algorithms, the penalty term of the proposed criterion involves information about the actual number of clusters. Based on the measurement of boundary points matching and interior points connectivity between the entire dataset and all partitioned clusters, a novel validity index is proposed. Typical clustering algorithms, i.e., C-means, DBSCAN, and DPC, are applied to evaluate the generality of the novel validity index. Two groups of artificial and CT datasets with different characteristics validate the correctness and generalization of the proposed validity index.
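The summary does not define how interior points connectivity is measured, so the sketch below only illustrates the general idea of comparing the connectivity of the whole dataset with that of the partitioned clusters. It swaps in a simple, assumed stand-in: counting connected components of a radius graph. The function names, the `radius` parameter, and the ratio used as a score are all hypothetical, and the caller is assumed to pass in whichever points it treats as interior.

```python
import math
from collections import deque

def connected_components(points, radius):
    """Count groups of points linked by chains of steps no longer than radius."""
    unvisited = set(range(len(points)))
    count = 0
    while unvisited:
        count += 1
        queue = deque([unvisited.pop()])
        while queue:  # breadth-first search over the radius graph
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
    return count

def connectivity_agreement(points, labels, radius):
    """Dataset component count divided by the clusters' total component count.

    Assumed score: 1.0 when the partition does not split any connected
    group of the dataset, and smaller when it does.
    """
    whole = connected_components(points, radius)
    parts = sum(
        connected_components(
            [p for p, l in zip(points, labels) if l == label], radius)
        for label in set(labels))
    return whole / parts

# Toy data: two well-separated 5x5 grids of points.
grid = [(x, y) for x in range(5) for y in range(5)]
points = grid + [(x + 10, y) for x, y in grid]
good = [0] * 25 + [1] * 25                     # one label per grid
bad = [0 if x <= 2 else 1 for x, y in points]  # cut runs through the first grid

print(connectivity_agreement(points, good, radius=1.5))
print(connectivity_agreement(points, bad, radius=1.5))
```

This stand-in only penalizes partitions that split a connected group. A partition that merges two separate groups into one cluster still scores 1.0, which is one reason a connectivity measure would need to be paired with a complementary measure such as boundary points matching.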

Typical Clustering Algorithms
C-means
DPC Algorithm
Typical Cluster Validity Index
Materials and Methods
Boundary Matching and Connectivity
Comparison
Evaluation Based
Results and Discussion
Tests on Synthetic Datasets
Relationship between CVIBI and the Number of Clusters
Relationship between CVIBI and ε in DBSCAN
Evaluation under Various