Abstract

Clustering validity indices are typically used as tools to find the correct number of clusters in a data set and/or to evaluate the quality of the clusters formed by clustering algorithms. Clustering validity indices measure separation and compactness of clusters. Typically, when applying a clustering algorithm, the input includes the number of clusters. After applying the algorithm with several different numbers of clusters, we determine the number of clusters to be the one with the best validity index. There are two types of clustering validity indices: external indices that are supervised, and internal indices that are unsupervised. The focus of this paper is on internal validity indices. Some existing internal validity indices capture the properties of the clusters by using representative statistics such as mean, variance, diameter, etc., however, these do not perform well when clusters have arbitrary shapes. One approach to overcome this issue is to use the density of the data objects in each cluster. That provides the advantage of capturing the full characteristics of the cluster which is most beneficial when there are clusters with arbitrary shapes. In the literature, a few density-based clustering validity indices have been proposed. However, some of them show poor performance when the clusters are not perfectly separated. Some others perform poorly because they use only representative objects from each cluster instead of all objects. The contribution of this paper is an internal validity index named the object-based clustering validity index with densities (OCVD). OCVD is a single number that averages the density-based contribution of individual data objects to both separation and compactness of clusters. The methodology behind calculating the density-based contributions of the objects is kernel density estimation. We show through several experiments that OCVD performs well in detecting the correct number of clusters in data sets with different cluster shapes including arbitrary shapes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call