Abstract

Validating clusterings and identifying the optimal number of clusters are crucial tasks in expert and intelligent systems. However, commonly used cluster validity indices (CVIs) do not capture data structure well: they lack the mechanisms needed to be as effective as the clustering algorithm that produced the results. This paper proposes a novel CVI called PDBI (Partitioning Davies-Bouldin Index), inspired by the original idea behind the Davies-Bouldin Index (DBI). PDBI divides each cluster into sub-clusters and uses them, together with more sophisticated mechanisms, to redefine the concepts of internal homogeneity and cluster separation. This strategy yields a relevant CVI even for complex data structures and in the presence of clusters with noisy patterns. PDBI is deterministic, runs independently of any particular clustering algorithm, and generates a normalized score between 0 and 1. Numerous tests were carried out on 2-dimensional benchmark data sets and on generated higher-dimensional data with consistent ground truths. Experimental comparisons with state-of-the-art validity indices demonstrate the efficiency of the proposal in discovering the true number of clusters and in dealing with various sorts of data sets. A demonstration of PDBI, along with illustrations, is available on the author's website: http://r-riad.net/
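
For orientation, the classic Davies-Bouldin Index that PDBI takes as its starting point can be sketched as below. This is a minimal illustration of the standard DBI only, not of PDBI: the sub-cluster partitioning and the normalization to a score between 0 and 1 are not specified in this abstract and are not reproduced here. The function names `centroid` and `davies_bouldin` are illustrative choices, and Euclidean distance is assumed.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Classic Davies-Bouldin Index (lower means better-separated clusters).

    Illustrative sketch of the standard DBI, not the PDBI of the paper.
    Assumes Euclidean distance and that no two centroids coincide.
    """
    # Group points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    cents = {k: centroid(v) for k, v in clusters.items()}
    # Internal homogeneity: mean distance of members to their centroid.
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case ratio of combined scatter to centroid separation.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = davies_bouldin(points, [0, 0, 1, 1])  # labels follow the groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

On this toy data the labelling that follows the two groups scores lower (better) than the one that cuts across them. Unlike the PDBI score described above, the classic DBI is not bounded to the interval between 0 and 1.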
