Abstract

Internal validity indices are crucial for evaluating the quality of clustering results, serving as tools for comparing clustering algorithms and determining the optimal number of clusters for a dataset. Most existing internal validity indices use the worst-case scenario to represent the overall validity. Moreover, some indices assign equal weights to the distances among different clusters, even though these distances may influence the overall validity to varying degrees. Data envelopment analysis (DEA) is an effective technique for evaluating the performance of decision-making units by computing the ratio of the weighted sum of outputs to the weighted sum of inputs. The weight assigned to each indicator signifies its degree of influence on efficiency. Furthermore, DEA can be viewed as a multiple-criteria evaluation methodology in which inputs and outputs are two sets of performance criteria. We propose a DEA-based internal validity index (DEAI) to evaluate the validity of clustering results. In this approach, intra-cluster compactness and inter-cluster separation are used to determine the input(s) and output(s). DEAI is then applied to artificial datasets and empirical examples. Experimental results show that DEAI outperforms six classic internal validity indices in accurately identifying the optimal cluster across all 10 datasets.
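
For readers unfamiliar with DEA, the ratio described above can be written out explicitly. The abstract does not say which DEA model underlies DEAI, so the block below is only an illustrative sketch using the classic CCR ratio form. The symbols are assumptions introduced here, not notation from the paper: n decision-making units indexed by j, inputs x_{ij} with weights v_i, outputs y_{rj} with weights u_r, and o for the unit under evaluation.

```latex
% Illustrative CCR ratio model (assumed form; the abstract does not specify the model).
% theta_o is the efficiency of the unit under evaluation, o.
\begin{align*}
\theta_o^{*} = \max_{u,\,v} \quad
  & \frac{\sum_{r=1}^{s} u_r \, y_{ro}}{\sum_{i=1}^{m} v_i \, x_{io}} \\
\text{subject to} \quad
  & \frac{\sum_{r=1}^{s} u_r \, y_{rj}}{\sum_{i=1}^{m} v_i \, x_{ij}} \le 1,
    \qquad j = 1, \dots, n, \\
  & u_r \ge 0, \quad r = 1, \dots, s, \\
  & v_i \ge 0, \quad i = 1, \dots, m.
\end{align*}
```

In this form each unit is scored with the weights most favorable to it, subject to no unit exceeding an efficiency of 1. This is what allows different indicators to carry different degrees of influence rather than equal weights.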
