
Luis F Lago-Fernández, Antonio González, Gonzalo Martínez-Muñoz, Manuel Sánchez-Montañés

doi:10.5220/0003793602350241

Abstract

The aim of a crisp cluster validity index is to quantify the quality of a given data partition. It allows one to select the best partition from a set of candidates and to determine the number of clusters. Recently, negentropy-based cluster validation has been introduced. This new approach seems to perform better than other state-of-the-art techniques, and its computation is quite simple. However, like many other cluster validation approaches, it runs into problems when some partition regions contain a small number of points. Different heuristics have been proposed to cope with this problem. In this article, we systematically analyze the performance of different negentropy-based validation approaches, including a new heuristic, in clustering problems of increasing dimensionality, and compare them to reference criteria such as AIC and BIC. Our results on synthetic data suggest that the newly proposed negentropy-based validation strategy can outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications.

Full Text