Abstract

In clustering problems, the correct number of clusters is usually unknown. Thus, the partitions produced by a clustering algorithm for each candidate number of clusters should be evaluated, and the number of clusters is then selected using a cluster validity index (CVI). CVIs are widely used to evaluate the quality of the partitions produced by clustering algorithms. A CVI is typically computed as the sum or ratio of a compactness measure and a separation measure, in which compactness refers to the concentration of data within each cluster, and separation refers to the inter-cluster distances. However, existing CVIs are sensitive to clusters with arbitrary shapes, especially in high-dimensional data. In addition, the values of existing validity indices can be heavily influenced by noise and outliers. Therefore, in this paper, we propose a support vector data description (SVDD)-based CVI, in which the compactness of a cluster is measured in the kernel space, in an attempt to overcome the sensitivity of the compactness measure to arbitrary shapes, sub-clusters, and noise in the data. The proposed CVI is evaluated through a series of experiments. Compared with non-SVDD approaches on most datasets, the results show that the SVDD variants cannot successfully determine the correct number of clusters in most cases, but they can in cases with heavy overlap and outliers, especially for arbitrary and overlapping shapes.
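To make the compactness/separation idea concrete, the following is a minimal sketch of a generic ratio-type CVI in Python. It is not the SVDD-based index proposed in the paper: compactness is taken here as the mean Euclidean distance of points to their cluster centroid, and separation as the smallest distance between centroids. The function name `ratio_cvi`, the toy data, and the hand-written candidate partitions are all assumptions made for illustration.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))

def ratio_cvi(points, labels):
    """Illustrative ratio-type index (hypothetical, lower is better):
    mean point-to-centroid distance divided by the minimum
    centroid-to-centroid distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Compactness: how tightly points concentrate around their own centroid.
    compactness = sum(
        math.dist(p, centers[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    # Separation: distance between the two closest cluster centroids.
    labs = list(centers)
    separation = min(
        math.dist(centers[a], centers[b])
        for i, a in enumerate(labs)
        for b in labs[i + 1:]
    )
    return compactness / separation

# Toy data: three well-separated square blobs of four points each.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),        # blob A
    (10, 0), (11, 0), (10, 1), (11, 1),    # blob B
    (0, 10), (1, 10), (0, 11), (1, 11),    # blob C
]

# Hand-written candidate partitions for k = 2, 3, 4 (assumed, not computed
# by a clustering algorithm): k=2 merges B and C, k=4 splits A in half.
candidates = {
    2: [0] * 4 + [1] * 8,
    3: [0] * 4 + [1] * 4 + [2] * 4,
    4: [0, 0, 1, 1] + [2] * 4 + [3] * 4,
}

scores = {k: ratio_cvi(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)  # smallest ratio wins
```

On this toy data the three-cluster partition gets the smallest ratio, so `best_k` is 3. A centroid-based compactness like this one is the kind of measure that tends to struggle on non-convex clusters and outliers, which is the limitation the abstract says motivates measuring compactness in the kernel space instead.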
