Abstract

While a vast number of clustering algorithms of different types are available in the literature, the majority depend on carefully tuned parameters to obtain satisfactory results. In this paper, we reduce this dependence on parameters on the basis of the dominant sets algorithm. The dominant sets algorithm is a parameter-independent clustering approach that takes the pairwise data similarity matrix as input. If the data to be clustered are in the form of feature vectors, the data similarity must be measured to build the similarity matrix. With the commonly used Gaussian kernel, the kernel parameter is found to exert a significant influence on the clustering results. We study in depth why and how the dominant sets clustering results are influenced by this parameter, and attribute the influence to the dominant set definition, which imposes an overly strict constraint on internal similarity. We then propose a two-step clustering algorithm to solve this problem. First, we transform the similarity matrix by histogram equalization before clustering, which is shown to effectively eliminate the influence of the similarity parameter. Second, we expand the clusters to maximize the ratio of internal similarity to external similarity. Our algorithm is designed to balance high internal similarity against low external similarity, thereby relieving the dependence on the similarity parameter. In experiments on ten publicly available data sets, our algorithm performs well in comparison with several other algorithms that benefit from carefully tuned parameters.
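The first step described above, histogram equalization of the similarity matrix, can be illustrated with a minimal sketch. The function names and the rank-based equalization below are illustrative assumptions, not the paper's exact implementation; the key property shown is that because the Gaussian kernel is a monotone function of distance for any parameter value, a rank-based equalization of the off-diagonal similarities yields the same transformed matrix regardless of the kernel parameter sigma:

```python
import numpy as np

def gaussian_similarity(X, sigma):
    # Pairwise Gaussian-kernel similarities: exp(-||xi - xj||^2 / (2 sigma^2)).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)  # guard against tiny negative values
    return np.exp(-d2 / (2.0 * sigma**2))

def histogram_equalize(S):
    # Rank-transform the off-diagonal similarities to a uniform spread in
    # [0, 1] (a sketch of the paper's first step). Only the ordering of the
    # similarities matters, so the result no longer depends on sigma.
    n = S.shape[0]
    iu = np.triu_indices(n, k=1)
    vals = S[iu]
    ranks = vals.argsort().argsort()        # rank of each similarity value
    eq = ranks / max(len(vals) - 1, 1)      # uniformly spread in [0, 1]
    T = np.zeros_like(S)
    T[iu] = eq
    T = T + T.T                             # restore symmetry
    np.fill_diagonal(T, 1.0)
    return T
```

For distinct pairwise distances, `histogram_equalize(gaussian_similarity(X, sigma))` is identical for every choice of `sigma`, which is the sense in which the equalization step removes the similarity-parameter dependence before dominant sets clustering is applied.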
