Abstract

Clustering, which is used to discover hidden patterns in unlabeled data sets, is an important task in data mining. Consequently, clustering validation, which evaluates clustering results, has been recognized as one of the vital issues in clustering applications. Existing validation indices are typically used to evaluate results after clustering, not to guide the clustering process dynamically. However, an index applied during clustering can automatically adjust the algorithm's operation to the actual data distribution, which improves the algorithm's adaptability. Modeled on the division and aggregation of cells in biology, this paper proposes a new two-phase (grouping and merging) cell group clustering algorithm that uses a continuous validation and correction mechanism. In the grouping phase, a new internal clustering validation index called the Split Index (SI) continuously evaluates the cohesion of each cell group. A validation and correction mechanism then validates and splits cell groups until the SI of every cell group meets the split threshold ε. Finally, the nucleus of each cell group is determined as the sample with the minimum sum of distances to the other samples in the group. In the merging phase, all reachable cell groups are merged in a density-reachable manner, which completes the clustering of arbitrarily distributed samples. Experiments on synthetic data sets and on data sets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/index.php) show that the validation index can effectively guide the clustering process. They also show that the algorithm can handle a variety of data sets, including imbalanced data sets and data sets with spherical and non-spherical clusters.
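The abstract describes both phases only at a high level. The Python sketch below illustrates the control flow under stated assumptions and is not the authors' implementation. `placeholder_index` is a hypothetical stand-in for the Split Index, whose formula is not given here. The farthest-pair bisection is an assumed splitting rule. `merge_groups` and its `radius` parameter are a simplified, transitive stand-in for the density-reachable merge. Only the nucleus rule follows our reading of the abstract: the member with the minimum sum of distances to the other members, which is a medoid.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nucleus(group: Sequence[Point]) -> Point:
    """Medoid: the member with the minimum sum of distances to all members."""
    return min(group, key=lambda p: sum(math.dist(p, q) for q in group))


def placeholder_index(group: Sequence[Point]) -> float:
    """HYPOTHETICAL stand-in for the Split Index (SI): mean distance from the
    members to the nucleus. This is NOT the published SI formula."""
    c = nucleus(group)
    return sum(math.dist(p, c) for p in group) / len(group)


def split_until_valid(group: List[Point], eps: float) -> List[List[Point]]:
    """Grouping-phase sketch: recursively bisect a group until each group's
    (placeholder) index is <= the split threshold eps."""
    if len(group) < 2 or placeholder_index(group) <= eps:
        return [group]
    # Assumed splitting rule: seed with the two mutually farthest members.
    a, b = max(((p, q) for p in group for q in group),
               key=lambda pq: math.dist(*pq))
    if math.dist(a, b) == 0:
        return [group]  # all members coincide, so there is nothing to split
    # Each member goes to the nearer seed (ties go to seed a); both halves
    # are non-empty, so the recursion always terminates.
    left = [p for p in group if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in group if math.dist(p, a) > math.dist(p, b)]
    return split_until_valid(left, eps) + split_until_valid(right, eps)


def merge_groups(groups: List[List[Point]], radius: float) -> List[List[Point]]:
    """Merging-phase sketch: join groups whose nuclei lie within a HYPOTHETICAL
    radius, transitively, so a chain of close nuclei links distant groups."""
    parent = list(range(len(groups)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    nuclei = [nucleus(g) for g in groups]
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            if math.dist(nuclei[i], nuclei[j]) <= radius:
                parent[find(i)] = find(j)
    merged: dict = {}
    for i, g in enumerate(groups):
        merged.setdefault(find(i), []).extend(g)
    return list(merged.values())


# Usage: two well-separated blobs of three points each.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (5.1, 5.3)]
groups = split_until_valid(data, eps=0.5)
clusters = merge_groups(groups, radius=1.0)
print(len(clusters))  # → 2
```

Splitting first and merging afterwards lets a tight threshold over-segment the data safely, because the merge step can reconnect fragments whose nuclei are close. In the actual algorithm, the SI and the density-reachability criterion would replace both placeholders.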
