Abstract

As a classical data analysis technique, clustering plays an important role in identifying the natural structure of target datasets. However, many existing clustering methods, including clustering algorithms and clustering validity indexes (CVIs), still suffer from low efficiency, poor clustering accuracy, poor stability, and high sensitivity to noise points. In this paper, the Grid-K-means algorithm is first proposed; by mapping datasets to grids, it overcomes drawbacks of the traditional K-means algorithm. Then, using grid points as weighted representative points to process datasets, a new clustering validity index (BCVI) is designed to better evaluate the quality of the clustering results generated by the Grid-K-means algorithm. Because BCVI is monotonic and is built as a linear combination of intra-cluster compactness and inter-cluster separation, it finds the optimal number of clusters (Kopt) at a much lower time cost than the commonly used method that relies on the empirical rule Kmax ≤ √n to determine Kopt. Experimental results on many types of datasets demonstrate that the Grid-K-means algorithm is faster and more accurate than the traditional algorithms. Meanwhile, experiments comparing BCVI with seven existing CVIs show that BCVI is superior to them in terms of clustering stability and data processing speed.
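
The idea of mapping a dataset to a grid and clustering weighted representative points can be illustrated with a minimal sketch. This is not the authors' Grid-K-means algorithm, whose details the abstract does not give. It assumes a uniform grid, takes each occupied cell's mean as its representative point and the cell's point count as its weight, and runs a plain weighted Lloyd iteration. The function names `grid_representatives` and `weighted_kmeans` and the parameter `cells_per_dim` are hypothetical.

```python
import numpy as np

def grid_representatives(X, cells_per_dim=20):
    """Map points to a uniform grid (an assumption of this sketch).

    Returns the mean of each occupied cell and the number of points in it.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    idx = np.floor((X - lo) / span * cells_per_dim).astype(int)
    idx = np.clip(idx, 0, cells_per_dim - 1)  # upper-edge points go to the last cell
    _, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    reps = np.zeros((counts.size, X.shape[1]))
    np.add.at(reps, inverse, X)  # sum the points that fall in each occupied cell
    reps /= counts[:, None]      # turn sums into per-cell means
    return reps, counts

def assign(reps, centers):
    """Index of the nearest center for each representative point."""
    d2 = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def weighted_kmeans(reps, weights, k, n_iter=50, seed=0):
    """Lloyd iterations in which each representative counts `weights[i]` times."""
    rng = np.random.default_rng(seed)
    centers = reps[rng.choice(len(reps), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(reps, centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster becomes empty
                new_centers[j] = np.average(
                    reps[mask], axis=0, weights=weights[mask]
                )
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, assign(reps, centers)
```

In this sketch the iteration cost depends on the number of occupied cells rather than on the number of original points, which is one plausible source of the speed-up the abstract reports. The weights keep dense cells from being treated the same as sparse ones.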
