Abstract

Density peak (DP) clustering and density-based spatial clustering of applications with noise (DBSCAN) are representative density-based clustering algorithms in unsupervised learning. Both can cluster data of arbitrary shape and identify noise samples in a data set. However, DP relies on the decision graph to select cluster centers, so users without prior knowledge find it difficult to identify the centers automatically and accurately. The clustering performance of DBSCAN, in turn, is highly sensitive to the settings of its parameters Eps and MinPts. To address these issues, we propose a new two-stage clustering method based on improved DBSCAN and DP (TSCM). It first uses an improved DBSCAN algorithm based on bat optimization to generate initial clusters. Specifically, the improved DBSCAN uses the Silhouette index, a well-known internal clustering validation index that requires no labels, as the fitness function that guides parameter determination by bat optimization. The cluster centers in the decision graph are then selected automatically according to the initial clusters, and the final clusters are obtained by DP with these centers. Experiments show that, relative to DP and DBSCAN, TSCM effectively removes the need for manual intervention in cluster center selection in DP and in parameter setting in DBSCAN, and that clustering performance is significantly improved.
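
The second stage described above (choosing decision-graph centers from the initial clusters, then running DP assignment) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state the exact center-selection rule, so the sketch assumes that each initial cluster contributes the point with the largest product of local density and distance to the nearest denser point. The function names, the cutoff distance `dc`, and the hard-coded `initial_labels` (which stand in for the output of the bat-optimized DBSCAN stage) are all hypothetical.

```python
import math


def density_peak_quantities(points, dc):
    """Compute the two decision-graph quantities of density peak clustering.

    rho[i]   : number of other points closer than the cutoff distance dc
    delta[i] : distance from i to its nearest point of higher density
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    # Decreasing density; ties are broken by index so the order is total.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; by convention
            # its delta is the largest distance to any other point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher, order


def centers_from_initial_clusters(initial_labels, rho, delta):
    """ASSUMED rule: per initial cluster, keep the point maximising rho * delta."""
    best = {}
    for i, lab in enumerate(initial_labels):
        if lab == -1:  # noise in the first stage proposes no center
            continue
        if lab not in best or rho[i] * delta[i] > rho[best[lab]] * delta[best[lab]]:
            best[lab] = i
    return sorted(best.values())


def assign_labels(centers, nearest_higher, order):
    """DP assignment: each point inherits the label of its nearest denser point."""
    labels = [-1] * len(order)
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:  # decreasing density, so the denser neighbour is done first
        if labels[i] == -1 and nearest_higher[i] != -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


# Toy data: two well-separated squares, each with a dense middle point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
# Hypothetical first-stage output; -1 marks a point DBSCAN treated as noise.
initial_labels = [0, 0, 0, -1, 0, 1, 1, 1, 1, 1]

rho, delta, nearest_higher, order = density_peak_quantities(points, dc=1.0)
centers = centers_from_initial_clusters(initial_labels, rho, delta)
final_labels = assign_labels(centers, nearest_higher, order)
print(centers, final_labels)
```

In this sketch the first stage only proposes centers. The final labels come entirely from the DP assignment step, so a point marked as noise in the first stage can still be absorbed into a cluster.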
