DBSCAN is an efficient density-based clustering algorithm. It can discover clusters of different shapes and sizes and can separate out noise and outliers. However, when a dataset contains regions of different densities, DBSCAN clusters it poorly. In this paper, we propose an approach that enables DBSCAN to cluster datasets with varying densities by preprocessing the data so that it has a single density level. The approach consists of four stages. First, a new method for separating the dataset by density is presented. Second, a new density-biased sampling technique is proposed. Third, the sparse data resulting from the first two stages is clustered with DBSCAN. Finally, the data left out by the sampling is clustered with KNN. Experimental results on synthetic and real datasets show that, on average, the proposed algorithm clusters better than DBSCAN by more than 7% while retaining the time complexity of DBSCAN.
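The four stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the density separation or the density-biased sampling is done, so every function name and parameter below is hypothetical. The sketch assumes that density is measured by the distance to the k-th nearest neighbour, that the dense/sparse split is placed at the largest relative jump in the sorted k-distances, and that greedy distance-based thinning stands in for the proposed sampling technique. It uses brute-force distance computations for readability and therefore does not reflect the time complexity claimed above.

```python
import math
from collections import Counter

NOISE = -1

def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (large = sparse)."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out

def split_by_density(kdist):
    """Stage 1 (assumed heuristic): split at the largest relative jump in k-distance."""
    order = sorted(range(len(kdist)), key=kdist.__getitem__)
    jumps = [kdist[order[i + 1]] / max(kdist[order[i]], 1e-12)
             for i in range(len(order) - 1)]
    cut = jumps.index(max(jumps)) + 1
    return sorted(order[:cut]), sorted(order[cut:])  # dense, sparse (input order)

def thin(points, idx, radius):
    """Stage 2 (stand-in for density-biased sampling): greedy thinning.
    A point is kept only if no already-kept point is closer than radius."""
    kept, rest = [], []
    for i in idx:
        if all(math.dist(points[i], points[j]) >= radius for j in kept):
            kept.append(i)
        else:
            rest.append(i)
    return kept, rest

def dbscan(points, eps, min_pts):
    """Stage 3: plain DBSCAN; returns one label per point, NOISE for outliers."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:       # not a core point (may become a border point)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:       # noise reachable from a core point: border
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs[j])
        cluster += 1
    return labels

def knn_assign(points, labelled, labels, targets, k):
    """Stage 4: each target takes the majority label of its k nearest clustered points."""
    refs = [i for i in labelled if labels[i] != NOISE]
    for t in targets:
        near = sorted(refs, key=lambda r: math.dist(points[t], points[r]))[:k]
        labels[t] = Counter(labels[r] for r in near).most_common(1)[0][0] if near else NOISE

def multi_density_cluster(points, k=4, eps_scale=1.5, min_pts=4, knn_k=3):
    """Hypothetical end-to-end pipeline; all defaults are illustrative assumptions."""
    kdist = kth_neighbor_distance(points, k)
    dense, sparse = split_by_density(kdist)
    radius = min(kdist[i] for i in sparse)      # spacing scale of the sparse part
    kept, rest = thin(points, dense, radius)    # dense part reduced to sparse density
    uniform = sparse + kept                     # single-density subset
    sub_labels = dbscan([points[i] for i in uniform], eps_scale * radius, min_pts)
    labels = [NOISE] * len(points)
    for i, lab in zip(uniform, sub_labels):
        labels[i] = lab
    knn_assign(points, uniform, labels, rest, knn_k)
    return labels

# Toy data: a tightly packed grid and a loosely packed grid far away from it.
dense_pts = [(x * 0.25, y * 0.25) for y in range(17) for x in range(17)]
sparse_pts = [(20.0 + x, 20.0 + y) for y in range(5) for x in range(5)]
labels = multi_density_cluster(dense_pts + sparse_pts)
print(sorted(set(labels)))
```

Deriving the DBSCAN radius from the thinning radius means a single `eps` value serves the whole reduced dataset, which is the point of bringing the data to one density level before clustering. On real data the split heuristic, the thinning radius and `eps_scale` would all need tuning.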