Abstract

The DBSCAN clustering algorithm can identify clusters of arbitrary shape, and the one-pass clustering algorithm is fast and efficient. Building on these two strengths, this paper proposes a two-stage hybrid clustering algorithm that combines them, with DBSCAN improved so that it can process data with categorical attributes. In the first stage, the one-pass clustering algorithm groups the data into what we call the original partition. In the second stage, the improved DBSCAN algorithm merges the clusters of that partition to obtain the final clusters. The proposed algorithm has nearly linear time complexity, so it can be applied to large-scale datasets. Experimental results on real and synthetic datasets show that the two-stage hybrid algorithm identifies clusters of arbitrary shape, as DBSCAN does, while running more efficiently than DBSCAN, and that it is effective and practical.
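
The two stages described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes purely numeric attributes with Euclidean distance (the categorical-attribute extension is not reproduced), and the function names and the parameters `radius`, `eps`, and `min_pts` are hypothetical choices made for this illustration. Stage 1 is a leader-style single scan; stage 2 runs a DBSCAN-style merge over the stage-1 centroids, weighting each centroid by the number of points it absorbed.

```python
import math
from collections import deque


def euclid(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_pass(points, radius):
    """Stage 1 (assumed variant): scan the data once; each point joins the
    nearest existing centroid within `radius`, else starts a new micro-cluster."""
    sums, counts, assign = [], [], []
    for p in points:
        best, best_d = -1, radius
        for i, (s, n) in enumerate(zip(sums, counts)):
            d = euclid(p, [v / n for v in s])
            if d <= best_d:
                best, best_d = i, d
        if best < 0:
            sums.append(list(p))
            counts.append(1)
            best = len(sums) - 1
        else:
            sums[best] = [a + b for a, b in zip(sums[best], p)]
            counts[best] += 1
        assign.append(best)
    centroids = [[v / n for v in s] for s, n in zip(sums, counts)]
    return centroids, counts, assign


def dbscan_on_centroids(centroids, counts, eps, min_pts):
    """Stage 2 (assumed variant): DBSCAN over centroids. A centroid is a core
    object if the micro-clusters within `eps` hold at least `min_pts` points.
    Returns one label per micro-cluster; -1 marks noise."""
    k = len(centroids)
    neigh = [[j for j in range(k) if euclid(centroids[i], centroids[j]) <= eps]
             for i in range(k)]
    core = [sum(counts[j] for j in neigh[i]) >= min_pts for i in range(k)]
    labels = [-1] * k
    next_label = 0
    for i in range(k):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = next_label
        queue = deque([i])
        while queue:
            u = queue.popleft()
            if not core[u]:
                continue  # border micro-clusters are labeled but not expanded
            for v in neigh[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    queue.append(v)
        next_label += 1
    return labels


def two_stage(points, radius, eps, min_pts):
    """Run both stages and map each point to its final cluster label."""
    centroids, counts, assign = one_pass(points, radius)
    micro_labels = dbscan_on_centroids(centroids, counts, eps, min_pts)
    return [micro_labels[a] for a in assign]


points = [(0, 0), (0.1, 0), (0, 0.1), (0.6, 0), (0.7, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (5.6, 5), (20, 20)]
print(two_stage(points, radius=0.3, eps=1.0, min_pts=3))
```

In this sketch, stage 1 costs on the order of n·k distance computations for n points and k micro-clusters, and stage 2 costs on the order of k², so the total stays close to linear in n only when k is much smaller than n. That is one plausible reading of the "nearly linear" claim, not a statement of the paper's own analysis.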
