Abstract

Data clustering has been received considerable attention in many applications, such as data mining, document retrieval, image segmentation and pattern classification. The enlarging volumes of information emerging by the progress of technology, makes clustering of very large scale of data a challenging task. In order to deal with the problem, more researchers try to design efficient parallel clustering algorithms. In this paper, we propose a parallel DBSCAN clustering algorithm based on Hadoop, which is a simple yet powerful parallel programming platform. The experimental results demonstrate that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call