Abstract

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) finds clusters of arbitrary shape and is robust to noise, so it is widely used in data mining. In Cyber Physical and Social Computing, massive volumes of data must be analyzed and processed, and the standard serial DBSCAN algorithm is difficult to apply effectively to such datasets. To enable DBSCAN to process massive datasets, this paper proposes a distributed DBSCAN algorithm (D-DBSCAN). D-DBSCAN works in three steps. First, the dataset is distributed across multiple storage nodes so that massive datasets can be handled. Second, each storage node performs a local Eps-neighbors search, so that the data is processed in parallel. Finally, the results returned by the storage and compute nodes are merged to obtain the global Eps-neighbors result. In the experiments, a series of simulated datasets was used to verify the effectiveness of D-DBSCAN. The results show that D-DBSCAN, by building on distributed storage and computation, effectively improves the capacity and scalability of DBSCAN for processing massive data. D-DBSCAN can be easily deployed in cloud-edge computing environments to handle massive amounts of data in Cyber Physical and Social Computing.
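The three steps described in the abstract (distribute, search locally, merge) can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names, the round-robin partitioning, and the sequential loop standing in for parallel storage nodes are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import dist

NOISE = -1

def partition(points, n_nodes):
    """Step 1 (assumed round-robin): spread points over n_nodes shards.
    Each shard keeps (global_index, point) pairs so results can be merged."""
    shards = [[] for _ in range(n_nodes)]
    for i, p in enumerate(points):
        shards[i % n_nodes].append((i, p))
    return shards

def local_eps_neighbors(shard, query, eps):
    """Step 2: Eps-neighbors search restricted to one shard."""
    return [i for i, p in shard if dist(p, query) <= eps]

def global_eps_neighbors(shards, query, eps):
    """Step 3: merge the per-shard results into the global Eps-neighborhood.
    The loop is sequential here; on a real cluster each shard would answer
    in parallel (an assumption about the deployment)."""
    merged = []
    for shard in shards:
        merged.extend(local_eps_neighbors(shard, query, eps))
    return sorted(merged)

def dbscan(points, shards, eps, min_pts):
    """Textbook DBSCAN driver whose neighbor queries go through the shards."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = global_eps_neighbors(shards, points[i], eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: claimed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = global_eps_neighbors(shards, points[j], eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
shards = partition(points, n_nodes=3)
labels = dbscan(points, shards, eps=1.5, min_pts=3)
print(labels)
```

Because every shard stores global indices, the merged neighborhood is identical to what a single-node search over the whole dataset would return, so the clustering logic itself does not need to change.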
