A Parallel DBSCAN Algorithm Based on Spark

Guangchun Luo, Xiaoyu Luo, Ke Qin, Ling Tian, Thomas Fairley Gooch

doi:10.1109/bdcloud-socialcom-sustaincom.2016.85

https://doi.org/10.1109/bdcloud-socialcom-sustaincom.2016.85


Abstract

With the explosive growth of data, we have entered the era of big data. To sift through masses of information, many data mining algorithms are being implemented in parallel. Cluster analysis occupies a pivotal position in data mining, and DBSCAN is one of the most widely used clustering algorithms. However, existing parallel DBSCAN algorithms usually divide the original database into several disjoint partitions, and as the data dimension increases, splitting and consolidating the high-dimensional space consumes a lot of time. To solve this problem, this paper proposes a parallel DBSCAN algorithm (S_DBSCAN) based on Spark, which can quickly partition the original data and combine the clustering results. It consists of the following steps: 1) partitioning the raw data based on a random sample, 2) running local DBSCAN in parallel, and 3) merging the data partitions based on the centroid. Compared with the traditional DBSCAN algorithm, experimental results show that the proposed S_DBSCAN algorithm provides better operating efficiency and scalability.
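The three steps listed in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation, because the abstract does not specify the partitioning or merging rules. The sketch assumes that partitioning sends each point to the nearest of `k` randomly sampled pivots, and that two local clusters are merged when their centroids lie within a hypothetical `merge_threshold`. All function names are illustrative. A plain Python loop stands in for the parallel per-partition step that Spark would perform.

```python
import math
import random
from collections import defaultdict

def partition_by_sample(points, k, seed=0):
    """Step 1 (assumed rule): sample k pivots, send each point to its nearest pivot."""
    rng = random.Random(seed)
    pivots = rng.sample(points, k)
    parts = defaultdict(list)
    for p in points:
        nearest = min(range(k), key=lambda i: math.dist(p, pivots[i]))
        parts[nearest].append(p)
    return list(parts.values())

def local_dbscan(points, eps, min_pts):
    """Step 2: textbook DBSCAN on one partition; returns clusters, drops noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = {}  # point index -> cluster id, or -1 for noise
    cluster_id = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue  # already assigned to a cluster
            unvisited = j not in labels
            labels[j] = cluster_id
            if unvisited:
                j_neighbors = neighbors(j)
                if len(j_neighbors) >= min_pts:  # j is a core point: expand
                    queue.extend(j_neighbors)
        cluster_id += 1

    clusters = defaultdict(list)
    for i, c in labels.items():
        if c != -1:
            clusters[c].append(points[i])
    return list(clusters.values())

def centroid(cluster):
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def merge_by_centroid(clusters, merge_threshold):
    """Step 3 (assumed rule): union clusters whose centroids are close."""
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    cents = [centroid(c) for c in clusters]
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if math.dist(cents[i], cents[j]) <= merge_threshold:
                parent[find(i)] = find(j)

    merged = defaultdict(list)
    for i, c in enumerate(clusters):
        merged[find(i)].extend(c)
    return list(merged.values())

def s_dbscan_sketch(points, k, eps, min_pts, merge_threshold, seed=0):
    parts = partition_by_sample(points, k, seed)
    # In Spark this would be a per-partition map; here it is a sequential loop.
    local = [c for part in parts for c in local_dbscan(part, eps, min_pts)]
    return merge_by_centroid(local, merge_threshold)
```

With `k=1` the sketch reduces to ordinary DBSCAN on the whole dataset, which gives a convenient baseline for checking the partitioned runs. How well larger `k` values reproduce that baseline depends on the assumed merge rule and threshold, which are the parts most likely to differ from the paper.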

Similar Papers

Approaches for scaling DBSCAN algorithm to large spatial databases
Aoying Zhou ... Yunfa Hu
Journal of Computer Science and Technology | VOL. 15
01 Nov 2000

An improvement method of DBSCAN algorithm on cloud computing
Weipeng Jing ... Chao Jiang
Procedia Computer Science | VOL. 147
01 Jan 2019

DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark
Weipeng Jing ... Yiqun Cheng
International Journal of High Performance Computing and Networking | VOL. 13
01 Jan 2019

DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark
Guangsheng Chen ... Yiqun Cheng
International Journal of High Performance Computing and Networking | VOL. 13
01 Jan 2019
