An Improved K-means Distributed Clustering Algorithm Based on Spark Parallel Computing Framework

Xin Lu, Xun Wang, Jiao Yuan, Huanghuang Lu

doi:10.1088/1742-6596/1616/1/012065

Abstract

The traditional K-means distributed clustering algorithm has several problems when clustering big data, such as unstable clustering results, poor clustering quality and low execution efficiency. In this paper, a density-based method for selecting the initial cluster centers is proposed to improve the K-means distributed clustering algorithm. The algorithm combines the sample density, the distance between clusters and the cluster compact density, defines the product of the three as the difference weight density, and selects the sample point with the maximum difference weight density as the initial cluster center, thereby addressing the randomness and low quality of initial cluster center selection. In addition, the improved algorithm is implemented on the Spark parallel computing framework to further improve its performance in big data clustering. The experimental results show that the improved K-means distributed clustering algorithm based on the Spark parallel computing framework achieves higher execution efficiency and accuracy, as well as good stability, in big data clustering analysis.
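The abstract describes the seeding rule only in words. The sketch below is one possible reading of it in Python and is not the authors' code. The exact definitions of sample density, inter-cluster distance and cluster compact density are not given here, so the neighbourhood radius (mean pairwise distance), the density (neighbour count within that radius), the compactness (inverse mean neighbour distance) and the separation (distance to the nearest already-chosen center) are all assumptions, as are the hypothetical helper names `select_initial_centers` and `kmeans`. The chosen centers then seed ordinary K-means (Lloyd) iterations.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(points: List[Point], k: int) -> List[Point]:
    """Pick k centers by maximising density * separation * compactness.

    ASSUMPTION: the three factors are illustrative stand-ins for the paper's
    sample density, distance between clusters and cluster compact density.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    d = [[dist(p, q) for q in points] for p in points]
    # ASSUMPTION: neighbourhood radius = mean pairwise distance.
    pairs = [d[i][j] for i in range(n) for j in range(i + 1, n)]
    radius = sum(pairs) / len(pairs) if pairs else 0.0

    density, compact = [], []
    for i in range(n):
        neigh = [d[i][j] for j in range(n) if j != i and d[i][j] <= radius]
        density.append(len(neigh))  # sample density: neighbours within radius
        # compactness: inverse of the mean distance to those neighbours
        compact.append(1.0 / (sum(neigh) / len(neigh) + 1e-12) if neigh else 0.0)

    chosen: List[int] = []
    for _ in range(k):
        best, best_w = -1, -1.0
        for i in range(n):
            if i in chosen:
                continue
            # separation: distance to the nearest center already chosen
            # (1.0 for the first center, so only density * compactness count)
            sep = min(d[i][c] for c in chosen) if chosen else 1.0
            w = density[i] * sep * compact[i]  # "difference weight density"
            if w > best_w:
                best, best_w = i, w
        chosen.append(best)
    return [points[i] for i in chosen]


def kmeans(points: List[Point], k: int, max_iter: int = 100, tol: float = 1e-9):
    """Standard Lloyd iterations seeded with select_initial_centers."""
    centers = select_initial_centers(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # update step: centroid of the cluster's members
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels
```

For instance, with two unit squares of four points each, one with corners at (0, 0) and (1, 1) and one with corners at (10, 10) and (11, 11), the seeding places one center in each square, and the iterations settle on the centroids (0.5, 0.5) and (10.5, 10.5). The seeding builds a full distance matrix, so it is quadratic in the number of points. In a Spark setting, the assignment step and the per-cluster sums would presumably be distributed across partitions (for example with an RDD `map` followed by `reduceByKey`). This sketch stays single-machine.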

Journal: Journal of Physics: Conference Series	Publication Date: Aug 1, 2020
Citations: 4	License type: cc-by

Similar Papers

Initialization for K-means Clustering using Voronoi Diagram
Damodar Reddy ... Prasanta K Jana
Procedia Technology | VOL. 4 | 01 Jan 2012

Density Peak Clustering Algorithm Based on High Density Connection with Entropy Optimization
Weiguo Yi ... Siwei Ma
22 Jul 2022

Analysis of Artistic Modeling of Opera Stage Clothing Based on Big Data Clustering Algorithm
Weiwei Luo
Security and Communication Networks | VOL. 2021 | 29 Dec 2021

Optimization of K-medoids Algorithm for Initial Clustering Center
Wang Yan E ... Liang Yan
Journal of Physics: Conference Series | VOL. 1487 | 01 Mar 2020