Research on large data set clustering method based on MapReduce

Pengcheng Wei, Chuanfu Shang, Fangcheng He, Li Li, Jing Li

doi:10.1007/s00521-018-3780-y

Journal: Neural Computing & Applications
Publication Date: Oct 6, 2018
Citations: 7

Affiliation: Chongqing University of Education

#Analysis Of Large Data Sets #Canopy Partitioning

Abstract

This paper describes in detail the similarities and differences between the MapReduce implementations of the K-means algorithm and the Canopy algorithm, and analyzes the possibility of combining the two to design an algorithm better suited to clustering analysis of large data sets. Unlike the improvements to the K-means algorithm proposed in previous literature, it proposes a new approach to sampling and analyzes the selection of the relevant thresholds. It then introduces a MapReduce implementation framework for a K-means algorithm based on Canopy partitioning and filtering, and analyzes some of the pseudocode. Finally, it briefly analyzes the time complexity of the algorithm.
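As a rough illustration of how the two algorithms can be combined, the following single-machine Python sketch uses Canopy centers as the initial centroids for K-means. It is not the paper's MapReduce implementation and it omits the filtering step; the function names `canopy_centers` and `kmeans`, the thresholds `t1` and `t2`, and the sample points are all assumptions made for this example.

```python
import math


def canopy_centers(points, t1, t2):
    """Group points into canopies using a loose threshold t1 and a tight threshold t2."""
    if not t1 > t2 >= 0:
        raise ValueError("expected t1 > t2 >= 0")
    remaining = list(points)
    canopies = []  # list of (center, members) pairs
    while remaining:
        center = remaining[0]
        # Every point within t1 of the center joins this canopy (canopies may overlap).
        members = [p for p in points if math.dist(center, p) <= t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer become canopy centers.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return canopies


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


def kmeans(points, initial_centers, max_iterations=20):
    """Plain Lloyd iterations started from the given centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iterations):
        labels = assign(points, centers)
        new_centers = []
        for i, old in enumerate(centers):
            cluster = [p for p, label in zip(points, labels) if label == i]
            if cluster:
                # Mean of the cluster, coordinate by coordinate.
                new_centers.append(tuple(sum(xs) / len(cluster) for xs in zip(*cluster)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical sample data: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
canopies = canopy_centers(points, t1=4.0, t2=2.0)
centers, labels = kmeans(points, [center for center, _ in canopies])
print(len(canopies), centers, labels)
```

In this sketch the number of clusters is not chosen in advance: it falls out of the Canopy pass, so the choice of `t1` and `t2` determines how many centroids K-means starts with.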
