A Data Stream Clustering Algorithm Based on Density and Extended Grid

Zheng Hua, Shouning Qu, Guodong Mou, Tao Du

doi:10.1007/978-3-319-63312-1_61

Publication Date: Jan 1, 2017

Citations: 2

Affiliation: University of Jinan

Abstract

Building on the traditional grid-density clustering algorithm, this paper proposes a Data Stream Clustering Algorithm Based on Density and Extended Grid (DEGDS). The algorithm combines the advantages of grid-based and density-based clustering: it removes the need to set clustering parameters manually and can find clusters of arbitrary shape. It uses the local density of each sample point and its distance from the other sample points to determine the number of cluster centers in each grid, so that cluster centers are determined automatically and the clustering result is not affected by a poor choice of initial centroids. The data are partitioned with the Spark parallel framework so that the clustering runs in parallel. For data points that fall outside the grid, the grid is extended so that the clustering within it expands to cover them, which preserves clustering accuracy. An estimate of density connectivity, together with grid boundaries, is introduced to merge grids, which reduces memory consumption. An attenuation factor is used to update grid density incrementally, so that the densities reflect the evolution of the spatial data stream. The experimental results show that, compared with the traditional clustering algorithm, DEGDS achieves a large improvement in accuracy and efficiency and can be applied effectively to clustering large data sets.
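
The abstract does not give the update rule for grid density, so the following is only a minimal sketch of how an attenuation factor can update it incrementally. It assumes an exponential decay of the kind used in other grid-based stream clustering methods: a cell's density is multiplied by decay^(t - t_last) and then increased by 1 for the new point. The names DecayedGrid, Cell, cell_width and decay are hypothetical and do not come from the paper, and time stamps are assumed to be non-decreasing integers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cell:
    density: float = 0.0   # decayed count of the points seen in this cell
    last_update: int = 0   # time step of the most recent update


class DecayedGrid:
    """Uniform grid whose cell densities fade exponentially over time.

    Assumed rule (not taken from the paper):
        density(t) = density(t_last) * decay ** (t - t_last) + 1
    """

    def __init__(self, cell_width: float, decay: float) -> None:
        if cell_width <= 0:
            raise ValueError("cell_width must be positive")
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.cell_width = cell_width
        self.decay = decay
        self.cells = defaultdict(Cell)

    def key(self, point):
        """Map a point to the integer index of the cell that contains it."""
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, t):
        """Add one point at time t and return the cell's updated density."""
        cell = self.cells[self.key(point)]
        # Fade the old density by the elapsed time, then count the new point.
        cell.density = cell.density * self.decay ** (t - cell.last_update) + 1.0
        cell.last_update = t
        return cell.density

    def density_at(self, key, t):
        """Density of a cell as seen at time t, without modifying the cell."""
        cell = self.cells.get(key)
        if cell is None:
            return 0.0
        return cell.density * self.decay ** (t - cell.last_update)
```

Storing the last update time with each cell means a density is touched only when a point arrives in that cell or when the cell is queried, so cells that receive no new data fade without a pass over the whole grid.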

Full Text