Dynamic Analytics for Spatial Data with an Incremental Clustering Approach

Fernando Mendes,Joao Moura-Pires,Maribel Yasmina Santos

doi:10.1109/icdmw.2013.169

Abstract

Several clustering algorithms have been extensively used to analyze vast amounts of spatial data. One of these algorithms is the SNN (Shared Nearest Neighbor), a density-based algorithm, which has several advantages when analyzing this type of data due to its ability of identifying clusters of different shapes, sizes and densities, as well as the capability to deal with noise. Having into account that data are usually progressively collected as time passes, incremental clustering approaches are required when there is the need to update the clustering results as new data become available. This paper proposes SNN <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">++</sup> , an incremental clustering algorithm based on the SNN. Its performance and the quality of the resulting clusters are compared with the SNN and the results show that the SNN <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">++</sup> yields the same result as the SNN and show that the incremental feature was added to the SNN without any computational penalty. Moreover, the experimental results also show that processing huge amounts of data using increments considerably decreases the number of distances that need to be computed to identify the points' nearest neighbors.

Full Text