A methodology for dealing with spatial big data

Gabriella Schoier, Giuseppe Borruso

doi:10.1504/ijbidm.2017.082705

Abstract

Spatial data mining (SDM) refers to the extraction of knowledge from spatial data. Recently, location-based services have enabled the collection of large amounts of geo-referenced data, i.e., spatial big data (SBD). Spatial datasets often exceed the capacity of current computing systems to manage them with reasonable effort; data-intensive computing and data mining techniques are therefore useful tools for analysis. In this paper, we present an approach to clustering high-dimensional data that allows flexible statistical modelling of phenomena characterised by unobserved heterogeneity. Numerous clustering algorithms have been developed for large databases; density-based algorithms in particular are suited to handling the huge amounts of data held in large spatial databases. We present the Modified Density-Based Spatial Clustering of Applications with Noise (MDBSCAN) algorithm and compare it to the classical k-means approach. Both algorithms are applied to synthetic datasets and to a dataset of satellite images.
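To make the density-based idea concrete, the following is a minimal sketch of the classical DBSCAN procedure on a small synthetic dataset. It is not the MDBSCAN algorithm of the paper, whose modifications are not described in the abstract. The function names (`region_query`, `dbscan`), the parameter values (`eps`, `min_pts`) and the toy points are all assumptions chosen for illustration, and the brute-force neighbour search would not scale to spatial big data.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Classical DBSCAN (illustrative, brute force); label -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1                   # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # expand only through core points
    return labels

# Hypothetical synthetic data: two dense 3x3 grids and one isolated point.
blob_a = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
blob_b = [(10 + x * 0.5, 10 + y * 0.5) for x in range(3) for y in range(3)]
outlier = [(5.0, 5.0)]
points = blob_a + blob_b + outlier

labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)
```

In this toy setting the two grids form separate clusters and the isolated point is labelled as noise. A k-means run would instead assign every point, including the isolated one, to its nearest centroid, which is one reason the two approaches can behave differently on noisy spatial data.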

Full Text