Parallel co-location mining with MapReduce and NoSQL systems

Jin Soung Yoo, David Kimmey, Douglas Boulware

doi:10.1007/s10115-019-01381-y

Abstract

With the rapid growth of georeferenced data, spatial big data requires large-scale data processing and analysis methods. Spatial co-location pattern mining is an important problem in spatial data mining: it discovers subsets of features whose objects are frequently located together in geographic proximity. Several methods exist for efficient co-location pattern discovery; however, they may be insufficient for large, dense spatial data because the mining task consumes considerable processing time and memory. In this work, we leveraged a modern distributed computing platform, Hadoop, and developed an algorithm (called ParColoc) for parallel co-location mining on the MapReduce framework. This study explored the challenges in designing a parallel co-location mining algorithm and addressed them by adopting a spatial declustering technique and a NoSQL system. We conducted an experimental evaluation with real-world and synthetic data to examine the effectiveness of the proposed methods. The experimental results show that ParColoc is a promising method for parallel co-location mining in a cloud computing environment.
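
To make the notion of a co-location pattern concrete, the following is a minimal single-machine sketch in Python, not the ParColoc algorithm. It assumes that prevalence is measured with the participation index commonly used in the co-location literature (the abstract itself does not name a measure), and it uses a brute-force neighbor search over a small hypothetical dataset. The names `neighbor_pairs`, `participation_index`, `objects`, and `radius` are illustrative only.

```python
from itertools import combinations
from math import dist


def neighbor_pairs(objects, radius):
    """Return pairs of objects with different features that lie within `radius`.

    Each object is a tuple (object_id, feature, (x, y)). This is a brute-force
    O(n^2) scan, which is exactly the cost a parallel method aims to spread out.
    """
    pairs = []
    for (id_a, feat_a, pos_a), (id_b, feat_b, pos_b) in combinations(objects, 2):
        if feat_a != feat_b and dist(pos_a, pos_b) <= radius:
            pairs.append(((id_a, feat_a), (id_b, feat_b)))
    return pairs


def participation_index(objects, radius, feature_x, feature_y):
    """Participation index of the size-2 pattern {feature_x, feature_y}.

    Assumed measure: for each feature, the fraction of its objects that take
    part in at least one neighboring pair of the pattern; the index is the
    minimum of these fractions.
    """
    # Count how many objects each feature has in total.
    counts = {}
    for _, feat, _ in objects:
        counts[feat] = counts.get(feat, 0) + 1

    # Collect the distinct objects of each feature that appear in a pattern instance.
    involved = {feature_x: set(), feature_y: set()}
    for (id_a, feat_a), (id_b, feat_b) in neighbor_pairs(objects, radius):
        if {feat_a, feat_b} == {feature_x, feature_y}:
            involved[feat_a].add(id_a)
            involved[feat_b].add(id_b)

    return min(len(involved[f]) / counts[f] for f in (feature_x, feature_y))


# Hypothetical toy data: (object_id, feature, (x, y)).
objects = [
    (1, "A", (0.0, 0.0)),
    (2, "A", (5.0, 5.0)),
    (3, "B", (0.5, 0.0)),
    (4, "B", (5.0, 5.4)),
    (5, "B", (9.0, 9.0)),
    (6, "C", (0.0, 0.6)),
]

pi_ab = participation_index(objects, 1.0, "A", "B")
pi_ac = participation_index(objects, 1.0, "A", "C")
```

With a distance threshold of 1.0, both A objects have a B neighbor but only two of the three B objects have an A neighbor, so {A, B} scores 2/3; {A, C} scores 1/2. A pattern would be reported as a co-location when its index meets a user-chosen threshold. The quadratic neighbor scan is the part that becomes costly on large, dense data.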

Full Text