A two-level storage strategy for map-reduce enabled computation of local map algebra

Jianbo Zhang, Caikun Chen, Hao Xia, Simin Zhou, Tingnan Liang, Yongchang Li

doi:10.1007/s12145-020-00452-x

Abstract

In the big data era, high-resolution raster-based geocomputation has been widely employed in geospatial studies. The algorithms used in local map algebra operations are data-intensive and require large amounts of memory and computing power. Simply employing a distributed computing framework such as Hadoop to serve such applications incurs storage and performance issues. In this paper, we present a two-level storage strategy designed specifically for map-reduce implementations of local map algebra algorithms under Hadoop. This approach implements efficient storage and manipulation of large raster data sets through three processes: (1) partitioning a raster file into square tile sets, (2) compressing and reorganizing these tile sets to prevent tiles from overlapping data division boundaries, and (3) improving MapReduce’s I/O interfaces for data exchange in the parallel computation of map algebra. Experiments with real-world datasets show that the proposed strategy can achieve high speedup and efficiency for raster-based spatial analysis applications. The results also show that the strategy scales satisfactorily as the number of data nodes in the cluster or the raster data volume increases.
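To make the tiling idea concrete, the following is a minimal single-machine sketch of a local map algebra operation computed tile by tile. It is not the authors' Hadoop implementation: the function names (`partition`, `local_op`, `assemble`, `map_algebra`), the in-memory list-of-lists raster, and the tile keying by `(tile_row, tile_col)` are all assumptions made for illustration, and compression and HDFS block alignment are omitted. It only shows why a local operation parallelizes naturally: each output cell depends solely on the cells at the same position in the inputs, so tiles with the same key can be processed independently.

```python
from typing import Callable, Dict, List, Tuple

Raster = List[List[float]]
TileKey = Tuple[int, int]  # (tile_row, tile_col); hypothetical keying scheme


def partition(raster: Raster, tile_size: int) -> Dict[TileKey, Raster]:
    """Split a raster into square tiles; edge tiles may be smaller."""
    rows, cols = len(raster), len(raster[0])
    tiles: Dict[TileKey, Raster] = {}
    for r0 in range(0, rows, tile_size):
        for c0 in range(0, cols, tile_size):
            tile = [row[c0:c0 + tile_size] for row in raster[r0:r0 + tile_size]]
            tiles[(r0 // tile_size, c0 // tile_size)] = tile
    return tiles


def local_op(tile_a: Raster, tile_b: Raster,
             fn: Callable[[float, float], float]) -> Raster:
    """Apply a cell-by-cell (local) operation to two aligned tiles."""
    return [[fn(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(tile_a, tile_b)]


def assemble(tiles: Dict[TileKey, Raster], tile_size: int,
             rows: int, cols: int) -> Raster:
    """Write each tile back to its position in the output raster."""
    out: Raster = [[0.0] * cols for _ in range(rows)]
    for (tile_row, tile_col), tile in tiles.items():
        for i, row in enumerate(tile):
            r = tile_row * tile_size + i
            c = tile_col * tile_size
            out[r][c:c + len(row)] = row
    return out


def map_algebra(a: Raster, b: Raster,
                fn: Callable[[float, float], float],
                tile_size: int) -> Raster:
    """Tile both inputs, process matching tiles independently, reassemble."""
    tiles_a = partition(a, tile_size)
    tiles_b = partition(b, tile_size)
    # Each key could be handled by a separate map task in a distributed setting.
    result = {key: local_op(tiles_a[key], tiles_b[key], fn) for key in tiles_a}
    return assemble(result, tile_size, len(a), len(a[0]))
```

For example, `map_algebra(a, b, lambda x, y: x + y, 2)` adds two equally sized rasters using 2 x 2 tiles. In a distributed setting the per-key step would run as independent tasks, which is where keeping each tile entirely within one data division matters.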

Full Text