Abstract

Geospatial three-dimensional (3D) raster data have been widely used for simple representations and analysis, such as geological models, spatio-temporal satellite data, hyperspectral images, and climate data. With the increasing requirements of resolution and accuracy, the amount of geospatial 3D raster data has grown exponentially. In recent years, the processing of large raster data using Hadoop has gained popularity. However, data uploaded to Hadoop are randomly distributed onto datanodes without consideration of the spatial characteristics. As a result, the direct processing of geospatial 3D raster data produces a massive network data exchange among the datanodes and degrades the performance of the cluster. To address this problem, we propose an efficient group-based replica placement policy for large-scale geospatial 3D raster data, aiming to optimize the locations of the replicas in the cluster to reduce the network overhead. An overlapped group scheme was designed for three replicas of each file. The data in each group were placed in the same datanode, and different colocation patterns for three replicas were implemented to further reduce the communication between groups. The experimental results show that our approach significantly reduces the network overhead during data acquisition for 3D raster data in the Hadoop cluster, and maintains the Hadoop replica placement requirements.
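The placement policy described above can be illustrated with a minimal sketch. All names here (`group_replica_placement`, the round-robin offset pattern) are hypothetical and not taken from the paper; the sketch only shows the general idea of assigning all blocks in a group to the same datanode for each replica copy, while using a different group-to-node pattern per copy so the three replicas of any block land on three distinct nodes, as HDFS requires.

```python
def group_replica_placement(num_blocks, group_size, datanodes):
    """Hypothetical sketch of a group-based replica placement.

    Blocks are partitioned into contiguous groups of `group_size`.
    For each of the three replica copies, every block in a group is
    placed on the same datanode (keeping group data colocated), and
    each copy uses a shifted group-to-node pattern so that the three
    replicas of one block sit on three distinct datanodes.
    """
    n = len(datanodes)
    assert n >= 3, "HDFS needs at least three datanodes for three replicas"
    placement = {}
    for b in range(num_blocks):
        g = b // group_size  # group index of this block
        # one shifted pattern per replica copy; offsets 0..2 give
        # three distinct nodes whenever n >= 3
        placement[b] = [datanodes[(g + offset) % n] for offset in range(3)]
    return placement
```

For example, with `group_size=4` and four datanodes, blocks 0..3 (one group) all receive the same three replica locations, so a task reading that group contacts a single node per copy instead of scattering requests across the cluster.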

Highlights

  • Three-dimensional raster data have long been used to model continuous 3D spatial objects due to their simple representation and analysis [1,2]

  • We propose an efficient static replica placement policy of Hadoop Distributed File System (HDFS) optimized for large-scale geospatial 3D raster data, mainly focusing on the problem of a large network overhead and load balancing in the analysis of an entire region

  • The I/O efficiency of our method was compared with a colocation-based replica placement policy extended from CoS-HDFS [31]

Introduction

Three-dimensional raster data have long been used to model continuous 3D spatial objects because of their simple representation and analysis [1,2]. The growing volume of geospatial 3D raster data is difficult to analyze under traditional management and processing architectures, so processing large-scale geospatial data in distributed computing environments is becoming common practice [5,6]. Hadoop [7], an open-source big data framework that runs on clusters of commodity hardware, is gaining popularity in geoscience applications. Optimizations at different levels are often required for different spatial data analysis characteristics [4], and the rapidly increasing volume of 3D raster data demands substantial cluster resources, which makes such optimization important. Related work on geospatial big data in Hadoop has mainly focused on parallel analysis and storage atop the original framework: the storage mechanisms of the Hadoop Distributed File System (HDFS) have not been modified, and the influence of data storage on spatial analysis is

