A Data Allocation Strategy for Geocomputation Based on Shape Complexity in A Cloud Environment Using Parallel Overlay Analysis of Polygons as an Example

Kang Zhao,Mei Yang,Hong Fan,Baoxuan Jin

doi:10.1109/access.2020.3030700


Open Access


Abstract


Given the explosive growth of geospatial data, parallel computing technologies have become widely used in the spatial analysis of massive datasets. The data used in geographic computing often exhibit a complex graphic structure, which is an important cause of data skew in parallel computing. Shape complexity is therefore crucial to the task allocation strategy of parallel computing. This study investigated the effect of polygon shape features on the performance of spatial analysis. A quantitative polygon shape complexity evaluation model was established through regression analysis, and a Hilbert data partition strategy weighted by shape complexity was used as the spatial data allocation method for parallel spatial analysis. The study established a shape complexity evaluation model for overlay analysis and used the Spark parallel computing paradigm to carry out a comparative experiment on massive, complex polygons. Experimental results showed that the spatial data allocation strategy based on the computational complexity of polygon shape effectively solved the problem of data skew in the parallel spatial analysis of massive complex polygons.

Highlights

In recent years, geospatial science has faced challenges in data-intensive computing with the explosive growth of geospatial data [1].
For parallel overlay analysis, we identified the shape features that affect the time cost of the algorithm and built a polygon shape complexity evaluation model using stepwise regression, in order to quantify the computational intensity of the overlay analysis of each polygon.
The experimental results showed that the proposed method produced the most reasonable spatial data allocation and a short CT, indicating that the proposed shape computational complexity model reflects the effect of polygon shape complexity on overlay analysis efficiency.

Summary

INTRODUCTION

Geospatial science has faced challenges in data-intensive computing with the explosive growth of geospatial data [1]. For parallel overlay analysis, we identified the shape features that affect the time cost of the algorithm and built a polygon shape complexity evaluation model using stepwise regression, in order to quantify the computational intensity of the overlay analysis of each polygon. To fully describe how feature differences among complex polygons affect overlay analysis performance, we selected candidate factors from two aspects: local features and spatial distribution. Seven candidate factors were used to build a model that assesses computational complexity on the basis of polygon shape: the number of polygon vertices (vertices), number of ring structures (parts), concavity (concavity), density of polygon vertices (DP), coverage ratio of polygon area (RC), edge vibration frequency (freq), and spatial aggregation of polygon vertices (ANN). The number of polygon vertices had a great effect on the efficiency of overlay analysis.
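To make the idea of shape factors concrete, the following is a minimal Python sketch that computes three of the seven candidate factors (vertices, parts, concavity) for a single polygon and combines them linearly. It is an illustration under stated assumptions, not the authors' implementation: the ring layout (outer ring first, holes after), the concavity proxy (share of the convex hull not covered by the polygon), the function names, and any coefficients passed to `complexity_score` are all hypothetical. The fitted regression coefficients are not given in this summary.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def ring_area(ring: Ring) -> float:
    """Unsigned area of a ring via the shoelace formula (ring need not be closed)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull by Andrew's monotone chain, returned counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_features(rings: List[Ring]) -> Dict[str, float]:
    """Assumed layout: rings[0] is the outer ring, the remaining rings are holes."""
    outer = rings[0]
    area = ring_area(outer) - sum(ring_area(hole) for hole in rings[1:])
    hull_area = ring_area(convex_hull(outer))
    return {
        "vertices": float(sum(len(r) for r in rings)),
        "parts": float(len(rings)),
        # Assumed proxy, not the paper's definition:
        # fraction of the convex hull that the polygon does not cover.
        "concavity": 1.0 - area / hull_area if hull_area > 0 else 0.0,
    }


def complexity_score(features: Dict[str, float],
                     coefficients: Dict[str, float],
                     intercept: float = 0.0) -> float:
    """Linear combination of features; coefficients are placeholders, not fitted values."""
    return intercept + sum(c * features[name] for name, c in coefficients.items())
```

In a setup like this, a per-polygon score would be computed once and then reused as the weight when the data are partitioned. The remaining factors (DP, RC, freq, ANN) are omitted because their exact definitions are not stated here.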

DATA DISTRIBUTION STRATEGY BASED ON THE SHAPE COMPLEXITY
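
The abstract describes a Hilbert data partition strategy weighted by shape complexity. The sketch below shows one way such a scheme could work: order polygons along a Hilbert curve so that spatial neighbors stay together, then cut the ordered sequence so that each partition receives a similar share of total complexity rather than a similar number of polygons. This is an assumption-laden illustration in plain Python, not the paper's Spark implementation; the grid-cell inputs, the `weighted_partitions` name, and the midpoint cutting rule are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def weighted_partitions(items: Sequence[Tuple[int, int, float]],
                        num_partitions: int,
                        order: int = 16) -> List[List[int]]:
    """Split items into contiguous Hilbert-ordered partitions of similar total weight.

    Each item is (cell_x, cell_y, weight), where the cell is the grid cell of the
    polygon (for example, of its centroid) and weight is its complexity score.
    Returns, for each partition, the list of item indices assigned to it.
    """
    ranked = sorted(range(len(items)),
                    key=lambda i: hilbert_index(order, items[i][0], items[i][1]))
    total = sum(w for _, _, w in items)
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    if total <= 0:
        # No usable weights: fall back to round-robin over the Hilbert order.
        for rank, i in enumerate(ranked):
            partitions[rank % num_partitions].append(i)
        return partitions
    running = 0.0
    for i in ranked:
        w = items[i][2]
        # Assign by the midpoint of this item's interval on the cumulative weight axis.
        target = int((running + w / 2.0) / total * num_partitions)
        partitions[min(target, num_partitions - 1)].append(i)
        running += w
    return partitions
```

For instance, with four polygons of weights 1, 1, 1, and 9 and two partitions, a count-based split would put total weights of 2 and 10 on the two workers, whereas the weighted split places the three light polygons together and the heavy one alone.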

EXPERIMENTAL STUDY

CONCLUSION AND FUTURE RESEARCH