A Scalable Computing Resources System for Remote Sensing Big Data Processing Using GeoPySpark Based on Spark on K8s

Jifu Guo, Jinliang Hou, Chunlin Huang

doi:10.3390/rs14030521

Abstract

As Earth observation (EO) enters the era of big data, the storage, analysis, and visualization of massive amounts of remote sensing (RS) data pose a significant challenge that must be addressed. In this paper, we propose a novel scalable computing resources system that achieves high-speed processing of RS big data in a parallel distributed architecture. To reduce data movement among computing nodes, the Hadoop Distributed File System (HDFS) is deployed on the same Kubernetes (K8s) nodes that are used for computing. For RS data analysis, we adopt a tile-oriented programming model instead of the traditional strip-oriented or pixel-oriented approach to better implement parallel computing in a Spark on K8s cluster. A large RS raster layer can be abstracted into user-defined tiles of any size, so that a whole computing task can be divided into multiple distributed parallel tasks. The computing resources requested by users are assigned immediately in the Spark on K8s cluster, simply by configuring and initializing a SparkContext through a web-based Jupyter notebook console. Users can easily query, write, or visualize data within a bounding box of any size from the catalog module in GeoPySpark. In summary, the proposed system provides a distributed, scalable resource system that integrates big data storage, parallel computing, and real-time visualization.
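The tile-oriented decomposition described in the abstract can be illustrated without a cluster. The sketch below is a minimal plain-Python illustration, not the authors' implementation and not the GeoPySpark API: it assumes a raster held in memory as a list of rows, cuts it into square tiles of a user-chosen size (edge tiles are clipped), keys each tile by a hypothetical (col, row) index, and runs a per-tile function as an independent task. `Tile`, `tile_grid`, and `process_tiles` are hypothetical names, and a local thread pool merely stands in for Spark executors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple, Tuple

Grid = List[List[float]]


class Tile(NamedTuple):
    """Grid key (col, row) plus the tile's pixel window; x1/y1 are exclusive."""
    col: int
    row: int
    x0: int
    y0: int
    x1: int
    y1: int


def tile_grid(width: int, height: int, tile_size: int) -> List[Tile]:
    """Cut a width x height raster into square tiles of tile_size pixels.

    Tiles on the right and bottom edges are clipped to the raster bounds.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    tiles = []
    for row, y0 in enumerate(range(0, height, tile_size)):
        for col, x0 in enumerate(range(0, width, tile_size)):
            x1 = min(x0 + tile_size, width)
            y1 = min(y0 + tile_size, height)
            tiles.append(Tile(col, row, x0, y0, x1, y1))
    return tiles


def process_tiles(raster: Grid, tile_size: int,
                  fn: Callable[[Grid], float]) -> Dict[Tuple[int, int], float]:
    """Run fn on every tile as an independent task; return results by (col, row).

    The thread pool is only a local stand-in for Spark executors.
    """
    tiles = tile_grid(len(raster[0]), len(raster), tile_size)

    def run(t: Tile) -> Tuple[Tuple[int, int], float]:
        # Slice out this tile's window; no other tile's pixels are needed.
        window = [r[t.x0:t.x1] for r in raster[t.y0:t.y1]]
        return (t.col, t.row), fn(window)

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run, tiles))


# Usage: a 5 x 4 raster cut into 2 x 2 tiles gives a 3 x 2 tile grid.
raster = [[float(x + y) for x in range(5)] for y in range(4)]
per_tile_sums = process_tiles(raster, 2, lambda w: sum(map(sum, w)))
print(len(per_tile_sums), sum(per_tile_sums.values()))  # 6 70.0
```

Because each task touches only its own window, the number of tasks grows with the raster size divided by the tile size, which is what lets a whole computing job be spread over however many executors the user requests.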

Highlights

We evaluated the performance of the system through a case study in which the daily distribution of the 250 m normalized difference vegetation index (NDVI) within mainland China was estimated using …
We evaluated the efficiency of submitting computing jobs with different computing resources, and explored how to mine time-series information more effectively from massive amounts of remote sensing (RS) …
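NDVI, the quantity estimated in the case study, is a per-pixel band ratio, NDVI = (NIR − Red) / (NIR + Red), so it maps naturally onto independent tiles. The following is a minimal plain-Python sketch, assuming the red and near-infrared bands of one tile are already loaded as equally sized nested lists of reflectance values; the `ndvi` function, its `nodata` parameter, and the sample values are hypothetical and do not come from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def ndvi(red: Grid, nir: Grid, nodata: float = math.nan) -> Grid:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), for two same-shaped bands.

    Pixels whose NIR + Red sum is zero get `nodata` instead of raising.
    """
    if len(red) != len(nir) or any(len(a) != len(b) for a, b in zip(red, nir)):
        raise ValueError("red and nir bands must have the same shape")
    result = []
    for red_row, nir_row in zip(red, nir):
        result.append([
            (n - r) / (n + r) if (n + r) != 0 else nodata
            for r, n in zip(red_row, nir_row)
        ])
    return result


# Usage on one hypothetical 2 x 2 tile of reflectance values.
red_tile = [[0.1, 0.2], [0.0, 0.3]]
nir_tile = [[0.5, 0.2], [0.0, 0.6]]
ndvi_tile = ndvi(red_tile, nir_tile)
```

Since no pixel depends on its neighbours, the same function can be applied tile by tile and day by day, and the resulting daily NDVI tiles form the per-pixel time series that later analysis would mine.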

Summary

Introduction

As a result of the development of Earth observation (EO) and sensor technologies, humans’ ability to undertake comprehensive observation of the Earth has entered an unprecedented period, and Earth system sciences have entered the era of big data [1,2]. The increasing availability of sensor technology has drastically improved our ability to collect time-varying geospatial and climate data, and current data collection volumes and rates far exceed those of the past. The observation data streaming rate of NASA’s current missions is approximately 1.73 GB per second, and the size of NASA’s climate change data repository is expected to grow to 230 petabytes by the end of 2030 [3]. Such volumes of RS data are difficult to handle in a traditional computing paradigm.
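To put the quoted streaming rate in perspective, a back-of-the-envelope calculation helps. This sketch assumes the 1.73 GB/s rate is sustained continuously and uses decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB); the helper functions are hypothetical and only do the arithmetic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Terabytes per day at a constant rate (decimal units: 1 TB = 1,000 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1_000


def yearly_volume_pb(rate_gb_per_s: float, days: int = 365) -> float:
    """Petabytes per year at a constant rate (decimal units: 1 PB = 10^6 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY * days / 1_000_000


rate = 1.73  # GB/s, the NASA mission streaming rate quoted in the text
print(f"{daily_volume_tb(rate):.1f} TB/day")    # 149.5 TB/day
print(f"{yearly_volume_pb(rate):.1f} PB/year")  # 54.6 PB/year
```

At roughly 150 TB per day, a single year of continuous streaming at this rate would already amount to about a quarter of the 230 PB repository size projected for 2030.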



Journal: Remote Sensing
Publication Date: Jan 22, 2022
Citations: 9
License type: CC BY 4.0

Similar Papers

Remote Sensing Big Data: Theory, Methods and Applications
Peng Liu ... Liping Di
Remote Sensing | VOL. 10 | 04 May 2018

Major Challenges with Hadoop Distributed Framework: An Overview
Neelam Sobha Rani ... Neelam Venugopal Muthu Lakshmi
SSRN Electronic Journal | 07 Feb 2018

Optimize Parallel Data Access in Big Data Processing
Jiangling Yin ... Jun Wang
01 May 2015

Pre-processing, classification and semantic querying of large-scale Earth observation spaceborne/airborne/terrestrial image databases: Process and product innovations.
10 Apr 2017
