Abstract

Fusion of remote sensing (RS) satellite images is indispensable for generating long-term, gap-free Earth observation data. While cloud computing (CC) provides the big picture for RS big data (RSBD), the fundamental question of how to fuse RSBD efficiently on CC platforms has not yet been settled. To this end, we propose a lightweight cloud-native framework for the elastic processing of RSBD. Using the scaling mechanisms provided by both the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) layers of CC, the Spark-on-Kubernetes operator model running in the framework can improve the efficiency of Spark-based algorithms without requiring developers to account for bottlenecks such as task latency caused by unbalanced workloads, and can ease the burden of tuning the performance parameters of their parallel algorithms. Internally, we propose a task scheduling mechanism (TSM) that dynamically changes the affinities of Spark executor pods to the computing hosts. The TSM learns the workload of each computing host from the ratio between the numbers of completed and failed tasks on that host, and dispatches Spark executor pods to newer and less-overwhelmed hosts. To illustrate the advantage, we implement a parallel enhanced spatial and temporal adaptive reflectance fusion model (PESTARFM) that enables the efficient fusion of big RS images with a Spark aggregation function. We construct an OpenStack cloud computing environment to test the usability of the framework. According to the experiments, the TSM improves the performance of the PESTARFM by about 11.7% when only PaaS scaling is used. When both IaaS and PaaS scaling are used, the maximum performance gain with the TSM can exceed 13.6%. Fusing the big Sentinel and PlanetScope images used in the experiments takes less than 4 min in the experimental environment.
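
The TSM idea (score each host by its completed-to-failed task ratio, prefer new and less-overwhelmed hosts, and express the preference as pod affinity) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `HostStats`, `host_score`, `preferred_hosts`, and `node_affinity` are hypothetical, the `+ 1` smoothing in the denominator is an assumption, hosts with no task history are assumed to be the "newer" hosts added by IaaS scaling, and the preference is assumed to be expressed as a standard Kubernetes `nodeAffinity` on the `kubernetes.io/hostname` label.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostStats:
    """Hypothetical per-host counters of Spark tasks observed on a computing host."""
    name: str
    completed: int
    failed: int


def host_score(stats: HostStats) -> float:
    """Completed-to-failed ratio; the +1 avoids division by zero (an assumption)."""
    return stats.completed / (stats.failed + 1)


def preferred_hosts(hosts: List[HostStats], k: int) -> List[str]:
    """Return the names of the k most attractive hosts.

    Hosts without any task history are treated as newly added (assumption)
    and ranked first by giving them an infinite score.
    """
    def key(s: HostStats) -> float:
        if s.completed == 0 and s.failed == 0:
            return float("inf")
        return host_score(s)

    ranked = sorted(hosts, key=key, reverse=True)
    return [s.name for s in ranked[:k]]


def node_affinity(host_names: List[str]) -> Dict:
    """Build a nodeAffinity stanza (to place under a pod's spec.affinity)
    that softly prefers scheduling executor pods onto the given hosts."""
    return {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "preference": {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": host_names,
                            }
                        ]
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    stats = [HostStats("node-a", 120, 10), HostStats("node-b", 80, 2), HostStats("node-c", 0, 0)]
    print(preferred_hosts(stats, 2))  # node-c (no history) first, then node-b
```

A soft (`preferred...`) affinity is used in this sketch so that executor pods can still be scheduled elsewhere if the preferred hosts are full; whether the TSM uses soft or hard affinity is not stated in the abstract.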

Highlights

  • Cloud computing (CC) has shown its strength in bootstrapping genomic data analysis in the life sciences and has unquestionably influenced Earth science [1,2]

  • While RS fusion algorithms (RSFAs) play a vital role in data mining in CC-based RS big data production systems (CCRSBDPS), few works address the efficient fusion of heterogeneous RS big data (RSBD) on CC platforms

  • The parameter in question is an internal parameter of the Spark-based fusion algorithm that determines the number of resilient distributed dataset (RDD) blocks
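
The highlight above mentions an internal parameter that sets the number of RDD blocks. As a rough illustration only, the sketch below splits an image of a given height into a requested number of contiguous, near-equal row ranges, one per block. The function name `row_blocks` and the row-wise tiling scheme are hypothetical; the abstract does not say how PESTARFM partitions images.

```python
from typing import List, Tuple


def row_blocks(height: int, num_blocks: int) -> List[Tuple[int, int]]:
    """Split rows [0, height) into contiguous half-open ranges of near-equal size.

    The first (height % num_blocks) blocks get one extra row. The block count is
    capped at the number of rows so that no block is empty.
    """
    if height <= 0 or num_blocks <= 0:
        raise ValueError("height and num_blocks must be positive")
    num_blocks = min(num_blocks, height)
    base, extra = divmod(height, num_blocks)
    blocks = []
    start = 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    print(row_blocks(10, 3))  # three row ranges covering rows 0..9
```

In PySpark, such a list could be distributed with `SparkContext.parallelize(blocks, len(blocks))` so that each range lands in its own partition; choosing too few blocks limits parallelism, while too many adds scheduling overhead, which is one reason such a parameter needs tuning.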

Introduction

Cloud computing (CC) has shown its strength in bootstrapping genomic data analysis in the life sciences and has unquestionably influenced Earth science [1,2]. CC-based RS big data (RSBD) production systems (CCRSBDPS) have become prevalent in recent years [3]. CC has shown significant potential for massive RS data storage and processing, on-demand services, and information services in domains such as drought monitoring, ecological assessment, and crop yield prediction. The ability to generate time-series RS data archives with on-demand parallel processing has created vast opportunities for advanced natural resource monitoring and process understanding. While RS fusion algorithms (RSFAs) play a vital role in data mining in CCRSBDPS, few works address the efficient fusion of heterogeneous RSBD on CC platforms.
