Abstract

With advances in remote sensing, massive amounts of remotely sensed data can be harnessed to support land use/land cover (LULC) change studies over larger areas and longer periods. However, a major challenge is missing data caused by poor weather conditions and possible sensor malfunctions during image collection. In this study, cloud-based, open-source distributed frameworks (Apache Spark and Apache Giraph) were used to build an integrated infrastructure for filling data gaps within a large-area LULC dataset. Data mining techniques (k-medoids clustering and quadratic discriminant analysis) were applied to facilitate sub-space analyses. Ancillary environmental and socioeconomic conditions were integrated to support localized model training. Multi-temporal transition probability matrices were deployed in a graph-based Markov–cellular automata simulator to fill in missing data. A comprehensive dataset for Inner Mongolia, China, from 2000 to 2016 was used to assess the feasibility, accuracy, and performance of this gap-filling approach. The result is a cloud-based distributed Markov–cellular automata framework that exploits the scalability and high performance of cloud computing while also achieving high accuracy when filling the data gaps common in longer-term LULC studies.
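The Markov component mentioned in the abstract can be illustrated with a minimal single-machine sketch. It estimates a transition probability matrix from two classified rasters and fills each gap with the most probable successor class. This is not the authors' distributed implementation: the `NODATA` marker, the function names, and the toy rasters are assumptions made for illustration, and the cellular automata neighbourhood and suitability steps are omitted.

```python
import numpy as np

NODATA = -1  # hypothetical marker for a missing (e.g. cloud-covered) cell


def transition_matrix(earlier, later, n_classes):
    """Row-normalised counts of class changes between two LULC rasters."""
    earlier = np.asarray(earlier).ravel()
    later = np.asarray(later).ravel()
    # Only cells observed at both dates contribute to the estimate.
    valid = (earlier != NODATA) & (later != NODATA)
    counts = np.zeros((n_classes, n_classes), dtype=float)
    # Accumulate one count per valid cell at (class_before, class_after).
    np.add.at(counts, (earlier[valid], later[valid]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)


def fill_gaps(earlier, later, probs):
    """Fill missing cells in `later` with the most probable successor class."""
    earlier = np.asarray(earlier)
    filled = np.asarray(later).copy()
    # A gap can only be filled where the earlier date was observed.
    gaps = (filled == NODATA) & (earlier != NODATA)
    filled[gaps] = probs.argmax(axis=1)[earlier[gaps]]
    return filled


# Toy 3x3 rasters with three classes (0, 1, 2); two cells are missing in t1.
t0 = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 0]])
t1 = np.array([[0, 1, 1], [1, NODATA, 2], [2, 2, NODATA]])

P = transition_matrix(t0, t1, n_classes=3)
t1_filled = fill_gaps(t0, t1, P)
```

Taking the arg-max of a matrix row ignores spatial context, which is why Markov chains are usually paired with a cellular automaton that weighs neighbouring cells.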

Highlights

  • This paper presented a one-stop solution for LULC gap filling for a very large study region by integrating an Apache Spark-based in-memory distributed Markov chain and suitability processing algorithm, and a self-designed Giraph-based distributed cellular automata on the cloud

  • The advantages of this framework were as follows: (1) easy integration with existing cloud-based tools for processing large-scale remotely sensed (RS) imagery, whose output LULC dataset can be written directly to a distributed file system on the cloud and used as input for the LULC gap-filling task without additional data transfer; (2) improved performance of traditional Markov–cellular automata (CA) simulation, achieved by restructuring the CA based on graph theory; (3) straightforward access to existing LULC and auxiliary data in a cloud data warehouse to construct CA simulation spaces and transition rule grids; (4) the capability to exploit large-scale cloud-based computing frameworks to accelerate processing speed; and

  • The paper’s most innovative technological contribution is developing a data gap filling method for a big data set over a large area by integrating cloud and open-source computing techniques
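The idea of restructuring a CA on a graph, listed among the highlights above, can be sketched in plain Python as a vertex-centric update in which each cell is a vertex that reads its neighbours' states. This is a simplified single-machine illustration and does not use the Giraph API. The scoring rule (Markov probability times neighbour share), the `NODATA` marker, and all function names are assumptions, not the paper's transition rules.

```python
from collections import Counter

NODATA = -1  # hypothetical marker for a missing cell


def build_grid_graph(n_rows, n_cols):
    """Map each cell (vertex) to its 8-connected (Moore) neighbours."""
    graph = {}
    for r in range(n_rows):
        for c in range(n_cols):
            graph[(r, c)] = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
            ]
    return graph


def superstep(graph, state, previous, probs):
    """One synchronous pass: each missing vertex scores candidate classes by
    Markov probability times the share of known neighbours in that class."""
    new_state = dict(state)
    for vertex, neighbours in graph.items():
        # Skip observed cells and cells with no earlier class to condition on.
        if state[vertex] != NODATA or previous[vertex] == NODATA:
            continue
        # "Messages" are the current classes of neighbours that are known.
        messages = Counter(state[n] for n in neighbours if state[n] != NODATA)
        total = sum(messages.values())
        if total == 0:
            continue  # no information yet; retry in a later superstep
        row = probs[previous[vertex]]
        scores = {k: row[k] * messages[k] / total for k in range(len(row))}
        new_state[vertex] = max(scores, key=scores.get)
    return new_state


# Toy 2x2 grid with two classes; the cell (1, 1) is missing at the later date.
graph = build_grid_graph(2, 2)
previous = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 0}
state = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): NODATA}
probs = [[0.6, 0.4], [0.1, 0.9]]  # assumed transition probabilities

new_state = superstep(graph, state, previous, probs)
```

Because every vertex reads only the previous state and writes to a copy, the update order does not matter. That property is what makes this style of computation straightforward to partition across workers in a vertex-centric framework.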


Summary

Introduction

In 1991, Townshend et al. claimed that only remotely sensed (RS) data could potentially provide accurate and repeatable global LULC information for monitoring change over time [1]. Many researchers [2,3] have since employed remotely sensed data to generate LULC maps. These studies either used images acquired at relatively low temporal frequency or combined data sources from multiple satellites to build LULC time series, a task recognized to be computationally challenging. Missing data has often been a stumbling block to building a high-resolution LULC dataset for large-scale, long-term time series that uses the same optical RS data source over a continuous period at consistent time intervals [4,5].
