Abstract

Due to technical and budget limitations, trade-offs are inevitable in the design of remote sensing instruments, making it difficult to acquire images with both high spatial and high temporal resolution. To address this problem, this paper proposes a new data fusion model named the deep convolutional spatiotemporal fusion network (DCSTFN), which makes full use of a convolutional neural network (CNN) to derive high spatiotemporal resolution images from remotely sensed images with high temporal but low spatial resolution (HTLS) and low temporal but high spatial resolution (LTHS). The DCSTFN model is composed of three major parts: the expansion of the HTLS images, the extraction of high-frequency components from LTHS images, and the fusion of the extracted features. The inputs of the proposed network include a pair of HTLS and LTHS reference images acquired on the same date and another HTLS image on the prediction date. Convolution is used to extract key features from the inputs, and deconvolution is employed to expand the size of the HTLS images. The features extracted from the HTLS and LTHS images are then fused with the aid of an equation that accounts for temporal ground coverage changes. The output image on the prediction date has the spatial resolution of LTHS and the temporal resolution of HTLS. Overall, the DCSTFN model establishes a complex but direct non-linear mapping between the inputs and the output. Experiments with MODerate Resolution Imaging Spectroradiometer (MODIS) and Landsat Operational Land Imager (OLI) images show that the proposed CNN-based approach not only achieves state-of-the-art accuracy, but is also more robust than conventional spatiotemporal fusion algorithms. In addition, once the network is trained, DCSTFN performs the fusion faster than conventional algorithms and can potentially be applied to the bulk processing of archived data.
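To make the three-part design concrete, the sketch below outlines the pipeline in PyTorch: convolution extracts features from both inputs, transposed convolution (deconvolution) expands the HTLS image, and the features are fused through a temporal-change equation before reconstruction. This is a minimal sketch under stated assumptions, not the paper's exact network: the channel counts, kernel sizes, fusion form, and the 16x HTLS-to-LTHS scale factor are illustrative.

```python
# Minimal DCSTFN-style pipeline sketch (PyTorch). Channel counts, kernel
# sizes, and the 16x scale factor are illustrative assumptions.
import torch
import torch.nn as nn

class HTLSBranch(nn.Module):
    """Extracts features from an HTLS (MODIS-like) patch and expands it with
    transposed convolutions to match the LTHS (Landsat-like) grid."""
    def __init__(self, bands=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, 32, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=4), nn.ReLU(),  # 4x expansion
            nn.ConvTranspose2d(32, 32, 4, stride=4), nn.ReLU(),  # 16x in total
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class LTHSBranch(nn.Module):
    """Extracts high-frequency features from an LTHS patch."""
    def __init__(self, bands=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class DCSTFN(nn.Module):
    def __init__(self, bands=1):
        super().__init__()
        self.htls = HTLSBranch(bands)  # shared by both HTLS inputs
        self.lths = LTHSBranch(bands)
        self.reconstruct = nn.Sequential(
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, bands, 3, padding=1),
        )

    def forward(self, m_ref, l_ref, m_pred):
        # Fusion equation accounting for temporal change: prediction-date HTLS
        # features plus the reference-date gap between LTHS and HTLS features.
        fused = self.htls(m_pred) + self.lths(l_ref) - self.htls(m_ref)
        return self.reconstruct(fused)

# Shape check: a 16x16 HTLS patch maps onto a 256x256 LTHS grid, so the
# output has LTHS spatial resolution on an HTLS acquisition date.
model = DCSTFN()
m_ref, m_pred = torch.rand(1, 1, 16, 16), torch.rand(1, 1, 16, 16)
l_ref = torch.rand(1, 1, 256, 256)
print(model(m_ref, l_ref, m_pred).shape)  # torch.Size([1, 1, 256, 256])
```

Sharing one HTLS branch between the reference and prediction inputs keeps both in the same feature space, which is what makes the subtraction in the fusion step meaningful.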

Highlights

  • The advances in modern sensor technology have greatly expanded the use of remote sensing images in scientific research and many other human activities [1,2,3]

  • A number of remote sensing data fusion algorithms have been put forward to address this trade-off, and research has shown that generating high spatiotemporal resolution data by fusing high spatial resolution images with high temporal resolution images from multiple data sources is a practical approach [12,13]

  • Three images, a Landsat reference image and MODerate Resolution Imaging Spectroradiometer (MODIS) images for both the reference and prediction dates, are fed into the model; the observed Landsat image on the prediction date is the expected output that guides the training process (see the training sketch after this list)
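To illustrate how the observed Landsat image supervises training, here is a hypothetical single training step that reuses the DCSTFN sketch given after the abstract; the MSE loss and Adam optimizer settings are assumptions for illustration, not the paper's documented choices.

```python
# Hypothetical training step: the observed Landsat image on the prediction
# date is the regression target for the network output. Reuses `model`,
# `m_ref`, `l_ref`, and `m_pred` from the DCSTFN sketch above.
import torch
import torch.nn as nn
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=1e-4)  # assumed hyperparameters
criterion = nn.MSELoss()                             # assumed loss

l_target = torch.rand(1, 1, 256, 256)  # stands in for the observed Landsat image

optimizer.zero_grad()
loss = criterion(model(m_ref, l_ref, m_pred), l_target)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```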


Summary

Introduction

The advances in modern sensor technology have greatly expanded the use of remote sensing images in scientific research and many other human activities [1,2,3]. However, because of trade-offs in instrument design, no single sensor delivers images with both high spatial and high temporal resolution. A number of remote sensing data fusion algorithms have been put forward to address this problem, and research has shown that generating high spatiotemporal resolution data by fusing high spatial resolution images with high temporal resolution images from multiple data sources is a practical approach [12,13]. Reconstruction-based methods generate composite images from weighted sums of spectrally similar neighboring pixels in the HTLS and LTHS image pairs [16]. A typical example is the spatial and temporal adaptive reflectance fusion model (STARFM) [12]. It uses the differences between HTLS and LTHS images to establish a relation, and searches for similar neighboring pixels based on spectral difference, temporal difference, and spatial distance.
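As an illustration of this reconstruction-based weighting idea, the sketch below predicts a single fine-resolution pixel as a weighted sum over a local window, with weights driven by spectral difference, temporal difference, and spatial distance. It is a simplified, hypothetical rendering: the window size, the weight formula, and the absence of candidate filtering are assumptions, and the actual STARFM algorithm [12] is considerably more involved.

```python
# Simplified STARFM-style weighting (NumPy). All arrays are single-band
# reflectance on the same fine grid; MODIS is assumed already resampled.
import numpy as np

def starfm_like_pixel(l_ref, m_ref, m_pred, row, col, win=7, eps=1e-6):
    """Predict the fine-resolution pixel (row, col) at the prediction date."""
    half = win // 2
    r0, r1 = max(0, row - half), min(l_ref.shape[0], row + half + 1)
    c0, c1 = max(0, col - half), min(l_ref.shape[1], col + half + 1)

    num = den = 0.0
    for i in range(r0, r1):
        for j in range(c0, c1):
            s = abs(l_ref[i, j] - m_ref[i, j]) + eps     # spectral difference
            t = abs(m_pred[i, j] - m_ref[i, j]) + eps    # temporal difference
            d = 1.0 + np.hypot(i - row, j - col) / half  # relative distance
            w = 1.0 / (s * t * d)                        # combined weight
            # Candidate: reference Landsat value shifted by the MODIS change.
            num += w * (l_ref[i, j] + m_pred[i, j] - m_ref[i, j])
            den += w
    return num / den

rng = np.random.default_rng(0)
l_ref, m_ref, m_pred = (rng.random((20, 20)) for _ in range(3))
print(starfm_like_pixel(l_ref, m_ref, m_pred, row=10, col=10))
```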

CNN Model
DCSTFN Architecture
Data Preparation
Experiment
Comparison
Findings
Conclusions and Future Work