Abstract

Remote sensing image captioning involves remote sensing objects and their spatial relationships. However, it is still difficult to determine the spatial extent of a remote sensing object and the appropriate size of a sample patch. If the patch is too large, it includes too many remote sensing objects and complex spatial relationships among them, which increases the computational burden of the image captioning network and reduces its precision. If the patch is too small, it often fails to provide enough environmental and contextual information, which makes the remote sensing object difficult to describe. To address this problem, we propose a multi-scale semantic long short-term memory network (MS-LSTM), in which remote sensing images are divided into pairs of image patches at different spatial scales. First, a Visual Geometry Group (VGG) network extracts features from the larger, large-scale patches, and these features are input into the improved MS-LSTM network as semantic information. They provide a larger receptive field and more contextual semantic information for captioning the small-scale patches and thus act as a global perspective, enabling accurate identification of small-scale samples that share the same features. Second, the small-scale patches are used to highlight remote sensing objects and simplify their spatial relationships. In addition, the multiple receptive fields provide perspectives ranging from local to global. The experimental results demonstrate that, compared with the original long short-term memory network (LSTM), the MS-LSTM increases the Bilingual Evaluation Understudy (BLEU) score by 5.6% to 0.859, indicating that the MS-LSTM has a more comprehensive receptive field, which provides richer semantic information and improves remote sensing image captions.
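The BLEU score reported above rewards n-gram overlap between a generated caption and one or more reference captions, scaled by a brevity penalty for captions that are too short. As a rough illustration only, the sketch below computes an unsmoothed sentence-level BLEU with uniform weights up to 4-grams. The BLEU variant used in the paper (n-gram order, sentence- or corpus-level aggregation, smoothing) is not stated in this section, so these choices are assumptions, and the function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str], references: List[Sequence[str]],
                  max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights 1/max_n (assumed setup)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)
```

BLEU as originally defined accumulates clipped n-gram counts and lengths over the whole test corpus before combining them, so averaging per-sentence scores from this sketch will generally give a different number.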

Highlights

  • According to Tobler’s first law of geography, everything is related to everything else, but near things are more related to each other [1]

  • The multi-scale semantic long short-term memory network (MS-LSTM), a remote sensing image captioning model based on multi-scale semantics, is proposed. In the MS-LSTM network, the multi-scale concept refers to the relationship between the local and the global, that is, between a small local area and a large area within a certain neighborhood

  • Remote sensing image captioning usually relies on recurrent neural networks (RNNs), especially the long short-term memory (LSTM) network, which stores information learned from experience in memory units, mitigates the long-term dependency problem through its forgetting mechanism, and is well suited to sequence modeling. The related work is therefore reviewed from three angles: multi-scale remote sensing, RNNs, and remote sensing image captioning
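The forgetting mechanism mentioned in the highlights can be made concrete with the standard LSTM cell update: a forget gate f_t scales the previous cell state, an input gate i_t admits a new candidate g_t, so that c_t = f_t * c_{t-1} + i_t * g_t and h_t = o_t * tanh(c_t). The sketch below is a generic single-step LSTM cell in pure Python, not the MS-LSTM implementation; the gate names, the `(W, b)` parameter layout acting on the concatenation [h_{t-1}, x_t], and the helper `constant_params` are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Vector]]  # gate name -> (W, b), W acts on [h, x]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(weights: Matrix, bias: Vector, inputs: Vector) -> Vector:
    """Compute W @ inputs + b with plain lists."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Params) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; returns (h_t, c_t)."""
    z = list(h_prev) + list(x)                          # [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate
    o = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate memory
    # Forgetting mechanism: f scales the old memory, i admits the new candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def constant_params(hidden: int, inputs: int, weight: float = 0.0,
                    biases: Optional[Dict[str, float]] = None) -> Params:
    """Hypothetical helper: every weight equals `weight`; biases are set per gate."""
    biases = biases or {}
    return {gate: ([[weight] * (hidden + inputs) for _ in range(hidden)],
                   [biases.get(gate, 0.0)] * hidden)
            for gate in ("f", "i", "o", "g")}
```

With a large positive forget-gate bias and a large negative input-gate bias, the cell state passes through almost unchanged; with a large negative forget-gate bias it is almost entirely erased. Learned gate values between these extremes let the cell decide what to keep over long sequences.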

Introduction

According to Tobler’s first law of geography, everything is related to everything else, but near things are more related to each other [1]. The MS-LSTM network, a remote sensing image captioning model based on multi-scale semantics, is proposed. In the MS-LSTM network, the multi-scale concept refers to the relationship between the local and the global, that is, between a small local area and a large area within a certain neighborhood. Small-scale objects are used for the semantic interpretation of remote sensing images, which requires simple and efficient generation of image caption sentences. In this way, samples at two scales are generated. To solve the “fake conflict” problem and fuse large-scale scene information, we design a novel parallel multi-scale LSTM deep neural network.
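The two-scale sampling and the fusion of large-scale scene information described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a single-band image is a nested list, each sample location yields a small-scale and a large-scale patch centred on the same pixel (pixels beyond the border are replicated), `toy_feature` is a hypothetical stand-in for the VGG encoder, and the large-scale context vector is simply concatenated to every word embedding fed to the LSTM. The exact way MS-LSTM injects the large-scale semantics is not specified in this section.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def crop_centered(image: Grid, row: int, col: int, size: int) -> Grid:
    """Crop a size x size window centred on (row, col), clamping at the borders."""
    height, width = len(image), len(image[0])
    half = size // 2
    return [[image[min(max(row + dr, 0), height - 1)][min(max(col + dc, 0), width - 1)]
             for dc in range(-half, size - half)]
            for dr in range(-half, size - half)]


def extract_patch_pair(image: Grid, row: int, col: int,
                       small_size: int, large_size: int) -> Tuple[Grid, Grid]:
    """Return co-centred small-scale and large-scale patches for one sample location."""
    if small_size >= large_size:
        raise ValueError("the large-scale patch must be larger than the small-scale patch")
    return (crop_centered(image, row, col, small_size),
            crop_centered(image, row, col, large_size))


def toy_feature(patch: Grid) -> List[float]:
    """Hypothetical stand-in for a VGG encoder: the patch mean and maximum."""
    values = [v for line in patch for v in line]
    return [sum(values) / len(values), max(values)]


def build_step_inputs(word_vectors: Sequence[Sequence[float]],
                      context_feature: Sequence[float]) -> List[List[float]]:
    """Append the large-scale context feature to every word embedding fed to the LSTM."""
    return [list(word) + list(context_feature) for word in word_vectors]
```

Concatenating the same context vector at every decoding step is only one common conditioning scheme; initialising the LSTM state from the context feature is another.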

Related Work
Multi-Scale
RNN Series
Remote Sensing Image Captioning
The Multi-Scale Principle
Multi-Scale Matching Principle
Multi-Scale Image Captioning Network Structure
Introduction of Test Areas and Samples
Network Parameters
Findings
Model Comparison and Model Stability Analysis