Abstract

Remote sensing image captioning has attracted widespread attention in the remote sensing field due to its application potential. However, most existing approaches model only limited interactions between image content and sentence, and fail to exploit the special characteristics of remote sensing images. In this article, we introduce a novel recurrent attention and semantic gate (RASG) framework to facilitate remote sensing image captioning, which integrates competitive visual features with a recurrent attention mechanism to generate a better context vector for the image at each time step and to enhance the representation of the current word state. Specifically, we first project each image into competitive visual features by taking advantage of both static visual features and multiscale features. Then, a novel recurrent attention mechanism is developed to extract high-level attentive maps from the encoded features and nonvisual features, which helps the decoder recognize and focus on the information that is effective for understanding the complex content of remote sensing images. Finally, the hidden states from the long short-term memory (LSTM) and other semantic references are incorporated into a semantic gate, which contributes to more comprehensive and precise semantic understanding. Comprehensive experiments on three widely used datasets, Sydney-Captions, UCM-Captions, and the Remote Sensing Image Captioning Dataset, demonstrate the superiority of the proposed RASG framework over a series of attention-based image captioning methods.
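
The abstract describes a decoder that attends over visual features at each time step and fuses the LSTM hidden state with a semantic reference through a gate. The following is a minimal sketch of that general pattern only, written in PyTorch; all module names, dimensions, and the exact gating formulation are assumptions for illustration and do not reproduce the authors' precise RASG design.

```python
import torch
import torch.nn as nn


class AttentiveGatedDecoderStep(nn.Module):
    """One decoding step: soft attention over region features plus a gated
    fusion of the LSTM hidden state and a semantic reference (illustrative)."""

    def __init__(self, feat_dim, embed_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        # Attention scoring: combine each region feature with the hidden state.
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        # Semantic gate: decides how much of the semantic reference to mix in.
        self.gate = nn.Linear(hidden_dim * 2, hidden_dim)
        self.output = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, word_emb, feats, sem_ref, h, c):
        # feats: (batch, regions, feat_dim); sem_ref, h, c: (batch, hidden_dim)
        # Attention weights over regions, conditioned on the previous hidden state.
        scores = self.att_score(torch.tanh(
            self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)            # (batch, regions, 1)
        context = (alpha * feats).sum(dim=1)            # (batch, feat_dim)

        # LSTM step on the word embedding concatenated with the attended context.
        h, c = self.lstm(torch.cat([word_emb, context], dim=1), (h, c))

        # Semantic gate: sigmoid gate blends the hidden state with the semantic reference.
        g = torch.sigmoid(self.gate(torch.cat([h, sem_ref], dim=1)))
        fused = g * h + (1.0 - g) * sem_ref
        return self.output(fused), h, c
```

In this sketch the gate output feeds the word-prediction layer at each step; the competitive visual features and the recurrent form of the attention described in the abstract would replace the plain region features and single-step soft attention used here.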
