Remote sensing image captioning involves generating a concise textual description for an input aerial image. The task has received significant attention, and several recent proposals are based on neural encoder-decoder models. Most previous methods are trained to generate discrete outputs, i.e. word tokens that match the reference sentences word-by-word, thereby optimizing generation locally at the token level instead of globally at the sentence level. This paper explores an alternative generation method based on continuous outputs, which produces sequences of embedding vectors instead of directly predicting word tokens. We argue that continuous output models have the potential to better capture the global semantic similarity between captions and images, e.g., by facilitating the use of loss functions that match different views of the data. This includes comparing representations for individual tokens and for entire captions, as well as comparing captions against intermediate image representations. We experimentally compare discrete versus continuous output captioning methods on the UCM and RSICD datasets, which are extensively used in the area despite some issues that we also discuss. Results show that the encoder-decoder framework with continuous outputs can indeed outperform the standard approach based on discrete outputs on both datasets. The proposed approach is also competitive with the state-of-the-art model in the area.
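To make the contrast with discrete-output decoding concrete, the sketch below illustrates one common way of realizing continuous outputs: the decoder predicts an embedding vector at each step, training minimizes a cosine distance to the pre-trained embedding of the gold token, and inference maps each predicted vector to its nearest neighbour in the embedding table. This is a minimal illustration under assumed names (`ContinuousOutputHead`, `embed_table`, `hidden_dim`, and so on), not the exact losses or architecture used in the paper, which also considers sentence-level and image-caption comparisons.

```python
# Illustrative sketch of continuous-output decoding with a token-level cosine loss.
# All names and shapes are assumptions for illustration, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousOutputHead(nn.Module):
    """Maps decoder hidden states to embedding vectors instead of vocabulary logits."""
    def __init__(self, hidden_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) -> (batch, seq_len, embed_dim)
        return self.proj(hidden_states)

def token_level_cosine_loss(pred_embs, target_ids, embed_table):
    # Compare each predicted vector with the pre-trained embedding of the gold token.
    target_embs = embed_table[target_ids]          # (batch, seq_len, embed_dim)
    return (1.0 - F.cosine_similarity(pred_embs, target_embs, dim=-1)).mean()

def decode_nearest_tokens(pred_embs, embed_table):
    # At inference time, map each predicted vector to the nearest word embedding.
    sims = F.normalize(pred_embs, dim=-1) @ F.normalize(embed_table, dim=-1).T
    return sims.argmax(dim=-1)                     # (batch, seq_len) token ids
```

Because the model output lives in the same space as the word (or sentence) embeddings, additional losses that compare whole-caption representations, or caption representations against intermediate image features, can be added on top of this token-level objective.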