Abstract

Remote sensing image captioning, which aims to understand high-level semantic information and the interactions of different ground objects, is an emerging research topic. Although image captioning has developed rapidly with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), captioning remote sensing images still suffers from two main limitations. First, the scales of objects in remote sensing images vary dramatically, which makes it difficult to obtain an effective image representation. Second, the visual relationships in remote sensing images remain underused, even though they have great potential to improve the final performance. To address these two limitations, an effective framework for captioning remote sensing images is proposed in this paper. The framework is based on multi-level attention and multi-label attribute graph convolution. Specifically, the proposed multi-level attention module can adaptively focus not only on specific spatial features but also on features of specific scales. Moreover, the designed attribute graph convolution module employs an attribute graph to learn more effective attribute features for image captioning. Extensive experiments show that the proposed method achieves superior performance on the UCM-captions, Sydney-captions, and RSICD datasets.
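
To make the first module concrete, below is a minimal PyTorch-style sketch of multi-level attention as described in the abstract: a spatial attention over the feature map of each scale, followed by an attention over the scales themselves, both conditioned on the decoder hidden state. The class, tensor shapes, and parameter names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiLevelAttention(nn.Module):
    """Sketch of multi-level attention (illustrative, not the paper's code):
    attend over spatial positions within each CNN scale, then over the
    per-scale context vectors, conditioned on the decoder hidden state h."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.spatial_attn = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, attn_dim), nn.Tanh(),
            nn.Linear(attn_dim, 1))
        self.scale_attn = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, attn_dim), nn.Tanh(),
            nn.Linear(attn_dim, 1))

    def forward(self, feats_per_scale, h):
        # feats_per_scale: list of (B, N_s, feat_dim) tensors, one per scale
        # h: decoder hidden state, shape (B, hidden_dim)
        scale_ctx = []
        for feats in feats_per_scale:
            h_exp = h.unsqueeze(1).expand(-1, feats.size(1), -1)
            alpha = torch.softmax(
                self.spatial_attn(torch.cat([feats, h_exp], dim=-1)), dim=1)
            scale_ctx.append((alpha * feats).sum(dim=1))        # (B, feat_dim)
        ctx = torch.stack(scale_ctx, dim=1)                     # (B, S, feat_dim)
        h_exp = h.unsqueeze(1).expand(-1, ctx.size(1), -1)
        beta = torch.softmax(
            self.scale_attn(torch.cat([ctx, h_exp], dim=-1)), dim=1)
        return (beta * ctx).sum(dim=1)                          # fused context
```

The two-stage design lets the decoder weight both where to look within a scale and which scale is most informative at each decoding step, matching the abstract's claim of attending to "features of specific scales".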

Highlights

  • With the great progress of remote sensing technology, an increasing number of high-quality remote sensing images are being captured, providing a large amount of data for research [1], [2].

  • Remote sensing image captioning aims to understand the high-level semantic information and the interactions of different ground objects, describing a remote sensing scene at a higher semantic level.

  • In this work, a remote sensing image captioning framework based on multi-level attention and multi-label attribute graph convolution is proposed to improve performance from these two aspects.



Introduction

With the great progress of remote sensing technology, an increasing number of high-quality remote sensing images are being captured, providing a large amount of data for research [1], [2]. Generating natural-language descriptions for remote sensing images can provide richer high-level semantic information, such as scene structures or object relationships. Remote sensing image captioning, which aims to understand the high-level semantic information and the interactions of different ground objects, provides a far richer description of a remote sensing scene at a higher semantic level by generating a sentence that abstracts its content. Accurate and flexible sentences are generated automatically to describe the content of remote sensing images. Remote sensing image captioning identifies the ground objects at different levels and analyzes their attributes and spatial relationships in the aerial view [7]. The interactions between objects are visual relationships that are embedded in image captions. For example, the caption "Some white planes are in an airport" describes the visual relationship between the planes and the airport.
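
The attribute graph convolution module is only described at a high level here; as a rough illustration, it can be thought of as a standard graph convolution (in the style of Kipf and Welling, 2017) applied to multi-label attribute nodes. The sketch below assumes, for illustration only, that the adjacency matrix encodes attribute co-occurrence; the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class AttributeGCN(nn.Module):
    """Illustrative attribute graph convolution layer (not the paper's exact
    formulation): propagates information between attribute nodes through a
    normalized adjacency matrix."""

    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)
        # Symmetrically normalize A + I (self-loops), as in standard GCNs.
        a_hat = adj + torch.eye(adj.size(0))
        d = a_hat.sum(dim=1).pow(-0.5)
        self.register_buffer("norm_adj",
                             d.unsqueeze(1) * a_hat * d.unsqueeze(0))

    def forward(self, node_feats):
        # node_feats: (num_attributes, in_dim) attribute embeddings
        return torch.relu(self.norm_adj @ self.weight(node_feats))
```

In a full captioning pipeline, the refined attribute features could be fused with the visual context before decoding, so that relationships such as the one between "planes" and "airport" inform the generated sentence.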
