Abstract

Image captioning generates a semantic description of an image. It deals with image understanding and text mining, which has made great progress in recent years. However, it is still a great challenge to bridge the “semantic gap” between low-level features and high-level semantics in remote sensing images, in spite of the improvement of image resolutions. In this paper, we present a new model with an attribute attention mechanism for the description generation of remote sensing images. Therefore, we have explored the impact of the attributes extracted from remote sensing images on the attention mechanism. The results of our experiments demonstrate the validity of our proposed model. The proposed method obtains six higher scores and one slightly lower, compared against several state of the art techniques, on the Sydney Dataset and Remote Sensing Image Caption Dataset (RSICD), and receives all seven higher scores on the UCM Dataset for remote sensing image captioning, indicating that the proposed framework achieves robust performance for semantic description in high-resolution remote sensing images.

Highlights

  • With the development of computers and sensors, modern remote sensing technologies have seen rapid successful progress

  • We propose a new model with an attribute attention mechanism for remote sensing image captioning

  • The Sydney Dataset has only 613 images in total, while Remote Sensing Image Caption Dataset (RSICD) has more than 10,000 images and UC Merced (UCM) is with 2,100 images

Read more

Summary

Introduction

With the development of computers and sensors, modern remote sensing technologies have seen rapid successful progress. Remote sensing images are of high resolution. Their interpretation and understanding are still limited to the feature level, such as scene classification [4,5,6] and object detection [7,8] with little reasoning and understanding of the scene. This handling cannot solve the “semantic gap” problem between low-level features and high-level abstract or summarization. Correctly interpreting high-resolution remote sensing images at different levels in a large dataset has become one of the most challenging scientific problems in the field

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.