Abstract

Remote sensing image captioning models require large amounts of caption-labeled training data. Although image classification models are normally trained with sufficient data, they cannot be straightforwardly applied to remote sensing image captioning, because the labels for classification and captioning arise from different task domains. Moreover, remote sensing images with caption labels are far scarcer than images with class labels. These limitations hinder effective remote sensing image captioning. To address them, we develop a meta captioning framework that performs remote sensing image captioning via meta learning. The framework extracts meta features from two support tasks, i.e., natural image classification and remote sensing image classification, and transfers the meta features to one target task, i.e., remote sensing image captioning. The two support tasks train classification models on large amounts of class-labeled data so that the extracted meta features comprehensively represent image visual features from the perspective of classification. The target task, equipped with the meta features, requires only a relatively small amount of caption-labeled training data to achieve effective remote sensing image captioning. Experimental evaluations on three public datasets validate that the meta captioning framework achieves state-of-the-art performance on remote sensing image captioning. We release the code for our work at: https://github.com/QiaoqiaoYang/MetaCaptioning.
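The transfer pipeline described above — extracting features from two frozen support-task classifiers and concatenating them into meta features for the captioning decoder — can be sketched minimally as follows. The dimensions, the random-projection extractors, and all function names here are illustrative assumptions standing in for the paper's actual pretrained networks:

```python
import numpy as np

def make_extractor(in_dim, out_dim, seed):
    """Stand-in for a frozen, pretrained classification backbone.

    A fixed random projection followed by ReLU; in the actual framework
    this would be a CNN trained on the corresponding support task.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: np.maximum(x @ W, 0.0)

# Two support-task extractors (assumed pretrained and frozen):
f_natural = make_extractor(1024, 128, seed=1)  # natural image classification
f_remote = make_extractor(1024, 128, seed=2)   # remote sensing classification

def meta_features(x):
    """Concatenate both support-task features into one meta feature vector,
    which the caption decoder would consume as its visual input."""
    return np.concatenate([f_natural(x), f_remote(x)])

# Flattened image stand-in (e.g., a 32x32 single-channel patch):
x = np.random.default_rng(0).standard_normal(1024)
z = meta_features(x)
print(z.shape)  # (256,)
```

The key design point the sketch illustrates is that only the (small) caption decoder would need caption-labeled training data; both feature extractors are trained on abundant class-labeled data and kept fixed.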
