Improved Image Captioning via Semantic Feature Update

Peng Tian, Hongwei Mo, Laihao Jiang

doi:10.23919/ccc52363.2021.9549991

Abstract

Image captioning is one of the main visual tasks for scene understanding: it involves detecting and recognizing objects and their relationships, and describing the image content in natural language. To improve the accuracy of the image features obtained by the spatial attention mechanism, and thereby the performance of image captioning, we propose an image captioning model based on semantic feature update, which extracts features at different semantic levels of the scene image and then iteratively updates them to produce an accurate description of the scene. First, we use Faster R-CNN to extract the object features, the visual relationship features between objects, and the global features of the image. Second, a feature refining network takes the object features and relationship features as input and updates both. Finally, the updated features are fed to the spatial attention mechanism and integrated into the captioning framework to improve captioning accuracy. Experiments on the COCO dataset show that the proposed model outperforms other captioning models.
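The pipeline summarized above (extract object and relationship features, refine them iteratively, then attend over the refined features) can be illustrated with a toy sketch. This is not the authors' feature refining network: the averaging update, the dot-product attention score, and the names `refine`, `attend`, `pairs`, `steps`, and `mix` are all assumptions chosen for illustration, and the feature vectors are hand-written stand-ins for Faster R-CNN outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend(old, ctx, mix):
    """Convex combination of a feature vector and its context vector."""
    return [(1 - mix) * o + mix * c for o, c in zip(old, ctx)]


def refine(objects, relations, pairs, steps=2, mix=0.5):
    """Hypothetical iterative update (an assumption, not the paper's network).

    pairs[k] = (i, j) means relation k links object i and object j.
    Each step: relations absorb the mean of their two objects, then
    objects absorb the mean of the relations they take part in.
    """
    objs = [list(o) for o in objects]
    rels = [list(r) for r in relations]
    for _ in range(steps):
        new_rels = []
        for rel, (i, j) in zip(rels, pairs):
            ctx = [(a + b) / 2 for a, b in zip(objs[i], objs[j])]
            new_rels.append(blend(rel, ctx, mix))
        new_objs = []
        for idx, obj in enumerate(objs):
            touching = [new_rels[k] for k, p in enumerate(pairs) if idx in p]
            if not touching:
                new_objs.append(obj)  # isolated object: left unchanged
                continue
            ctx = [sum(col) / len(touching) for col in zip(*touching)]
            new_objs.append(blend(obj, ctx, mix))
        objs, rels = new_objs, new_rels
    return objs, rels


def attend(query, features):
    """Soft attention: dot-product scores, softmax weights, weighted sum."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context


# Toy stand-ins for two object features and one relationship feature.
objects = [[1.0, 0.0], [0.0, 1.0]]
relations = [[0.5, 0.5]]
pairs = [(0, 1)]

objs, rels = refine(objects, relations, pairs)
weights, context = attend([1.0, 0.0], objs + rels)
```

In a captioning decoder, the query would typically be the language model's hidden state at the current word, and `context` would be the attended visual input for predicting the next word. Here the query is a fixed vector so the example stays self-contained.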

Full Text