Abstract

Faster R-CNN has been widely used to extract image features in image-to-text generation models since the rise of deep learning, although its extraction procedure is time-consuming. Existing approaches instead extract fixed-size grid features and use a language model to generate image captions, but they focus only on the spatial locations of the grid features, without considering interactions among grid features or the global features of the image. To generate higher-quality captions, an image captioning method based on a graph attention network with global context is proposed. A multi-layer convolutional neural network performs visual encoding, retrieving both the grid features and the whole-image features of a given image, from which a grid feature interaction graph is built. A graph attention network containing one global node and many local nodes then recasts feature extraction as a node classification problem, so that global and local features can be fully exploited after node updating and optimization. Finally, a Transformer-based decoding module uses the enhanced visual features to generate image captions. Experiments are conducted on the Microsoft COCO dataset. The results demonstrate that the proposed method successfully captures both the global and local features of an image and achieves a CIDEr score of 133.1%, significantly improving caption quality.
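To make the core encoding idea concrete, the following is a minimal sketch (not the authors' code) of a single graph attention layer over N local grid-feature nodes plus one global whole-image node, on a fully connected interaction graph. All names, dimensions, and the specific attention form (a standard GAT-style scorer) are illustrative assumptions; the updated node features would then be passed to a Transformer decoder.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextGAT(nn.Module):
    """One graph-attention layer over [global; local] nodes (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)    # shared node projection
        self.a = nn.Linear(2 * dim, 1, bias=False)  # pairwise attention scorer

    def forward(self, grid_feats, global_feat):
        # grid_feats: (N, dim) local grid features from the CNN encoder
        # global_feat: (1, dim) pooled whole-image feature (the global node)
        h = self.W(torch.cat([global_feat, grid_feats], dim=0))  # (N+1, dim)
        n = h.size(0)
        # Attention logits over a fully connected graph: every grid node
        # attends to every other node and to the global context node.
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)  # (N+1, N+1, 2*dim)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))     # (N+1, N+1)
        alpha = F.softmax(e, dim=-1)                    # attention weights
        return alpha @ h  # updated node features; row 0 is the global node

# Usage: a 7x7 grid gives 49 local nodes, plus one global node.
gat = GlobalContextGAT(dim=512)
grid = torch.randn(49, 512)
glob = torch.randn(1, 512)
updated = gat(grid, glob)  # (50, 512), fed to the Transformer-based decoder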
