BiGRU-RA Model for Image Chinese Captioning via Global and Local Features

Zhenrong Deng, Rushi Lan, Rui Yang, Wenming Huang, Xiaonan Luo, Yonglin Zhang

doi:10.3724/sp.j.1089.2021.18262

Abstract

To address the lack of detailed semantic information in current image captioning models based on global features, an image Chinese captioning model combining global and local features is proposed. The proposed model adopts the encoder-decoder framework. In the encoding stage, a residual network (ResNet) and Faster R-CNN are used to extract the global and local features of images respectively, improving the model's utilization of image features at different scales. A bi-directional gated recurrent unit (BiGRU) with an embedded visual attention structure and a residual connection structure is applied as the decoder (BiGRU with residual connection and attention, BiGRU-RA). The model can adaptively allocate weights to image features and text, and improves the mapping between image feature regions and context information. Additionally, a reinforcement learning-based policy gradient is added to improve the loss function of the model and optimize the CIDEr evaluation metric directly. Training and experiments are conducted on the Chinese captioning dataset of AI Challenger. Comparative results show that the proposed model achieves better scores and that the generated captions are more accurate and detailed.
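The idea of adaptively weighting global and local image features can be illustrated with a minimal soft-attention sketch. This is not the authors' implementation: the abstract does not specify the scoring function, so dot-product scoring is assumed here, and the names `attend`, `global_feat`, `local_feats`, and `hidden` are hypothetical, as are the toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, features):
    """Soft attention over a list of feature vectors.

    Assumption: relevance is scored by a plain dot product between the
    decoder state and each feature vector. Returns the attention weights
    and the weighted-sum context vector.
    """
    scores = [sum(h * f for h, f in zip(hidden, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Toy data (hypothetical): one global vector plus two local region vectors.
global_feat = [0.5, 0.5]
local_feats = [[1.0, 0.0], [0.0, 1.0]]
hidden = [2.0, 0.0]  # stand-in for a decoder hidden state

weights, context = attend(hidden, [global_feat] + local_feats)
print(weights, context)
```

In this toy setting the first local region aligns best with the decoder state, so it receives the largest weight and dominates the context vector. In a full model, the feature vectors would come from the image encoders and the scoring function would be learned.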


Journal: Journal of Computer-Aided Design & Computer Graphics
Publication Date: Jan 1, 2021
Citations: 1

Similar Papers

No-Reference Video Quality Assessment Using the Temporal Statistics of Global and Local Image Features.
Domonkos Varga
Sensors (Basel, Switzerland) | VOL. 22
10 Dec 2022

3G structure for image caption generation
Aihong Yuan ... Xiaoqiang Lu
Neurocomputing | VOL. 330
01 Nov 2018

A hybrid approach for vision-based outdoor robot localization using global and local image features
Christian Weiss ... Hashem Tamimi
01 Oct 2007

Recognition of Face Biometrics
Pooja Sharma
01 Jan 2018
