Abstract

A novel network model, the bidirectional depth residual gated recurrent unit network (BDR-GRU), is designed and implemented to improve the effectiveness of image captioning. BDR-GRU follows an encoder-decoder architecture. Moreover, the network runs on an NVIDIA Jetson TX2 processor, which makes the algorithm applicable to mobile robots. In the encoding stage, a convolutional neural network extracts the multi-dimensional feature vectors of the image; in the decoding stage, the BDR-GRU network generates the sentence. The BDR-GRU network is a new recurrent neural network model that improves on the standard GRU network. First, the GRU network is deepened from a single layer to multiple layers. Second, a bidirectional structure is redesigned to strengthen the network's inference ability. Finally, a residual mechanism between layers is designed to prevent the vanishing gradients and over-fitting caused by the increased depth. Experiments are carried out on the TX2 processor to verify the effectiveness of our design, and the results are compared with the img-gLSTM model, the Neural Talk model, an attention model, and a unidirectional GRU model. The experimental results show that the CIDEr score of our network model is 12.7% higher than that of the img-gLSTM network and 14.6% higher than that of the Neural Talk network; the other evaluation metrics also improve significantly. These results demonstrate the effectiveness of our BDR-GRU model.
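The three modifications described above (a deeper GRU stack, bidirectional passes, and residual connections between layers) can be illustrated with a minimal sketch. This is not the paper's implementation: all names (`GRUCell`, `bidirectional_layer`, `bdr_gru`), the toy dimensions, and the omission of bias terms are our own simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal single-step GRU cell (biases omitted for brevity)."""
    def __init__(self, input_size, hidden_size, rng):
        s = 1.0 / np.sqrt(hidden_size)
        # rows 0..2: update gate, reset gate, candidate state
        self.W = rng.uniform(-s, s, (3, hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h)               # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h)               # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h))   # candidate
        return (1.0 - z) * h + z * h_tilde

def bidirectional_layer(fwd, bwd, xs, hidden_size):
    """Run one layer forward and backward over the sequence, then concatenate."""
    hf, hb = np.zeros(hidden_size), np.zeros(hidden_size)
    fwd_out, bwd_out = [], []
    for x in xs:
        hf = fwd.step(x, hf)
        fwd_out.append(hf)
    for x in reversed(xs):
        hb = bwd.step(x, hb)
        bwd_out.append(hb)
    bwd_out.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd_out, bwd_out)]

def bdr_gru(xs, num_layers, hidden_size, seed=0):
    """Stack bidirectional GRU layers with residual (skip) connections."""
    rng = np.random.default_rng(seed)
    out = xs
    for _ in range(num_layers):
        in_size = len(out[0])
        fwd = GRUCell(in_size, hidden_size, rng)
        bwd = GRUCell(in_size, hidden_size, rng)
        new = bidirectional_layer(fwd, bwd, out, hidden_size)
        # residual connection, applied only when input/output widths match
        if len(new[0]) == in_size:
            new = [n + o for n, o in zip(new, out)]
        out = new
    return out

# Toy run: a 5-step sequence of 8-dim features through 3 layers, hidden size 4
rng = np.random.default_rng(1)
seq = [rng.standard_normal(8) for _ in range(5)]
out = bdr_gru(seq, num_layers=3, hidden_size=4)
print(len(out), out[0].shape)  # 5 timesteps, each 2 * hidden_size = 8 features
```

In this sketch the residual add is only possible because the concatenated bidirectional output width (2 × hidden size) equals the layer's input width; a full implementation would otherwise insert a linear projection on the skip path.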

Highlights

  • People use language to express their thoughts and describe what they see in daily life, but turning computer-vision information into natural-language descriptions is an extremely challenging task that combines image processing, language processing, and other research directions

  • We choose the JETSON TX2 embedded processor developed by NVIDIA, which offers powerful performance in a small form factor, as the core processor of the experiments, as shown in Fig. 7 and Fig. 8

  • We use the processor as the core of a mobile robot, and the image-captioning results improve the human-computer interaction of the mobile robot



Introduction

People use language to express their thoughts and describe what they see in daily life, but turning computer-vision information into natural-language descriptions is an extremely challenging task that combines image processing, language processing, and other research directions.

Index Terms: Computer vision, image captioning, deep neural network, BDR-GRU.


