Dual Position Relationship Transformer for Image Captioning.

Yaohan Wang, Dan Xu, Jinde Cao, Wenhua Qian, Pyoungwon Kim, Rencan Nie

doi:10.1089/big.2021.0262

Abstract

Employing feature vectors extracted by a target detector has been shown to improve the performance of image captioning. However, existing frameworks extract insufficient information, in particular the positional relationships that are essential for judging how objects relate to one another. To fill this gap, we present a dual position relationship transformer (DPR) for image captioning. The architecture improves the image information extraction and description encoding steps: it first calculates the relative position (RP) and absolute position (AP) between objects, then integrates this dual position relationship information into self-attention. Specifically, a convolutional neural network (CNN) and Faster R-CNN are applied for image feature extraction and target detection, and the RP and AP of the generated object boxes are then calculated. The former is expressed in coordinate form, and the latter is calculated by sinusoidal encoding. In addition, to better model the sequential and temporal relationships in the description, DPR adopts long short-term memory (LSTM) to encode the text vectors. We conduct extensive experiments on the Microsoft COCO: Common Objects in Context (MSCOCO) image captioning data set, which show that our method achieves superior performance: the Consensus-based Image Description Evaluation (CIDEr) score increases to 114.6 after 30 training epochs, and the model runs 2 times faster than other competitive methods. The ablation study verifies the effectiveness of our proposed module.
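The abstract does not give the exact formulas, so the following is only a minimal sketch of the two position signals it names, under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, RP is taken as a normalised center offset plus log size ratios, and AP is a sinusoidal encoding of the box center. The function names, the `dim` and `base` parameters, and the specific RP formula are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def sinusoidal_encoding(value, dim=8, base=10000.0):
    """Encode one scalar as interleaved sin/cos pairs at geometrically spaced frequencies."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        enc.append(math.sin(value * freq))
        enc.append(math.cos(value * freq))
    return enc

def box_center_size(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1)

def relative_position(box_a, box_b):
    """Assumed coordinate-form RP of box_b with respect to box_a:
    center offsets scaled by box_a's size, plus log width/height ratios."""
    cxa, cya, wa, ha = box_center_size(box_a)
    cxb, cyb, wb, hb = box_center_size(box_b)
    return ((cxb - cxa) / wa, (cyb - cya) / ha,
            math.log(wb / wa), math.log(hb / ha))

def absolute_position(box, dim=8):
    """Assumed AP: sinusoidal encodings of the box center, x then y, concatenated."""
    cx, cy, _, _ = box_center_size(box)
    return sinusoidal_encoding(cx, dim) + sinusoidal_encoding(cy, dim)

# Two hypothetical detector boxes
person = (0.0, 0.0, 2.0, 2.0)
bicycle = (2.0, 0.0, 4.0, 4.0)
rp = relative_position(person, bicycle)
ap = absolute_position(person)
```

In a transformer encoder, vectors like `rp` and `ap` would typically be projected and added to the attention logits or to the region features; how DPR combines them in self-attention is not specified in the abstract.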

Similar Papers

Transforming Healthcare: Leveraging Vision-Based Neural Networks for Smart Home Patient Monitoring
Hicham Gibet Tani ... Fatiha Elouaai
International Journal of Online and Biomedical Engineering (iJOE) | VOL. 19
01 Aug 2023

Guiding Visually Impaired People to Find an Object by Using Image to Speech over the Smart Phone Cameras
Tayyip Mert Denizgez ... Ahmet Sayar
25 Aug 2021

Target Detection of Hyperspectral Image Based on Faster R-CNN with Data Set Adjustment and Parameter Turning
Xuefeng Liu ... Salah Bourennane
01 Jun 2019

Image Caption Generator with Novel Object Injection
Mirza Muhammad Ali Baig ... Nauman Zafar
01 Dec 2018

Journal: Big Data
Publication Date: Jan 4, 2022
Citations: 6
