Visual Relationship Embedding Network for Image Paragraph Generation

Wenbin Che, Ruiqin Xiong, Debin Zhao, Xiaopeng Fan

doi:10.1109/tmm.2019.2954750

Abstract

Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover the subject or object of a relationship. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
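
The data flow described in the abstract (candidate region pairs, attention-based relationship inference, then visual features plus relationship semantics fed to a sentence decoder) can be illustrated with a toy sketch. This is not the authors' implementation: there is no RP-GAN and no LSTM here, and every name, feature vector, predicate embedding, and threshold below is a hypothetical stand-in chosen only to show how valid relation pairs might be filtered and packaged as decoder inputs.

```python
import math
from itertools import permutations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def infer_relationship(subj_feat, obj_feat, predicate_embeddings):
    """Toy attention step (assumption, not the paper's network).

    The ordered pair (subject, object) forms a query by concatenation;
    attention weights over hypothetical predicate embeddings give both a
    predicted predicate and a soft relationship vector.
    """
    query = subj_feat + obj_feat  # list concatenation keeps the pair ordered
    names = list(predicate_embeddings)
    weights = softmax([dot(query, predicate_embeddings[n]) for n in names])
    rel_vec = [
        sum(w * predicate_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    best = names[max(range(len(names)), key=weights.__getitem__)]
    return best, max(weights), rel_vec


def build_decoder_inputs(regions, predicate_embeddings, min_confidence=0.9):
    """Keep only confident pairs and join visual and relationship features.

    In the paper these inputs would go to an LSTM; here we only build them.
    """
    inputs = []
    for (s_name, s_feat), (o_name, o_feat) in permutations(regions.items(), 2):
        pred, conf, rel_vec = infer_relationship(s_feat, o_feat, predicate_embeddings)
        if conf >= min_confidence:
            inputs.append({
                "pair": (s_name, pred, o_name),
                "features": s_feat + o_feat + rel_vec,
            })
    return inputs


# Hypothetical 2-d region features and 4-d predicate embeddings.
regions = {"man": [1.0, 0.0], "horse": [0.0, 1.0]}
predicates = {
    "riding": [3.0, 0.0, 0.0, 3.0],
    "next to": [0.5, 0.5, 0.5, 0.5],
}
decoder_inputs = build_decoder_inputs(regions, predicates)
for item in decoder_inputs:
    print(item["pair"], len(item["features"]))
```

With these made-up numbers only the ordered pair (man, horse) clears the confidence threshold, which mirrors the abstract's notion of passing only valid relation pairs to the sentence generator.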

Journal: IEEE Transactions on Multimedia
Publication Date: Dec 5, 2019
Citations: 56

Similar Papers

Chinese Image Caption Generation via Visual Attention and Topic Modeling.
Maofu Liu ... Lingjun Li
IEEE Transactions on Cybernetics | VOL. 52
22 Jun 2020

Adaptive Syncretic Attention for Constrained Image Captioning
Liang Yang ... Haifeng Hu
Neural Processing Letters | VOL. 50
26 Apr 2019

SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network.
Yueming Jin ... Pheng-Ann Heng
IEEE Transactions on Medical Imaging | VOL. 37
01 May 2018

Generation of Image Caption Using CNN-LSTM Based Approach
S Aravindkumar ... M Hemalatha
12 Apr 2019
