Abstract

Image captioning is a challenging visual understanding task that has drawn increasing attention from researchers. In general, the Long Short-Term Memory (LSTM) network used in popular attention-based image captioning frameworks requires two inputs at each time step: image features and previously generated words. However, errors accumulate when the previously predicted words are inaccurate, and the semantic information they carry may be insufficient. To address these challenges, a novel model named CaptionNet is proposed in this work as an improved LSTM specially designed for image captioning. Concretely, only attended image features are allowed to enter the memory of CaptionNet through the input gates. In this way, the dependency on previously predicted words is reduced, forcing the model to focus on visual cues of the image at the current time step. Moreover, a memory initialization method called image feature encoding is designed to capture richer semantics of the target image. Evaluation on the benchmark MSCOCO and Flickr30K datasets demonstrates the effectiveness of the proposed CaptionNet model, and extensive ablation studies verify each of the proposed components. The project page can be found at https://mic.tongji.edu.cn/3f/9c/c9778a147356/page.htm.
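To make the described gating modification concrete, the following is a minimal sketch of an LSTM-style decoder cell in which only the attended image feature is written to the cell memory through the input gate, while previously generated words drive the gates but not the memory content. The class and variable names, layer sizes, and the exact gate equations are assumptions for illustration; they are not the paper's published formulation.

```python
import torch
import torch.nn as nn


class CaptionNetCellSketch(nn.Module):
    """Illustrative LSTM-style cell: the input gate admits only the attended
    image feature into the memory, per the abstract's description.
    Hypothetical names and equations; not the authors' exact implementation."""

    def __init__(self, word_dim: int, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Gates (input, forget, output) are conditioned on the previous word
        # embedding and hidden state, as in a standard LSTM.
        self.gates = nn.Linear(word_dim + hidden_dim, 3 * hidden_dim)
        # The candidate written into the memory comes from the attended
        # image feature only, not from the previous word.
        self.visual = nn.Linear(feat_dim, hidden_dim)

    def forward(self, word_emb, attended_feat, state):
        h_prev, c_prev = state
        i, f, o = torch.sigmoid(
            self.gates(torch.cat([word_emb, h_prev], dim=-1))
        ).chunk(3, dim=-1)
        # Only the attended image feature passes through the input gate into
        # the cell memory, reducing reliance on previously predicted words.
        c = f * c_prev + i * torch.tanh(self.visual(attended_feat))
        h = o * torch.tanh(c)
        return h, (h, c)
```

In the same spirit, the initial memory state c_0 could be set from an encoding of the image features (the abstract's "image feature encoding") rather than zeros, so that the decoder starts with richer image semantics; the specific encoder used for this initialization is described in the paper itself.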
