Repeated review based image captioning for image evidence review

Jinning Guan,Eric Wang

doi:10.1016/j.image.2018.02.005

Abstract

We propose a repeated review deep learning model for image captioning in image evidence review process. It consists of two subnetworks. One is the convolutional neural network which is employed to extract the image features and the other is the recurrent neural network which is used to decode the image features into captions. Our model combines the advantages of the two subnetworks by recalling visual information different from the traditional model of encoder–decoder, and then introduces multimodal layer to fuse the image and caption effectively. The proposed model has been validated on benchmark datasets (MSCOCO, Flickr). It shows that the proposed model performs well on bleu-3 and bleu-4, even to some extent, beyond the best models available today (such as NIC, m-RNN, etc.).

Full Text