Abstract

As a crossing domain of computer vision and natural language processing, image caption generation has been an active research topic in recent years, as it supports multimodal social media translation from unstructured image data to structured text data. Prior work has proposed a series of image captioning methods, including template-based, retrieval-based, and encoder-decoder approaches. Among these, the encoder-decoder framework is the most widely used in image caption generation: the encoder extracts image features with a Convolutional Neural Network (CNN), and the decoder generates the image description with a Recurrent Neural Network (RNN). The Neural Image Caption (NIC) model has achieved good performance in image captioning; however, several challenges remain to be addressed. To tackle the lack of image information and the deviation from the core content of the image, our proposed model exploits visual attention to deepen the understanding of the image, and it incorporates image labels generated by a Fully Convolutional Network (FCN) into the generation of the image caption. Furthermore, the proposed model employs textual attention to improve the completeness of the generated information. Finally, label generation, coupled with the textual attention mechanism, and image caption generation are merged into an end-to-end trainable framework. Extensive experiments have been carried out on the AIC-ICC image caption benchmark dataset, and the results show that our proposed model is effective and feasible for image caption generation.
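The core of the visual-attention step described above can be sketched as a generic soft-attention computation: each image region feature is scored against the decoder's hidden state, the scores are normalized with a softmax, and the context vector is the attention-weighted sum of the region features. This is a minimal illustrative sketch using dot-product scoring, not the paper's exact formulation (the paper's attention parameterization and the FCN label pathway are not specified in the abstract).

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    # Plain dot product between two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def attend(regions, hidden):
    """Soft attention over image region features.

    regions: list of region feature vectors (e.g. CNN feature-map columns)
    hidden:  the decoder RNN's current hidden state
    Returns the context vector: the attention-weighted sum of regions.
    """
    scores = [dot(r, hidden) for r in regions]   # score each region
    weights = softmax(scores)                    # normalize to a distribution
    dim = len(regions[0])
    # Weighted sum of region features along each dimension.
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
```

At each decoding step, the decoder would consume this context vector alongside the previously generated word; the textual attention mentioned in the abstract applies the same weighting idea to label/word features instead of image regions.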
