Image Captioning with Internal and External Knowledge

Feicheng Huang, Zhixin Li, Canlong Zhang, Shengjia Chen, Huifang Ma

doi:10.1145/3340531.3411948

Abstract

Automatically generating a human-like description for a given image is a promising research topic in artificial intelligence and has attracted a great deal of attention recently. Most existing attention methods explore the mapping relationships between words in the sentence and regions in the image; this unpredictable matching sometimes produces inharmonious alignments that can reduce the quality of the generated captions. In this paper, we aim to generate more accurate and meaningful captions. We first propose word attention to improve the correctness of visual attention when generating descriptions word by word. This word attention emphasizes the importance of each word when focusing on different regions of the input image, and makes full use of internal annotation knowledge to assist the calculation of visual attention. Then, to reveal implicit intentions that machines cannot express straightforwardly, we inject external knowledge extracted from a knowledge graph into the encoder-decoder framework to facilitate meaningful captioning. We validate our model on two freely available captioning benchmarks, the Microsoft COCO dataset and the Flickr30k dataset. The results demonstrate that our approach achieves state-of-the-art performance and outperforms many existing approaches.
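As a rough illustration of how a word-level attention might feed into visual attention, the sketch below uses plain dot-product soft attention. It is a minimal sketch under stated assumptions, not the authors' model: the function name `word_guided_visual_attention`, the parameter-free dot-product scoring, and the choice to add the word context to the decoder state to form the visual query are all hypothetical. The paper's learned projections, its use of annotation knowledge, and the knowledge-graph injection are not modelled here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Sequence[float]]) -> Vector:
    # Attention-weighted average of the vectors, computed per dimension.
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def word_guided_visual_attention(
    hidden: Sequence[float],
    word_embs: Sequence[Sequence[float]],
    regions: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float], Vector]:
    """Hypothetical, parameter-free sketch (NOT the paper's formulation).

    1. Word attention: score each previously generated word against the
       decoder hidden state and pool the words into a word context.
    2. Visual attention: query the image regions with (hidden + word context),
       so word importance shifts which regions are attended.
    """
    alpha = softmax([dot(hidden, w) for w in word_embs])  # word weights
    word_ctx = weighted_sum(alpha, word_embs)
    query = [h + c for h, c in zip(hidden, word_ctx)]  # assumed additive fusion
    beta = softmax([dot(query, r) for r in regions])  # region weights
    visual_ctx = weighted_sum(beta, regions)
    return alpha, beta, visual_ctx


if __name__ == "__main__":
    alpha, beta, ctx = word_guided_visual_attention(
        hidden=[1.0, 0.0],
        word_embs=[[1.0, 0.0], [0.0, 1.0]],
        regions=[[2.0, 0.0], [0.0, 2.0]],
    )
    print(alpha, beta, ctx)
```

In a trained captioner, the dot products would typically be replaced by learned scoring functions, and the resulting visual context would be passed to the decoder (e.g. an LSTM) to predict the next word.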

Similar Papers

Boost image captioning with knowledge reasoning
Feicheng Huang ... Canlong Zhang
Machine Learning | VOL. 109
27 Oct 2020

Hazardous waste capacity assurance planning: Volumes for one‐time remediation
Jean H Peretz ... Ho‐Ling Hwang
Remediation Journal | VOL. 4
01 Jun 1994

Cross-Modal Attention With Semantic Consistence for Image-Text Matching.
Xing Xu ... Fumin Shen
IEEE Transactions on Neural Networks and Learning Systems | VOL. 31
30 Nov 2020

Artificial Intelligence Educational & Research Initiatives and Leadership Positions in Academic Radiology Departments
David Li ... Paul H Yi
Current Problems in Diagnostic Radiology | VOL. 51
11 Jan 2022