Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling

Pengcheng Yang, Xu Sun, Lei Li, Xiaodong He, Fuli Luo, Peng Chen, Zhiyi Yin

doi:10.24963/ijcai.2019/744

Abstract

The visual storytelling (VST) task aims to generate a reasonable and coherent paragraph-level story from an image stream. Unlike a caption, which is a direct and literal description of image content, a story in the VST task tends to contain many imaginary concepts that do not appear in the images. This requires the AI agent to reason about and associate these imaginary concepts using implicit commonsense knowledge in order to generate a reasonable story describing the image stream. In this work, we therefore present a commonsense-driven generative model that introduces crucial commonsense from an external knowledge base into visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, a carefully designed vision-aware directional encoding schema is adopted to integrate the most informative commonsense. In addition, we maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach outperforms state-of-the-art systems by a large margin, achieving a 29% relative improvement in CIDEr score. With the additional commonsense and the semantic-relevance-based objective, the generated stories are more diverse and coherent.
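
For readers unfamiliar with the term, a "relative improvement" is the gain over a baseline divided by that baseline. The sketch below only illustrates this arithmetic. The function name and both scores are hypothetical and chosen for illustration; they are not values reported in the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the improvement of `new` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical metric scores, used only to show the calculation.
# They are NOT taken from the paper.
baseline_score = 10.0
new_score = 12.9

gain = relative_improvement(baseline_score, new_score)
print(f"relative improvement: {gain:.0%}")
```

Under these assumed numbers, a score rising from 10.0 to 12.9 corresponds to a relative improvement of roughly 29%, which differs from an absolute gain of 2.9 points.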

Similar Papers

SentiStory: A Multi-Layered Sentiment-Aware Generative Model for Visual Storytelling
Wei Chen ... Jianwei Niu
IEEE Transactions on Circuits and Systems for Video Technology | VOL. 32
01 Nov 2022

Describing and classifying multimedia using the description logic GRAIL
Carole A Goble ... Christian Haul
13 Mar 1996

Beyond morphological size distribution
Sébastien Lefèvre
Journal of Electronic Imaging | VOL. 18
01 Jan 2009

Imagine, Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning
Chunpu Xu ... Ying Shen
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021
