Abstract
Visual story generation, which produces short stories from sequences of images, has become a core task at the intersection of computer vision and natural language processing. However, existing methods suffer from bias in the predicted concept predicates, leading to a semantic gap between the generated stories and the images. This paper proposes a novel visual story generation model that uses multi-granularity image information to guide the generation process and correct the bias in concept predicates, yielding stories that are more consistent with the images. The model consists of two stages. In the first stage, a set of concept predicates is predicted from the image and enriched with external knowledge, and the concepts most suitable for story generation are selected. In the second stage, fine-grained image information is integrated into the story generation module to correct the bias in the concept predicates, while the image theme information and the results generated at previous moments serve as prompts to guide the story generation module. Experimental results show that the proposed model outperforms baseline models on all evaluation metrics: BLEU-1, BLEU-2, BLEU-3, and BLEU-4 improve by 4.0, 3.8, 3.02, and 1.98 percentage points, respectively, and METEOR improves by 1.4 percentage points. The generated stories are more consistent with the image content, maintain a consistent theme, and exhibit better coherence across contexts.
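The abstract describes a two-stage pipeline: concept predicates are predicted and enriched with external knowledge, then fine-grained image features, the image theme, and previously generated sentences condition the story generator. The sketch below only illustrates that control flow under stated assumptions; every name (predict_concepts, enrich_with_knowledge, select_concepts, generate_sentence) is hypothetical and the bodies are placeholders, since the paper does not publish an API.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# All function names and placeholder bodies are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class StoryState:
    theme: str                                   # image theme information used as a prompt
    previous_sentences: List[str] = field(default_factory=list)  # results of previous moments


def predict_concepts(image: Any) -> List[str]:
    """Stage 1a: predict a set of concept predicates from the image (placeholder)."""
    return ["walk", "park", "dog"]


def enrich_with_knowledge(concepts: List[str]) -> List[str]:
    """Stage 1b: enrich the predicates with external knowledge (placeholder)."""
    return concepts + ["leash", "play"]


def select_concepts(concepts: List[str], k: int = 3) -> List[str]:
    """Stage 1c: keep the concepts most suitable for story generation (placeholder scoring)."""
    return concepts[:k]


def generate_sentence(image: Any, concepts: List[str], state: StoryState) -> str:
    """Stage 2: fuse fine-grained image information with the selected concepts,
    conditioned on the theme and previously generated sentences (placeholder)."""
    prompt = f"{state.theme} | {' '.join(state.previous_sentences)} | {' '.join(concepts)}"
    return f"<sentence for prompt: {prompt}>"


def generate_story(images: List[Any], theme: str) -> List[str]:
    """Run both stages for each image in the sequence, carrying context forward."""
    state = StoryState(theme=theme)
    story: List[str] = []
    for image in images:
        concepts = select_concepts(enrich_with_knowledge(predict_concepts(image)))
        sentence = generate_sentence(image, concepts, state)
        state.previous_sentences.append(sentence)
        story.append(sentence)
    return story
```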