Abstract

Video paragraph captioning aims to generate multiple descriptive sentences for a video, striving to match human writing in accuracy, logicality, and richness. However, current research focuses on the accuracy and temporal order of events while ignoring emotion and other critical logical relations embedded in human language, such as causal and adversative relations. This neglect impairs coherent transitions between generated event descriptions and restricts the vividness of expression, leaving a gap from the standard of human language. To address this problem, a framework that integrates logic and emotion representation learning is proposed to narrow the gap. Concretely, a large-scale inter-event relation corpus, named EMVPC-EvtRel (standing for "EMVPC-Event Relations"), is constructed on top of the EMVPC dataset; it covers six logical relations widely used in human writing, 127 explicit inter-sentence connectives, and over 20,000 pairs of event segments with newly annotated logical relations. A logical semantic representation learning method is developed to recognize the dependencies between visual events, thereby enriching the characterization of video content and boosting the logicality of generated paragraphs. Moreover, a fine-grained emotion recognition module is designed to uncover the emotion features embedded in videos. Finally, experimental results on the EMVPC dataset demonstrate the superiority of the proposed method over existing state-of-the-art approaches.
