Abstract

Topic detection based on text reasoning has attracted widespread attention. Existing methods focus on inference over textual semantic cues. However, each web video is described with only a few words, so the textual reasoning cues are sparse. In this situation it is difficult to distinguish videos belonging to the same topic, which makes topic detection for web videos challenging. Fortunately, visual information contains many more detailed cues than textual information, such as colors, scenes, and objects, so cross-media joint reasoning provides more reasoning cues than textual information alone, in a complementary manner. In view of this, this paper extends text-based topic detection to cross-media reasoning. A novel heterogeneous interactive tensor learning (HITL) method is proposed, which detects topics through cross-media joint inference. After local features are extracted from keyframes and textual information, the semantic correlation between visual and textual information is mined by constructing a keyframe-text interaction attention matrix. Then, a joint cue between textual and visual information is constructed in a cross-media heterogeneous interaction tensor space, thereby enriching the sparse textual cues through cross-media fusion. Finally, semantic features are extracted through cue interaction in the tensor space for topic detection.
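
The abstract gives no implementation details, so the following is only a minimal sketch, in Python with NumPy, of the kind of keyframe-text interaction it describes. The feature dimensions, the softmax-normalized attention matrix, the outer-product interaction tensor, the pooling step, and all names (cross_media_interaction, V, T) are assumptions made for illustration, not the authors' formulation.

```python
# Minimal sketch (not the paper's implementation) of cross-media interaction:
# keyframe local features V (m x d) and word embeddings T (n x d) for one video.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_media_interaction(V, T):
    """Pool an assumed keyframe-text interaction attention matrix and a
    pairwise interaction tensor into one joint feature vector."""
    # Keyframe-text interaction attention matrix: similarity of every
    # keyframe feature with every word embedding, normalized over words.
    A = softmax(V @ T.T, axis=1)                              # (m, n)

    # Cross-media interaction tensor: outer product of each keyframe/word
    # feature pair, weighted by its attention score.
    S = A[:, :, None, None] * (V[:, None, :, None] * T[None, :, None, :])  # (m, n, d, d)

    # Pool the tensor into a single joint feature for downstream topic
    # detection (e.g., clustering or classification).
    return S.mean(axis=(0, 1)).ravel()                        # (d*d,)

# Usage with random stand-in features
rng = np.random.default_rng(0)
V = rng.normal(size=(8, 16))   # 8 keyframe local features
T = rng.normal(size=(5, 16))   # 5 word embeddings from the video's text
joint = cross_media_interaction(V, T)
print(joint.shape)             # (256,)
```

In this sketch the attention matrix plays the role of the keyframe-text semantic correlation, and the weighted outer products stand in for the heterogeneous interaction tensor space; the actual HITL method would learn these interactions rather than compute them from fixed features.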
