Abstract

Video summarization aims to produce a condensed yet informative version of the original footage, facilitating content comprehension, browsing, and delivery; multi-modal features play an important role in differentiating the individual segments of a video. In this paper, we present a method that combines visual and semantic features. Rather than relying on domain-specific or heuristic textual features as semantic features, we assign semantic concepts to video segments through automatic video annotation. The semantic coherence between the accompanying text and the high-level concepts of video segments is then exploited to characterize segment importance. Visual features (e.g., motion and face), which have been widely used in user attention model-based summarization, are integrated with the proposed semantic coherence to obtain the final summary. Experiments on a half-hour sample video from the TRECVID 2006 dataset demonstrate that semantic coherence is very helpful for video summarization when fused with different visual features.
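The fusion described above can be sketched as a simple score-level combination: each segment receives a visual attention score and a semantic coherence score, the two streams are normalized, linearly weighted, and the highest-scoring segments are selected under a duration budget. This is a minimal illustration, not the paper's exact formulation; the weights and the greedy selection strategy are assumptions.

```python
import numpy as np

def fuse_scores(visual, semantic, w_visual=0.6, w_semantic=0.4):
    """Min-max normalize each per-segment score stream, then fuse linearly.

    `visual` and `semantic` are per-segment importance scores; the
    weights are illustrative, not taken from the paper.
    """
    def norm(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return w_visual * norm(visual) + w_semantic * norm(semantic)

def select_segments(scores, durations, budget):
    """Greedily keep the highest-scoring segments until the summary
    duration budget is filled; return chosen indices in timeline order."""
    order = np.argsort(scores)[::-1]
    chosen, total = [], 0.0
    for i in order:
        if total + durations[i] <= budget:
            chosen.append(int(i))
            total += durations[i]
    return sorted(chosen)
```

For example, a segment with moderate motion but high semantic coherence with the accompanying text can outrank a visually busy but semantically irrelevant one, which is the behavior the proposed fusion is meant to capture.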
