Abstract

In automatic video summarization, a visual summary is typically constructed from the analysis of low-level features with little consideration of video semantics. However, although low-level features are useful for computing visual similarity between frames, in practice they bear little relation to the contextual and semantic information of a video. Therefore, we propose a novel video summarization technique, where semantically important information is extracted from a set of keyframes provided by a human, and the summary of a video is constructed based on automatic temporal segmentation driven by the analysis of inter-frame similarity to the keyframes. Toward this goal, we model a video sequence with a dissimilarity matrix based on a bidirectional similarity measure between every pair of frames, and subsequently characterize the structure of the video by a nonlinear manifold embedding. Then, we formulate video summarization as a variant of the 0-1 knapsack problem, which is solved efficiently by dynamic programming. The effectiveness of our algorithm is illustrated quantitatively and qualitatively on realistic videos collected from YouTube.
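The abstract does not give the exact knapsack formulation, so the following is only a minimal sketch of the selection step it names: assuming each temporal segment carries an importance score (in the paper, derived from similarity to the human-provided keyframes) and a duration, choosing segments under a summary-length budget is a standard 0-1 knapsack problem solvable by dynamic programming. All names and the scoring inputs here are illustrative, not the authors' implementation.

```python
def knapsack_summary(importance, durations, budget):
    """Select segments maximizing total importance under a duration budget.

    importance: per-segment importance scores (hypothetical; assumed to come
                from similarity to the human-given keyframes)
    durations:  per-segment lengths, e.g. in frames
    budget:     maximum total length of the summary
    Returns the indices of the selected segments.
    """
    n = len(importance)
    # dp[w] = best total importance achievable with remaining capacity w
    dp = [0.0] * (budget + 1)
    # keep[i][w] records whether segment i was taken at capacity w
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate capacities downward so each segment is used at most once
        for w in range(budget, durations[i] - 1, -1):
            cand = dp[w - durations[i]] + importance[i]
            if cand > dp[w]:
                dp[w] = cand
                keep[i][w] = True
    # Backtrack to recover which segments form the summary
    selected, w = [], budget
    for i in range(n - 1, -1, -1):
        if keep[i][w]:
            selected.append(i)
            w -= durations[i]
    return sorted(selected)


# Example: three segments, summary capped at 50 frames
print(knapsack_summary([0.9, 0.4, 0.7], [30, 20, 25], 50))  # -> [0, 1]
```

The downward capacity loop is what makes this the 0-1 (rather than unbounded) variant: each segment can enter the summary at most once, matching the selection semantics the abstract describes.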
