Abstract

We propose a method for automatic, personalized video summary generation. User interests are mined from their personal image collections. To reduce the semantic gap, we extract visual representations based on a novel semantic tree (SeTree), a hierarchy that captures the conceptual relationships between the visual scenes in a codebook. This idea builds on the observation that such semantic connections among codebook elements have been overlooked in previous work. To construct the SeTree, we adopt a normalized graph-cut clustering algorithm that jointly exploits visual features, textual information, and social user-image connections. With this technique, we obtain an 8.1% improvement in normalized discounted cumulative gain (NDCG) for personalized video-segment ranking over existing methods. Furthermore, to promote the interesting parts of a video, we extract a space–time saliency map and estimate the attractiveness of segments by kernel fitting and matching. A linear function combines the two factors, and the playback rate of the video is adapted accordingly to generate the summary: less important segments are played in fast-forward mode to keep the viewer aware of the context. Subjective experiments showed that the proposed approach outperforms state-of-the-art video summarization techniques by 6.2%.
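
As a concrete illustration of the final step, the sketch below shows one plausible way to linearly combine a per-segment interest score (from SeTree-based matching against the user's image collection) with a saliency score (from the space–time saliency map), and to map the result to playback rates so that less important segments are fast-forwarded. The weight ALPHA, the rate bounds, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
# Hypothetical sketch: linearly combine per-segment interestingness and
# saliency, then adapt playback rate (fast-forward for unimportant parts).
# ALPHA, MIN_RATE, and MAX_RATE are assumed parameters, not from the paper.

from typing import List

ALPHA = 0.6       # assumed weight on personalized interestingness
MIN_RATE = 1.0    # normal playback speed for the most important segments
MAX_RATE = 8.0    # fastest fast-forward for the least important segments

def importance(interest: float, saliency: float) -> float:
    """Linear combination of the two factors (both assumed in [0, 1])."""
    return ALPHA * interest + (1.0 - ALPHA) * saliency

def playback_rate(score: float) -> float:
    """Map importance in [0, 1] to a playback rate: high importance
    plays near normal speed, low importance is fast-forwarded."""
    return MAX_RATE - score * (MAX_RATE - MIN_RATE)

def summarize(interests: List[float], saliencies: List[float]) -> List[float]:
    """Return a per-segment playback rate for the whole video."""
    return [playback_rate(importance(i, s))
            for i, s in zip(interests, saliencies)]

if __name__ == "__main__":
    # Three segments: highly interesting, moderately salient, unimportant.
    print(summarize([0.9, 0.4, 0.1], [0.7, 0.8, 0.2]))
```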
