Abstract

With the boom of content "re-creation" on social media platforms, character-oriented video summaries have become a crucial form of user-generated video content. However, manual extraction is time-consuming and suffers a high miss rate, while traditional person-search techniques can incur a heavy computational burden. At the same time, videos on social media platforms are usually accompanied by rich textual information, e.g., subtitles or bullet-screen comments, which provide multi-view descriptions of the videos. Thus, there is potential to leverage textual information to enhance character-oriented video summarization. To that end, in this paper, we propose a novel framework for jointly modeling visual and textual information. Specifically, we first locate characters indiscriminately with detection methods, and then identify them via re-identification to extract potential key-frames; for each key-frame, an appropriate source of textual information is automatically selected and integrated based on the frame's features. Finally, the key-frames are aggregated into the character-oriented summary. Experiments on real-world datasets validate that our solution outperforms several state-of-the-art baselines on both person search and summarization tasks, demonstrating its effectiveness on the character-oriented video summarization problem.
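The pipeline described above (detect characters per frame, re-identify the target character, pair each resulting key-frame with an automatically selected textual source, then aggregate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Frame` structure, the `reidentify` and `select_text` stand-ins, and the subtitle-over-bullet-comment preference are all hypothetical placeholders for the learned detection, re-identification, and text-selection components.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    index: int
    detections: list                          # person detections, e.g. {"identity": "A", ...}
    text: dict = field(default_factory=dict)  # textual sources, e.g. {"subtitle": ..., "bullet": ...}

def reidentify(detections, target_id):
    # Stand-in for a re-identification model: keep detections matching the target character.
    return [d for d in detections if d.get("identity") == target_id]

def select_text(frame):
    # Stand-in for automatic text-source selection; here we simply
    # prefer subtitles and fall back to bullet-screen comments.
    return frame.text.get("subtitle") or frame.text.get("bullet")

def summarize(frames, target_id):
    # Aggregate key-frames containing the target character,
    # each paired with its selected textual description.
    key_frames = []
    for frame in frames:
        if reidentify(frame.detections, target_id):
            key_frames.append((frame.index, select_text(frame)))
    return key_frames
```

For example, given three frames where the target character "A" appears in the first and third, `summarize` returns those two frame indices with their selected text.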
