Self-Attention Recurrent Summarization Network with Reinforcement Learning for Video Summarization Task

Aniwat Phaphuangwittayakul, Wentian Xu, Zheng Zheng, Fangli Ying, Yi Guo

doi:10.1109/icme51207.2021.9428142

Abstract

With the exponential growth of video data, video summarization techniques are urgently needed to reduce the effort people spend exploring video content, by generating succinct but informative summaries of lengthy original videos. Although supervised video summarization approaches have achieved state-of-the-art performance, unsupervised methods remain in high demand because human annotations are expensive to obtain and video summarization is inherently subjective. In this paper, we propose a novel unsupervised Deep Self-attention Recurrent summarization network with Reinforcement Learning (DSR-RL) for video summarization. By integrating self-attention, a bidirectional recurrent neural network (BRNN), and reinforcement learning, the model learns from the input video sequence and suggests a key-shot summary without additional human annotations. DSR-RL improves not only the importance scores, through the attention map vector of the self-attention network, but also the diversity of the summaries, through the reward function of reinforcement learning. Our method outperforms state-of-the-art unsupervised video summarization methods on both the SumMe and TVSum datasets. The source code is available at https://github.com/phaphuang/DSR-RL.
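The abstract does not give the exact form of the diversity reward. As a minimal sketch, assuming a reward like the one used by the earlier unsupervised RL summarizer DR-DSN (the mean pairwise cosine dissimilarity among the selected frames) and assuming frames are selected by Bernoulli sampling from per-frame importance scores, the reinforcement learning signal might be computed as follows. The function names and toy features are hypothetical and are not taken from the DSR-RL repository.

```python
import math
import random
from typing import List, Sequence


def cosine_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; zero vectors count as fully dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def diversity_reward(features: List[Sequence[float]], actions: List[int]) -> float:
    """Assumed diversity reward: mean pairwise dissimilarity of selected frames.

    features: one feature vector per frame.
    actions:  1 if the frame is selected for the summary, 0 otherwise.
    """
    selected = [f for f, a in zip(features, actions) if a == 1]
    n = len(selected)
    if n < 2:
        # Fewer than two selected frames: no pairs to compare.
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += cosine_dissimilarity(selected[i], selected[j])
    return total / (n * (n - 1))


def sample_actions(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Assumed policy: select frame t with probability equal to its importance score."""
    return [1 if rng.random() < p else 0 for p in scores]


if __name__ == "__main__":
    # Toy frame features (hypothetical): frames 0 and 2 look identical.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(diversity_reward(feats, [1, 1, 0]))  # two orthogonal frames
    print(diversity_reward(feats, [1, 0, 1]))  # two identical frames
    print(sample_actions([0.9, 0.2, 0.7], random.Random(0)))
```

A selection of two orthogonal frames earns a reward of 1.0, while a selection of two identical frames earns 0.0, so a policy trained to maximize this reward is pushed toward visually varied summaries.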
