Abstract

Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information that cannot be captured by frame-level video representations. In this paper, we propose the Parameter-free Motion Attention Module (PMAM), which exploits the crucial motion cues contained in adjacent video frames using a multi-head attention architecture. The PMAM requires no additional trainable parameters, enabling an efficient and effective understanding of video dynamics. Moreover, we introduce the Multi-feature Motion Attention Network (MMAN), which integrates the parameter-free motion attention module with local and global multi-head attention over object-centric and scene-centric video representations. The synergistic combination of local motion information, extracted by the proposed PMAM, with the long-range interactions modeled by the local and global multi-head attention mechanisms significantly enhances video summarization performance. Extensive experimental results on the benchmark datasets SumMe and TVSum demonstrate that the proposed MMAN outperforms other state-of-the-art methods, yielding remarkable performance gains.
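To illustrate the core idea of attention without trainable parameters, the following is a minimal sketch (not the paper's actual implementation): multi-head scaled dot-product attention where queries, keys, and values are all the raw frame features themselves, so no learned projection matrices are needed. The function name, head count, and feature dimensions are illustrative assumptions.

```python
import numpy as np

def parameter_free_motion_attention(frames, num_heads=4):
    """Hypothetical sketch of parameter-free multi-head attention:
    Q = K = V = input frame features, with no learned projections.
    frames: (T, D) array of T frame-level feature vectors."""
    T, D = frames.shape
    assert D % num_heads == 0, "feature dim must divide evenly into heads"
    d = D // num_heads
    # Split the feature dimension into heads: (num_heads, T, d)
    heads = frames.reshape(T, num_heads, d).transpose(1, 0, 2)
    # Scaled dot-product similarity between all frame pairs: (num_heads, T, T)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Attention-weighted aggregation, then merge heads back: (T, D)
    out = weights @ heads
    return out.transpose(1, 0, 2).reshape(T, D)

# Toy usage: 8 frames with 16-dimensional features
feats = np.random.default_rng(0).normal(size=(8, 16))
attended = parameter_free_motion_attention(feats)
print(attended.shape)  # (8, 16)
```

Because attention weights are derived directly from feature similarity rather than learned projections, the module adds no parameters to train, which is the efficiency property the abstract highlights.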
