Abstract

Video summarization extracts the key frames or clips that best convey the content of a long video, so that the resulting summary captures the information users care about most and enables fast browsing and retrieval. In this paper, we cast video summarization as a sequence labeling problem. Unlike existing methods built on recurrent models, we propose a fully convolutional neural network combined with a spatial attention mechanism. First, a pre-trained model extracts cube features from the input video frames; the attention mechanism then aggregates these cube features into an attention vector, which is fed into the fully convolutional network for per-frame binary classification, and the video summary is generated from the classification results. Extensive experiments and analysis on two benchmark datasets demonstrate the effectiveness of the proposed method.
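The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustrative NumPy mock-up, not the authors' implementation: the scoring query, kernel sizes, layer count, and the 0.5 threshold are all assumptions, and the weights are random stand-ins for learned parameters.

```python
import numpy as np

def spatial_attention(cube_feats):
    # cube_feats: (T, R, D) — T frames, R spatial regions, D feature dims.
    # Hypothetical scoring: dot product with a (here random) query vector.
    rng = np.random.default_rng(0)
    q = rng.standard_normal(cube_feats.shape[-1])
    scores = cube_feats @ q                                 # (T, R)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                       # softmax over regions
    return (w[..., None] * cube_feats).sum(axis=1)          # (T, D) attention vectors

def temporal_conv(x, kernel):
    # 'Same'-padded 1-D convolution along the time axis, per feature channel.
    T, _ = x.shape
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        out[t] = (kernel[:, None] * xp[t:t + k]).sum(axis=0)
    return out

def fcn_keyframe_probs(att):
    # Toy fully convolutional head: one temporal conv + ReLU, then a
    # per-frame linear projection and sigmoid for binary classification.
    rng = np.random.default_rng(1)
    h = np.maximum(temporal_conv(att, rng.standard_normal(3)), 0.0)
    w_out = rng.standard_normal(att.shape[1])
    logits = h @ w_out
    return 1.0 / (1.0 + np.exp(-logits))                    # (T,) probabilities

# Stand-in for pre-trained cube features of an 8-frame clip, 4 regions, 16 dims.
feats = np.random.default_rng(2).standard_normal((8, 4, 16))
att = spatial_attention(feats)
probs = fcn_keyframe_probs(att)
summary_frames = np.where(probs > 0.5)[0]  # indices of selected key frames
```

In a real system the attention query, convolution kernels, and output projection would be trained end-to-end against frame-level importance labels; the sketch only shows the data flow from cube features to per-frame keyframe probabilities.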
