Abstract

Video summarization is the process of producing a concise summary of a video's content. This paper proposes a generic video summarization method named GVSUM. The summary is generated by selecting keyframes whenever a major scene change occurs in the video. Every frame is assigned a cluster number based on its visual features, and a keyframe is extracted whenever the cluster number changes from one frame to the next. Visual features are extracted with a pre-trained Convolutional Neural Network (CNN), k-means clustering is applied to these features, and keyframes are then selected sequentially. The optimal number of clusters can also be chosen before summarizing by applying the Average Silhouette Width method. Mean Opinion Scores (MOS) of the generated summaries show that GVSUM gives satisfactory results for generic video summarization, since it selects a frame wherever the visual content changes. The quantitative F1 measure also shows promising results.
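The pipeline described in the abstract (CNN features, k-means, keyframe selection on cluster changes, silhouette-based choice of k) can be illustrated with a minimal sketch. It assumes per-frame descriptors from a pre-trained CNN are already available (random placeholders stand in for them here), and the function names `choose_k` and `gvsum_keyframes` are hypothetical, not part of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(features, k_range=range(2, 11)):
    """Pick the number of clusters with the highest average silhouette width."""
    best_k, best_score = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
        score = silhouette_score(features, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def gvsum_keyframes(features, k=None):
    """Return indices of frames where the cluster assignment changes."""
    if k is None:
        k = choose_k(features)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    # Walk the frames in temporal order; a change in cluster label
    # is treated as a major scene change and yields a keyframe.
    keyframes = [0]
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            keyframes.append(i)
    return keyframes

# Stand-in features: one 2048-dim descriptor per frame for 500 frames.
features = np.random.rand(500, 2048)
print(gvsum_keyframes(features, k=5))
```

In practice the placeholder array would be replaced by pooled activations from a pre-trained CNN (e.g., a ResNet backbone), one feature vector per video frame, kept in temporal order so that the sequential keyframe selection is meaningful.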
