Comparative Analysis of Keyframe Extraction Techniques for Video Summarization

Vishal Parikh, Saumyaa Shah, Priyanka Sharma, Jay Mehta

doi:10.2174/2666255813999200710131444

Abstract

Background: Technological advancement has improved the quality of human life and has also led people to produce large amounts of data in the form of text, images, and videos. Significant effort is therefore needed to devise methodologies for analyzing and summarizing this data so that it can be managed within storage constraints. Video summaries can be generated either from keyframes or from skims/shots. Here, keyframe extraction is based on deep-learning-based object detection techniques. Various object detection algorithms have been reviewed for generating and selecting the best possible frames as keyframes. A set of frames is extracted from the original video sequence and, depending on the technique used, one or more frames of the set are chosen as keyframes, which then become part of the summarized video. This paper discusses the selection of various keyframe extraction techniques in detail.

Methods: The paper focuses on summary generation for office surveillance videos, based primarily on various keyframe extraction techniques. For this purpose, models such as MobileNet, SSD, and YOLO were used. A comparative analysis of their efficiency showed that YOLO performed better than the others. Keyframe selection techniques such as sufficient content change, maximum frame coverage, minimum correlation, curve simplification, and clustering based on human presence in the frame have been implemented.

Results: Variable-length and fixed-length video summaries were generated and analyzed for each keyframe selection technique for office surveillance videos. The analysis shows that the output video obtained with the clustering and curve simplification approaches is compressed to half the size of the original video and requires considerably less storage space. The technique that selects keyframes based on the change of frame content between consecutive frames produces the best output for office surveillance videos.

Conclusion: In this paper, we discussed the process of generating a synopsis of a video that highlights the important portions and discards the trivial and redundant parts. First, we described various object detection algorithms such as YOLO and SSD, used in conjunction with neural networks such as MobileNet, to obtain the probabilistic score of an object being present in the video. These algorithms generate, for every frame of the input video, the probability that a person is part of the image. The results of object detection are passed to keyframe extraction algorithms to obtain the summarized video. Our comparative analysis of keyframe selection techniques for office videos will help in determining which keyframe selection technique is preferable.
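The content-change selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each frame has already been reduced to a single person-presence probability by an object detector, and that "content change" is measured as the absolute difference between a frame's score and the score of the most recently selected keyframe. The function name `select_keyframes_by_content_change` and the `threshold` value are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_keyframes_by_content_change(
    scores: Sequence[float], threshold: float = 0.3
) -> List[int]:
    """Return indices of frames chosen as keyframes.

    scores    -- assumed per-frame person-presence probabilities in [0, 1]
    threshold -- hypothetical minimum score change that counts as
                 "sufficient content change"
    """
    if not scores:
        return []

    keyframes = [0]          # the first frame always opens the summary
    last_score = scores[0]   # score of the most recent keyframe

    for index in range(1, len(scores)):
        # Keep a frame only when it differs enough from the last keyframe.
        if abs(scores[index] - last_score) >= threshold:
            keyframes.append(index)
            last_score = scores[index]

    return keyframes


if __name__ == "__main__":
    # A person enters at frame 2 and leaves at frame 4.
    example_scores = [0.1, 0.15, 0.9, 0.85, 0.2, 0.25]
    print(select_keyframes_by_content_change(example_scores))
```

With this rule the summary length varies with the amount of activity in the video. A fixed-length summary would need an additional step, for example keeping only the frames with the largest changes.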

Full Text