Abstract

Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between annotators. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action dataset and the VSUMM video summarization dataset demonstrate the effectiveness of our proposed method.

Highlights

  • Videos typically contain 30 frames per second, far more information than many computer vision tasks require

  • We introduce a novel two-stream ConvNet that is trained in a self-supervised manner, using labels generated by our automatic annotation method, to detect key frames in videos in real time

  • We present a deep ConvNet framework for key frame detection that can automatically annotate key frames in human action videos


Introduction

Videos typically contain 30 frames per second, far more information than many computer vision tasks require. Typical applications of key frame detection are video summarization, action recognition and visual simultaneous localization and mapping. Processing all frames requires extensive memory and computational resources. Video summarization itself is the task of finding key frames that summarize the entire video content. We address the problem of automatically annotating and detecting key frames in a video. A video is represented as a sequence of continuous frames, and the aim is to automatically annotate a set of frames of interest. We define "interest" as an abstract concept denoting frames that are representative of the video content and diverse enough to reduce redundancy [1].
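To make the task concrete, the following is a minimal classical baseline for key frame selection, not the paper's two-stream ConvNet: per-frame mean absolute pixel difference stands in for the learned appearance and motion features, and frames whose change from the previous frame exceeds an adaptive threshold are flagged. The function name and the thresholding rule (mean plus one standard deviation) are illustrative assumptions.

```python
import numpy as np

def detect_key_frames(frames, threshold=None):
    """Select frames whose appearance changes sharply from the previous frame.

    A crude frame-differencing sketch: mean absolute pixel difference
    between consecutive frames approximates 'uniqueness'. This is an
    illustrative baseline, not the paper's self-supervised ConvNet.
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Mean absolute difference between each pair of consecutive frames.
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    if threshold is None:
        # Assumed heuristic: flag changes above mean + 1 std of all changes.
        threshold = diffs.mean() + diffs.std()
    # Index i of diffs compares frame i with frame i + 1.
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Toy clip: 8 small gray frames with an abrupt scene change at frame 4.
clip = [np.zeros((4, 4)) for _ in range(4)] + [np.full((4, 4), 255.0) for _ in range(4)]
print(detect_key_frames(clip))  # → [4]
```

In practice the pixel-difference score would be replaced by distances between deep appearance and motion features, which is precisely the gap the proposed two-stream ConvNet addresses.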
