Abstract

This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, designed specifically for static summarization of gesture videos. Because gesture videos often contain uncertain transition postures, unclear movement boundaries, and ambiguous frames, we propose a three-step method: the first step preprocesses the video, the second applies the MIESW method to select candidate key frames, and the third removes redundant key frames. In detail, the first step converts the video into a sequence of frames and resizes the frames. In the second step, the MIESW key frame extraction algorithm is executed: the inter-frame mutual information value serves as a metric to adaptively adjust the size of the sliding window and group frames with similar content; then, based on the entropy value of each frame and the average mutual information value of each frame group, a threshold method is applied to optimize the grouping, and the key frames are extracted. In the third step, speeded-up robust features (SURF) analysis is performed to eliminate redundant frames among these candidate key frames. The calculation of Precision, Recall, and F-measure is adapted from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted with our method provide high-quality video summaries and cover the main content of the gesture video.
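The inter-frame mutual information metric at the core of the second step can be sketched from a joint grayscale histogram of two frames. The helper below is illustrative only; the paper's exact formulation (bin count, color channels, normalization) is an assumption here.

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=256):
    """Mutual information (in bits) between two grayscale frames,
    estimated from their joint intensity histogram.
    Illustrative sketch; bin count and grayscale input are assumptions."""
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(), bins=bins)
    pxy = joint / joint.sum()          # joint probability p(x, y)
    px = pxy.sum(axis=1)               # marginal p(x)
    py = pxy.sum(axis=0)               # marginal p(y)
    nz = pxy > 0                       # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])))
```

A frame compared with itself yields its own entropy, the maximum possible value, while visually unrelated frames score lower, which is what makes the measure usable as a similarity metric between consecutive frames.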

Highlights

  • Nowadays, camera-based human-computer interaction (HCI) is increasingly applied in intelligent living

  • By summarizing the main content of a video with key frame extraction, redundant computation can be reduced and the real-time performance of human-computer interaction can be improved; extracting key frames from gesture video is therefore of great significance for advanced gesture recognition algorithms

  • The key frame extraction method proposed in this paper, called MIESW, consists mainly of the following steps: calculating the inter-frame mutual information value to capture spatial-temporal information, using an adaptive sliding window to group frames with similar content, and applying an entropy-based threshold method to optimize the grouping
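The grouping step named in the highlights can be sketched as follows. The cut rule shown (start a new group when the next mutual information value drops below a fraction `alpha` of the current window's mean) is one plausible reading of the adaptive sliding window; the paper's precise MIESW rule and threshold are not reproduced here.

```python
def group_frames(mi_seq, alpha=0.8):
    """Group frame indices by inter-frame mutual information.
    mi_seq[i] is the MI between frame i and frame i+1.
    A new group opens when MI falls below alpha * (running window mean).
    Illustrative sketch; alpha and the cut rule are assumptions."""
    groups = [[0]]
    window = []                        # MI values within the current group
    for i, mi in enumerate(mi_seq):
        if window and mi < alpha * (sum(window) / len(window)):
            groups.append([i + 1])     # sharp content change: start a new group
            window = []                # restart the window statistics
        else:
            groups[-1].append(i + 1)
            window.append(mi)
    return groups
```

Because the threshold tracks the running mean of the current window, the window grows through stretches of similar content and cuts adaptively at sharp drops, rather than using one fixed global threshold.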



Introduction

Camera-based human-computer interaction (HCI) is increasingly applied in intelligent living, and key frame extraction is a prerequisite for indexing and summarizing a video sequence. In [3], key frames are extracted using histogram features based on shot change detection; this method suits video sequences with few shot changes, simple content, and few redundant frames. Existing algorithms either rely on particular features to distinguish key frames, which yields inaccurate extraction when the features are insufficient or disturbed by noise, or rely on pixel-level motion extraction, which may incur excessive computation. To address these problems, we propose a key frame extraction method for gesture video called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method. The remainder of the paper is organized as follows: Section 2 covers related work, Section 3 introduces the algorithms and experimental methods, Section 4 presents the experiments and results, and Section 5 summarizes the paper and points out future work.
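The entropy criterion used to pick a representative frame within each group can be sketched from the intensity histogram. Selecting the maximum-entropy frame per group is one plausible reading of the criterion, not the paper's confirmed rule.

```python
import numpy as np

def frame_entropy(frame, bins=256):
    """Shannon entropy (in bits) of a grayscale frame's intensity histogram.
    Illustrative sketch; 256 bins over [0, 256) is an assumption."""
    hist, _ = np.histogram(frame.ravel(), bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before taking logs
    return float(-np.sum(p * np.log2(p)))

def key_frame(frames):
    """Pick the index of the maximum-entropy frame in a group
    (hypothetical selection rule for illustration)."""
    return max(range(len(frames)), key=lambda i: frame_entropy(frames[i]))
```

A uniform frame has zero entropy, while a frame using all 256 gray levels equally reaches the 8-bit maximum, so high-entropy frames are the information-rich candidates within a group.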

Related Work
Key Frames Extraction and Feature Fusion Principle
Entropy and Mutual Information Theory
Improved Adaptive Sliding Window Method to Extract Key Frames
Remove Redundant Frames
Experiment
Experimental Results
Method
Conclusions
