Abstract

Human action recognition is an active research topic in computer vision and pattern recognition, with wide application in the real world. We propose an approach to human activity analysis based on the motion energy template (MET), a new high-level representation of video. The main idea of the MET model is that a human action can be expressed as a composition of the motion energy acquired in a three-dimensional (3-D) space-time volume by a filter bank. The motion energies are computed directly from the raw video sequences, so problems such as object localization and segmentation are avoided entirely. Another important merit of the MET method is its insensitivity to gender, hair, and clothing. We extract MET features by using the Bhattacharyya coefficient to measure the motion energy similarity between the action template video and the test video, followed by 3-D max-pooling. Using these features as input to a support vector machine, we carried out extensive experiments on two benchmark datasets, Weizmann and KTH. Compared with other state-of-the-art approaches, such as variation energy image, dynamic templates, and local motion pattern descriptors, the experimental results demonstrate that our MET model is competitive and promising.
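
As a concrete illustration, the following is a minimal sketch (not the authors' code) of the two operations the abstract names for feature extraction: the Bhattacharyya coefficient as a similarity measure and non-overlapping 3-D max-pooling. The array shapes, pooling size, and normalization are illustrative assumptions.

    import numpy as np

    def bhattacharyya(p, q, eps=1e-12):
        # Bhattacharyya coefficient between two non-negative energy
        # histograms; 1.0 means identical distributions, 0.0 disjoint.
        p = p / (p.sum() + eps)
        q = q / (q.sum() + eps)
        return np.sum(np.sqrt(p * q))

    def max_pool_3d(volume, pool=(4, 4, 4)):
        # Non-overlapping 3-D max-pooling over an (X, Y, T) similarity
        # volume; axes are cropped so each divides evenly by the pool size.
        px, py, pt = pool
        x, y, t = (d - d % p for d, p in zip(volume.shape, pool))
        v = volume[:x, :y, :t]
        return v.reshape(x // px, px, y // py, py, t // pt, pt).max(axis=(1, 3, 5))

    # Toy usage: pool a placeholder similarity volume into a fixed-length
    # MET feature vector for the classifier.
    sim = np.random.rand(64, 64, 32)        # placeholder similarity volume
    met_feature = max_pool_3d(sim).ravel()  # shape: (16 * 16 * 8,) = (2048,)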

Highlights

  • In recent years, automatic capture, analysis, and recognition of human actions has been a highly active and significant area of computer vision research, with plentiful applications both offline and online,[1,2] for instance, video indexing and browsing, automatic surveillance[3] in shopping malls, and smart homes

  • Based on the idea that an action can be regarded as a conglomeration of motion energy in a 3-D space-time volume (X-Y-T), treated as an “action-space,” we introduce a new high-level, semantically rich representation for human action recognition (HAR), the motion energy template (MET) model, built on a filter bank

  • A linear support vector machine (SVM) classifier combined with the MET model yields a novel method for HAR (a minimal sketch of this classification step follows the list)
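
The sketch below illustrates the highlighted classification step: a linear SVM trained on pooled MET feature vectors, here via scikit-learn's LinearSVC. The feature dimensionality, class count, and data are placeholder assumptions, not the paper's experimental setup.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X_train = rng.random((60, 2048))   # one pooled MET vector per training clip
    y_train = rng.integers(0, 10, 60)  # e.g., the ten Weizmann action classes
    X_test = rng.random((5, 2048))     # MET vectors of unlabeled clips

    clf = LinearSVC(C=1.0)             # linear SVM classifier
    clf.fit(X_train, y_train)
    predicted_actions = clf.predict(X_test)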

Summary

Introduction

Automatic capture, analysis, and recognition of human actions is a highly active and significant area of computer vision research, with plentiful applications both offline and online,[1,2] for instance, video indexing and browsing, automatic surveillance[3] in shopping malls, and smart homes. Optical flow-based methods, such as histograms of optical flow[7] and motion flow history,[8] are affected by uncontrolled illumination conditions. Another important class of action representations is based on gradients, such as histograms of oriented gradients (HOG).[9] The MET model is obtained directly from the video data, so some limitations of classical methods can be avoided, such as foreground/background segmentation, prior learning of actions, motion estimation, and human localization and tracking. The SOME volumes are matched against a database of SOME template volumes at the corresponding spatiotemporal points using the Bhattacharyya coefficient; by this means, the similarity volumes of the action template (T) and the unrecognized video (S) are obtained (Sec. 3.2).
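
To make the SOME matching step concrete, here is a minimal sketch under stated assumptions: oriented motion energies are approximated with squared 3-D Gaussian-derivative responses along X, Y, and T (the paper's exact filter bank may differ), L1-normalized per voxel, and the template (T) and unrecognized video (S) are then compared pointwise with the Bhattacharyya coefficient to yield a similarity volume.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def some_volume(video, sigma=2.0, eps=1e-12):
        # One oriented-energy channel per derivative direction (X, Y, T);
        # each voxel is L1-normalized into a distribution over orientations.
        orders = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
        energy = np.stack(
            [gaussian_filter(video, sigma, order=o) ** 2 for o in orders], axis=-1
        )
        return energy / (energy.sum(axis=-1, keepdims=True) + eps)

    def similarity_volume(t_some, s_some):
        # Pointwise Bhattacharyya coefficient between the orientation
        # distributions of template T and unrecognized video S.
        return np.sqrt(t_some * s_some).sum(axis=-1)

    T = np.random.rand(32, 32, 16)   # template clip as an (X, Y, T) volume
    S = np.random.rand(32, 32, 16)   # clip to be recognized
    sim = similarity_volume(some_volume(T), some_volume(S))  # values in [0, 1]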

Related Work
High-Level Features
Template-Based Method
Action Classification
MET Model
SOME Features Construction for MET Model
Measuring SOME Volumes Similarity
MET Features’ Vector Construction
Experimental Results and Analysis
Action Recognition on Weizmann Dataset
Run Time of MET Method
Varying the Classifiers
Varying Dimensionality Reduction Techniques
Conclusions