Abstract

We propose an innovative approach for human activity recognition based on affine-invariant shape representation and SVM-based feature classification. In this approach, a compact computationally efficient affine-invariant representation of action shapes is developed by using affine moment invariants. Dynamic affine invariants are derived from the 3D spatiotemporal action volume and the average image created from the 3D volume and classified by an SVM classifier. On two standard benchmark action datasets (KTH and Weizmann datasets), the approach yields promising results that compare favorably with those previously reported in the literature, while maintaining real-time performance.

Highlights

  • Visual recognition and interpretation of human-induced actions and events are among the most active research areas in computer vision, pattern recognition, and image understanding communities [1]

  • Support Vector Machines (SVMs) with Gaussian radial basis function (RBF) kernel are trained on the training set, while the evaluation of the recognition performance is performed on the test set

  • As with the first experiment, SVMs with Gaussian RBF kernel are trained on the training set, while the evaluation of the recognition performance is performed on the test set

Read more

Summary

Introduction

Visual recognition and interpretation of human-induced actions and events are among the most active research areas in computer vision, pattern recognition, and image understanding communities [1]. A great deal of progress has been made in automatic recognition of human actions during the last two decades, the approaches proposed in the literature remain limited in their ability. This leads to a need for much research work to be conducted to address the ongoing challenges and develop more efficient approaches. While the real-time performance is a major concern in computer vision, especially for embedded computer vision systems, the majority of state-of-the-art human action recognition systems often employ sophisticated feature extraction and learning techniques, creating a barrier to the real-time performance of these systems. By applying a threshold T (set to 0.6 in our experiments), the background distribution remains on top with the lowest variance, where

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call