Abstract

We present a novel method for human action recognition (HAR) based on poses estimated from image sequences. We use 3D human pose data as additional information and propose a compact human pose representation, called a weak pose, in a low-dimensional space that still retains the most discriminative information for a given pose. With poses predicted from image features, we map the problem from image feature space to pose space, where a Bag of Poses (BOP) model is learned for the final goal of HAR. The BOP model is a modified version of the classical bag of words pipeline in which the vocabulary is built from the most representative weak poses for a given action. Compared with standard k-means clustering, our vocabulary selection criterion is shown to be more efficient and more robust against the inherent challenges of action recognition. Moreover, since the ordering of poses is discriminative for action recognition, the BOP model incorporates temporal information: in essence, groups of consecutive poses are considered together when computing the vocabulary and the assignment. We tested our method on two well-known datasets, HumanEva and IXMAS, to demonstrate that weak poses help improve action recognition accuracy. The proposed method is scene-independent and performs comparably with state-of-the-art methods.
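The temporal grouping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the nearest-centroid assignment stands in for the paper's own representative-pose selection criterion, and all names and dimensions are hypothetical.

```python
import numpy as np

def bag_of_poses_histogram(poses, vocabulary, group_size=3):
    """Encode a pose sequence as a normalized bag-of-poses histogram.

    poses      : (T, D) array of estimated (weak) pose vectors for one sequence.
    vocabulary : (K, group_size * D) array of representative pose-group words.
    group_size : number of consecutive poses concatenated per word, so the
                 histogram retains local temporal ordering.
    """
    T, D = poses.shape
    K = vocabulary.shape[0]
    hist = np.zeros(K)
    # Slide a window of `group_size` consecutive poses; assign each
    # concatenated group to its nearest vocabulary word.
    for t in range(T - group_size + 1):
        group = poses[t:t + group_size].ravel()            # (group_size * D,)
        dists = np.linalg.norm(vocabulary - group, axis=1)
        hist[np.argmin(dists)] += 1
    return hist / max(hist.sum(), 1)                       # normalized histogram
```

The resulting histogram is the sequence-level descriptor that a classifier would consume for the final action label.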

Highlights

  • Human action recognition (HAR) is an important problem in computer vision

  • The rest of the article is organized as follows: the next section introduces our human body model and human posture representation; Section "Weak pose estimation using Gaussian Process Regression (GPR)" describes how we use a set of Gaussian processes to learn the mapping from 2D image features to 3D human poses; Section "Bag of Poses (BOP) for action recognition" describes a procedure for incorporating temporal information into a bag of words (BOW) scheme; results are shown in Section "Experimental results"

  • In this article we have proposed a novel approach to action recognition using a BOP model with weak poses estimated from silhouettes
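The feature-to-pose mapping mentioned in the highlights can be sketched with a generic Gaussian Process regressor. This is a stand-in under stated assumptions, not the paper's implementation: scikit-learn's `GaussianProcessRegressor` replaces the paper's set of Gaussian processes, and the data shapes (20-D image features, 13 joints) are illustrative toy values.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy stand-in data: 2D image descriptors (e.g., silhouette features)
# mapped to flattened 3D joint coordinates (the "weak pose" target).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 20))    # 50 frames, 20-D image features
Y_train = rng.normal(size=(50, 39))    # 13 joints x 3 coordinates, flattened

# A multi-output GP with an RBF kernel learns the feature-to-pose mapping.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
gpr.fit(X_train, Y_train)

X_test = rng.normal(size=(5, 20))
Y_pred = gpr.predict(X_test)           # (5, 39): one estimated pose per frame
```

Each predicted row is a pose estimate that would then be fed into the BOP model for action recognition.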


Introduction

Human action recognition (HAR) is an important problem in computer vision. Application fields include video surveillance, automatic video indexing, and human-computer interaction. One can categorize the scenarios found in the literature into several groups: single-human action [1], crowds [2], human-human interaction [3], and action recognition in aerial views [4], to cite but a few. Commonly used features include interest points [5,6], temporal templates [7], 3D SIFT [8], optical flow [9,10], and Motion History Volumes [11], among others. These features describe human actions, which are subsequently classified using machine learning techniques.
