Abstract

In this paper we present an instant action recognition method that recognizes an action in real time from only two consecutive video frames. To achieve this instantaneity, we employ two types of computationally efficient but perceptually important features, optical flow and edges, to capture the motion and shape characteristics of actions. Both feature types, however, can be unreliable or ambiguous under noise and degraded video quality. To endow them with strong discriminative power, we pursue combined features whose joint distributions differ between action classes. Because such low-level visual features are usually densely distributed across video frames, we first group the learned discriminative joint features according to their correlation, which reduces computational expense and yields a compact structural representation, and we then adapt an efficient boosting method that takes the grouped features as input as the recognition engine. Experimental results show that combining the two feature types differentiates actions more effectively than using either type alone. The overall model is computationally efficient, and its recognition accuracy is comparable to that of state-of-the-art approaches.
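The sketch below is a minimal illustration, not the authors' implementation, of the pipeline the abstract describes: extracting optical flow and edge features from two consecutive frames, pooling them into a compact grouped descriptor, and classifying with a boosting method. Farneback flow, Canny edges, grid pooling, and AdaBoost are stand-ins for the paper's unspecified choices, and all function and parameter names are illustrative assumptions.

```python
# Sketch only: two-frame motion/shape descriptor + boosting classifier.
import cv2
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def frame_pair_descriptor(prev_frame, next_frame, grid=(4, 4)):
    """Build a joint motion/shape descriptor from two consecutive frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

    # Dense optical flow captures motion (Farneback is an assumed stand-in
    # for the paper's flow method).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Canny edges capture shape.
    edges = cv2.Canny(next_gray, 100, 200).astype(np.float32) / 255.0

    # Pool flow magnitude/orientation and edge density over a coarse grid,
    # approximating "feature groups" with spatial cells.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    h, w = prev_gray.shape
    gh, gw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = (slice(i * gh, (i + 1) * gh), slice(j * gw, (j + 1) * gw))
            feats.extend([mag[cell].mean(), ang[cell].mean(), edges[cell].mean()])
    return np.array(feats)

# Training (hypothetical usage): pairs is a list of (prev_frame, next_frame)
# tuples and labels the corresponding action classes.
# X = np.stack([frame_pair_descriptor(p, n) for p, n in pairs])
# clf = AdaBoostClassifier(n_estimators=100).fit(X, labels)
```

At test time the same descriptor is computed from each incoming frame pair and passed to the trained classifier, which is what makes per-pair, near-instant prediction possible under these assumptions.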
