Does Human Action Recognition Benefit from Pose Estimation?

Angela Yao, Juergen Gall, Luc Van Gool, Gabriele Fanelli

doi:10.5244/c.25.67

Abstract

Early works on human action recognition focused on tracking and classifying articulated body motions. Such methods required accurate localisation of body parts, which is a difficult task, particularly under realistic imaging conditions. As such, recent trends have shifted towards the use of more abstract, low-level appearance features such as spatio-temporal interest points. Motivated by the recent progress in pose estimation, we feel that pose-based action recognition systems warrant a second look. In this paper, we address the question of whether pose estimation is useful for action recognition or if it is better to train a classifier only on low-level appearance features drawn from video data. We compare pose-based, appearance-based and combined pose and appearance features for action recognition in a home-monitoring scenario. Our experiments show that pose-based features outperform low-level appearance features, even when heavily corrupted by noise, suggesting that pose estimation is beneficial for the action recognition task.
