This paper presents an approach to simultaneously tracking human pose and recognizing actions in video by combining a Dynamic Bayesian Action Network (DBAN) with 2D body part models. The existing DBAN implementation relies on fairly weak observation features, which limits recognition accuracy. In this work, we use a 2D body part model for accurate pose alignment, which in turn improves both pose estimation and action recognition accuracy. To compensate for the additional time required for alignment, we use an action entropy-based scheme to determine the minimum number of states to maintain in each frame while avoiding sample impoverishment. We also present an approach to automating keypose selection for learning 3D action models from a few annotations. We demonstrate our approach on a hand gesture dataset of 500 action sequences and show that our algorithm achieves a 6% improvement in accuracy over DBAN.
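The entropy-based budgeting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the linear mapping from normalized entropy to state count, and the bounds `n_min`/`n_max` are all hypothetical. The intuition is that a peaked action posterior (low entropy) needs few hypotheses per frame, while an uncertain one (high entropy) needs more to avoid sample impoverishment.

```python
import math

def action_entropy(weights):
    """Shannon entropy (in nats) of an unnormalized action-probability vector."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

def num_states_to_keep(action_probs, n_min=10, n_max=200):
    """Hypothetical budgeting rule: scale the per-frame state budget between
    n_min and n_max by the normalized entropy of the action posterior."""
    h = action_entropy(action_probs)
    h_max = math.log(len(action_probs))  # entropy of the uniform distribution
    frac = h / h_max if h_max > 0 else 0.0
    return max(n_min, round(n_min + frac * (n_max - n_min)))
```

For example, a uniform posterior over four actions yields the full budget, while a posterior concentrated on one action collapses to the minimum, so the alignment cost is paid only where the action is still ambiguous.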