Abstract

Human pose has proved to be an effective representation for action recognition in video. However, traditional 2D pose features extracted from videos suffer from large variations caused by viewpoint changes and projection. In this paper, we investigate recent monocular 3D pose estimation techniques for action recognition and perform a cross-modality analysis, comparing 2D, monocular 3D, and Kinect 3D poses in terms of action recognition performance, especially under cross-viewpoint settings. We show that our proposed monocular 3D pose action recognition pipeline achieves superior results even without real depth information as input. Our proposed three-stream fusion of 3D pose, motion, and appearance outperforms state-of-the-art methods on the sub-JHMDB, Penn Action, and NTU RGB+D datasets.
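
The abstract does not specify how the three streams are combined; a minimal sketch, assuming late (score-level) fusion of three independently trained classifiers whose softmax scores are averaged with fixed weights, is given below. The module names, feature dimensions, and equal fusion weights are illustrative assumptions, not the paper's published architecture.

import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    # Late (score-level) fusion of pose, motion, and appearance streams.
    # Each stream is any classifier mapping its modality to per-class logits;
    # the fused prediction is a weighted average of per-stream softmax scores.
    # (Illustrative assumption: the paper may fuse its streams differently.)
    def __init__(self, pose_net, motion_net, appearance_net,
                 weights=(1.0, 1.0, 1.0)):
        super().__init__()
        self.pose_net = pose_net
        self.motion_net = motion_net
        self.appearance_net = appearance_net
        self.register_buffer("weights", torch.tensor(weights))

    def forward(self, pose, motion, rgb):
        # Per-stream class probabilities, stacked as (stream, batch, class).
        scores = torch.stack([
            torch.softmax(self.pose_net(pose), dim=-1),
            torch.softmax(self.motion_net(motion), dim=-1),
            torch.softmax(self.appearance_net(rgb), dim=-1),
        ])
        w = self.weights / self.weights.sum()  # normalize fusion weights
        return torch.einsum("s,sbc->bc", w, scores)

# Illustrative usage with placeholder linear classifiers; the feature
# dimensions are hypothetical (e.g., 15 joints x 3 coordinates for pose).
n_classes = 12  # e.g., sub-JHMDB has 12 action categories
fusion = ThreeStreamFusion(
    nn.Linear(45, n_classes),   # 3D pose stream
    nn.Linear(128, n_classes),  # motion stream
    nn.Linear(256, n_classes),  # appearance stream
)
probs = fusion(torch.randn(2, 45), torch.randn(2, 128), torch.randn(2, 256))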
