Abstract

Skeleton-based action recognition has attracted great interest due to the low cost of skeleton data acquisition and its high robustness to external conditions. A challenging problem in skeleton-based action recognition is the large intra-class gap caused by the varying viewpoints from which skeleton data are captured, which makes action modeling difficult for the network. A feasible way to alleviate this problem is to use label-supervised methods to learn a view-normalization model. However, since skeleton data in real scenes are acquired from diverse viewpoints, it is difficult to obtain the corresponding view-normalized skeletons as labels. Therefore, how to learn a view-normalization model without supervised labels is the key to solving the view-variance problem. To this end, we propose a view-normalization-based action recognition framework composed of a view-normalization generative adversarial network (VN-GAN) and a classification network. VN-GAN is designed to learn the mapping from the diverse-view distribution to the normalized-view distribution. In detail, it is implemented with graph convolutions, where the generator predicts the transformation angles for view normalization and the discriminator distinguishes real input samples from generated ones. The classification network then processes the view-normalized data to predict the action class. Without the interference of view variance, the classification network can extract more discriminative action features. Furthermore, by combining the joint and bone modalities, the proposed method achieves state-of-the-art performance on the NTU RGB+D and NTU-120 RGB+D datasets. On NTU-120 RGB+D in particular, accuracy improves by 3.2% and 2.3% under the cross-subject and cross-set criteria, respectively.
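The view-normalization step driven by the generator's predicted angles can be sketched as a rigid rotation of the skeleton joints. The sketch below is a minimal illustration under assumed conventions (an Euler-angle parameterization and the function names `rotation_matrix` and `normalize_view` are our own, not the paper's implementation; in the actual framework the angles come from the graph-convolutional generator rather than being given directly):

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Compose rotations about the x, y, and z axes (Euler angles, radians).

    Assumed parameterization for illustration; the paper's generator may
    predict angles under a different convention.
    """
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(alpha), -np.sin(alpha)],
                   [0, np.sin(alpha),  np.cos(alpha)]])
    Ry = np.array([[ np.cos(beta), 0, np.sin(beta)],
                   [0, 1, 0],
                   [-np.sin(beta), 0, np.cos(beta)]])
    Rz = np.array([[np.cos(gamma), -np.sin(gamma), 0],
                   [np.sin(gamma),  np.cos(gamma), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

def normalize_view(joints, angles):
    """Rotate a (num_joints, 3) skeleton by the predicted angles.

    In the full framework these angles would be the generator's output,
    and the rotated skeleton is what the classifier consumes.
    """
    return joints @ rotation_matrix(*angles).T
```

Because the transformation is a pure rotation, it removes viewpoint differences without distorting bone lengths, which is what lets the downstream classifier focus on the action itself.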
