Abstract

Data-driven deep learning has achieved excellent performance on human action recognition. However, unseen action recognition remains a challenge for most existing neural networks, because the action categories, camera viewpoints, and scenarios covered during data collection are limited. Compared with class-unseen action recognition, view-unseen action recognition in videos is under-explored. This paper proposes view-robust neural networks (VR-Net) to recognize unseen actions in videos. VR-Net consists of a 3D pose estimation module, skeleton-adaptive transformation neural networks, and classification modules. We first extract 3D skeleton models from the video sequence using existing pose estimation methods. Next, we propose a skeleton representation transformation scheme and implement it with Convolutional Neural Networks (VR-CNN) and Graph Convolutional Networks (VR-GCN), yielding optimal skeleton representations. Furthermore, we explore an associated optimization scheme and a fused output method. We evaluate the proposed neural networks on three challenging benchmarks, i.e., the NTU RGB-D dataset (NTU), the Kinetics-400 dataset, and the Human3.6M dataset (H3.6M). The experimental results show that the view-robust neural networks achieve top performance compared with state-of-the-art RGB-based and skeleton-based methods, e.g., 93.6% on NTU (cross-view) and 94.6% on Kinetics-400 (Top-5). The proposed neural networks also significantly improve unseen action recognition, e.g., 86.8% on H3.6M (View 2).
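The core idea of the skeleton representation transformation is to map 3D skeletons from an arbitrary camera viewpoint into a view-normalized form before classification. The paper learns this mapping with VR-CNN/VR-GCN; as a minimal geometric sketch of the same idea, the snippet below removes camera yaw by rotating a skeleton so the hip-to-hip axis aligns with the x-axis (the joint indices and function name are illustrative assumptions, not from the paper):

```python
import math

def normalize_view(joints, left_hip=0, right_hip=1):
    """Rotate a 3D skeleton about the vertical (y) axis so the
    hip-to-hip vector lies along the x-axis.

    A fixed geometric stand-in for the learned transformation in
    VR-Net; joint indices here are illustrative assumptions.
    """
    lx, _, lz = joints[left_hip]
    rx, _, rz = joints[right_hip]
    theta = math.atan2(rz - lz, rx - lx)  # current yaw of the hip axis
    c, s = math.cos(theta), math.sin(theta)
    # Rotate every joint by -theta around y, cancelling the camera yaw.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

# Two views of the same pose differ only by a yaw rotation;
# after normalization both collapse to the same representation.
pose = [(0.0, 1.0, 0.0), (0.3, 1.0, 0.0), (0.15, 1.5, 0.05)]
theta = math.radians(40)
rotated = [(math.cos(theta) * x - math.sin(theta) * z, y,
            math.sin(theta) * x + math.cos(theta) * z) for x, y, z in pose]
a = normalize_view(pose)
b = normalize_view(rotated)
```

After normalization, `a` and `b` coincide (up to floating-point error), so a downstream classifier sees a view-invariant input; VR-Net instead learns such a transformation end-to-end with the classifier.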
