Abstract
A fundamental problem in skeleton-based action recognition is extracting useful features from skeleton joints. Current state-of-the-art models for this task tend to be overly complex and over-parameterized, which makes training and inference inefficient on large-scale datasets. In this work, we develop a simple yet efficient baseline for skeleton-based Human Action Recognition (HAR). The architecture is based on adaptive Graph Convolutional Networks (GCNs), which capture the complex interconnections within skeletal structures automatically, without the need for a predefined topology. The GCNs are followed by an attention mechanism that learns more informative representations. On the large-scale NTU-RGB+D 60 dataset, the model reaches accuracies of 89.7% and 95.0% on the Cross-Subject and Cross-View benchmarks, respectively. On NTU-RGB+D 120, it reaches 84.6% and 85.8% on the Cross-Subject and Cross-Setup settings, respectively. This work improves on the existing SGN (Semantic-Guided Neural Networks) model by extracting more discriminative spatial and temporal features.
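To make the idea of an adaptive graph convolution followed by attention concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one common formulation: the adjacency matrix is inferred from the joint features themselves, as a row-wise softmax over dot products of two linear embeddings, rather than read from a fixed skeleton topology. The attention step here is a simple softmax-weighted pooling over joints. The function names, the dimensions, the random weights, and the choice of 25 joints with 3 coordinates are all illustrative assumptions; the abstract does not specify these details.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def adaptive_adjacency(x, w_theta, w_phi):
    """Infer a J x J adjacency from joint features x (J x C).

    Assumed formulation: softmax(theta(x) . phi(x)^T), so each row sums to 1
    and no predefined bone connectivity is required.
    """
    theta = matmul(x, w_theta)                        # J x D
    phi = matmul(x, w_phi)                            # J x D
    phi_t = [list(col) for col in zip(*phi)]          # D x J
    scores = matmul(theta, phi_t)                     # J x J
    return [softmax(r) for r in scores]


def graph_conv(x, adj, w):
    """Aggregate neighbor features with adj, then project: (adj . x) . w."""
    return matmul(matmul(adj, x), w)                  # J x C_out


def attention_pool(h):
    """Hypothetical attention: weight each joint by a softmax over its mean
    activation, then sum the weighted joint features into one vector."""
    alpha = softmax([sum(r) / len(r) for r in h])     # one weight per joint
    return [sum(a * r[c] for a, r in zip(alpha, h)) for c in range(len(h[0]))]


def random_matrix(rows, cols, rng):
    """Illustrative stand-in for learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
J, C, D, C_OUT = 25, 3, 8, 16                         # illustrative sizes
x = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(J)]
adj = adaptive_adjacency(x, random_matrix(C, D, rng), random_matrix(C, D, rng))
h = graph_conv(x, adj, random_matrix(C, C_OUT, rng))
pooled = attention_pool(h)
```

In a trained model the weight matrices would be learned end to end, so the inferred adjacency can link joints that are not physically connected, such as the two hands in a clapping action.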
More From: Journal of Visual Communication and Image Representation