Abstract
Skeleton-based human action recognition is attracting increasing attention and is widely applied in virtual reality, human–computer interaction systems, and other domains. Nevertheless, the performance of recent methods on actions with similar appearance remains unsatisfactory, owing to their inherent weakness in modeling discriminative temporal dynamics. In addition, previous methods are limited in their ability to mine the global structure of an action. To this end, we propose a multiple temporal scale aggregation graph convolutional network. First, taking advantage of the varying temporal resolutions offered by different layers of the graph convolutional network, we develop a multiple temporal scale aggregation module to extract discriminative temporal features. Second, we propose a new skeleton feature representation, termed relative joint across frames, which provides a stronger global structure cue than absolute coordinates do. Furthermore, we propose a five-stream architecture that comprehensively models complementary features and yields a significant performance boost. In our experiments, the method improves on the baseline by 2.38% and 4.08% on the cross-subject benchmarks of NTU-RGB+D 60 and NTU-RGB+D 120, respectively.
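The abstract does not spell out how the multiple temporal scale aggregation module fuses layers, so the following is only a minimal sketch of one plausible reading: intermediate GCN stages run at progressively coarser temporal resolutions (e.g., via temporal striding), and their feature maps are projected to a common width, resampled to the coarsest temporal length, and summed. The class name `MultiTemporalScaleAggregation`, the channel widths, and the stride pattern are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTemporalScaleAggregation(nn.Module):
    """Sketch: aggregate features from GCN stages that operate at
    different temporal resolutions (deeper stages see coarser,
    longer-range dynamics)."""

    def __init__(self, channels=(64, 128, 256), out_channels=256):
        super().__init__()
        # 1x1 convs project each stage's features to a common width.
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in channels
        )

    def forward(self, feats):
        # feats: list of tensors (N, C_i, T_i, V), T_i shrinking with depth.
        target_t = feats[-1].shape[2]
        pooled = []
        for f, p in zip(feats, self.proj):
            f = p(f)
            # Resample every stage to the coarsest temporal length so the
            # multi-scale features can be summed elementwise.
            f = F.adaptive_avg_pool2d(f, (target_t, f.shape[3]))
            pooled.append(f)
        return torch.stack(pooled).sum(0)

# Example: three stages over 25 joints (NTU skeleton) with strides 1, 2, 4.
n, v = 2, 25
feats = [torch.randn(n, 64, 64, v),
         torch.randn(n, 128, 32, v),
         torch.randn(n, 256, 16, v)]
agg = MultiTemporalScaleAggregation()(feats)
print(agg.shape)  # torch.Size([2, 256, 16, 25])
```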
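Likewise, the exact definition of the relative-joint-across-frames representation is not given in the abstract. A minimal sketch of one natural interpretation follows: each joint's absolute coordinates are replaced by its offset from the same joint in a reference frame, so the features encode cross-frame motion rather than absolute position. The function name and the choice of the first frame as reference are assumptions for illustration only.

```python
import numpy as np

def relative_joints_across_frames(skeleton, ref_frame=0):
    """Convert absolute joint coordinates to offsets from a reference frame.

    skeleton: array of shape (T, V, C) -- T frames, V joints, C coordinates.
    Returns an array of the same shape in which each joint is expressed
    relative to its position in frame `ref_frame`, exposing cross-frame
    displacement instead of absolute location.
    """
    return skeleton - skeleton[ref_frame:ref_frame + 1]  # broadcasts over T

# Example: 64 frames, 25 joints (NTU-RGB+D layout), 3-D coordinates.
seq = np.random.randn(64, 25, 3).astype(np.float32)
rel = relative_joints_across_frames(seq)
assert rel.shape == seq.shape and np.allclose(rel[0], 0.0)
```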