Abstract

Skeleton-based human action recognition has recently attracted the attention of researchers due to the accessibility and popularity of 3D skeleton data. However, it is difficult to effectively represent spatial–temporal skeleton sequences, given the large variations in action representations captured from different viewpoints. To obtain a better representation of spatial–temporal skeletal features, this paper introduces a view-transform graph attention recurrent network (VT+GARN) method for view-invariant human action recognition. We design a sequence-based view-invariant transform strategy to reduce the influence of different views on the spatial–temporal positions of skeleton joints. The graph attention recurrent network then automatically calculates the attention coefficients, learns the representation of the transformed spatiotemporal skeletal features, and outputs the classification result. Ablation studies and extensive experiments on three challenging datasets, Northwestern-UCLA, NTU RGB+D and UWA3DII, demonstrate the effectiveness and superiority of our method.
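The abstract only outlines the view-invariant transform at a high level. As a rough illustration of the idea, the sketch below shows one common way to normalize a skeleton sequence to a canonical viewpoint: translating the sequence so a root joint sits at the origin and rotating the hip and spine axes into fixed directions. The joint indices, function name, and NumPy implementation are assumptions for illustration, not the paper's actual strategy.

```python
import numpy as np

def view_invariant_transform(seq, root=0, hip_l=12, hip_r=16, spine=1):
    """
    Normalize a skeleton sequence to a canonical viewpoint.

    seq: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Joint indices (root, hips, spine) are hypothetical and depend on the
    dataset's skeleton layout.
    """
    seq = np.asarray(seq, dtype=np.float64)

    # 1. Translate so the root joint of the first frame is the origin.
    seq = seq - seq[0, root]

    # 2. Build a canonical frame from the first pose:
    #    x-axis: left hip -> right hip, z-axis: root -> spine.
    x = seq[0, hip_r] - seq[0, hip_l]
    z = seq[0, spine] - seq[0, root]
    x = x / np.linalg.norm(x)
    z = z - np.dot(z, x) * x          # make z orthogonal to x
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                # right-handed third axis

    # 3. Rotate every frame into that frame (rows of R are the new axes).
    R = np.stack([x, y, z])           # shape (3, 3)
    return seq @ R.T

# Usage: a random 30-frame, 25-joint sequence.
seq = np.random.randn(30, 25, 3)
canonical = view_invariant_transform(seq)
print(canonical.shape)  # (30, 25, 3)
```

After such a normalization, the same action captured from different camera positions maps to similar coordinates, which is what allows the downstream graph attention recurrent network to focus on learning spatiotemporal features rather than viewpoint differences.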
