Abstract
Recent progress on skeleton-based action recognition has been substantial, benefiting mostly from the rapid development of Graph Convolutional Networks (GCNs). However, prevailing GCN-based methods may not effectively capture the global co-occurrence features among joints or the local spatial structure features composed of adjacent bones. They also ignore the effect on model performance of channels unrelated to action recognition. To address these issues, we propose a Global Co-occurrence feature and Local Spatial feature learning model (GCLS) consisting of two branches. The first, a Vertex Attention Mechanism branch (VAM-branch), effectively captures the global co-occurrence features of actions; the second, a Cross-kernel Feature Fusion branch (CFF-branch), extracts local spatial structure features composed of adjacent bones and suppresses channels unrelated to action recognition. Extensive experiments on two large-scale datasets, NTU-RGB+D and Kinetics, demonstrate that GCLS outperforms mainstream approaches.
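The abstract does not give the internals of the VAM-branch or CFF-branch, but the two-branch idea can be illustrated with a minimal numpy sketch: a global branch that lets every joint attend to every other joint (a stand-in for vertex attention), and a local branch that aggregates only physically adjacent joints via a normalized skeleton adjacency matrix (a stand-in for graph convolution). All shapes, the toy chain skeleton, and the sum fusion are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy skeleton feature tensor: C channels, T frames, V joints (assumed shapes)
C, T, V = 8, 4, 5
x = rng.standard_normal((C, T, V))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# --- Global branch (illustrative stand-in for the VAM-branch) ---
# Attention over all joint pairs lets each joint aggregate features from
# every other joint, capturing co-occurrence regardless of bone adjacency.
feat = x.mean(axis=1)               # (C, V) time-pooled joint descriptors
scores = feat.T @ feat              # (V, V) joint-to-joint affinities
attn = softmax(scores, axis=-1)     # rows sum to 1
global_out = x @ attn.T             # mix all joints into each joint

# --- Local branch (illustrative stand-in for the CFF-branch) ---
# A one-hop graph convolution aggregates only adjacent joints (bones),
# modelling local spatial structure.
A = np.eye(V)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:  # toy chain skeleton
    A[i, j] = A[j, i] = 1.0
A_hat = np.diag(1.0 / A.sum(axis=1)) @ A       # row-normalized adjacency
local_out = x @ A_hat.T             # aggregate each joint's neighbours

# Fuse the two branches (sum fusion here; the paper's scheme may differ)
fused = global_out + local_out
print(fused.shape)                  # (8, 4, 5)
```

The key contrast is in the mixing matrix: `attn` is dense and data-dependent (global co-occurrence), while `A_hat` is sparse and fixed by the skeleton topology (local spatial structure).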
Highlights
In the field of computer vision, human action recognition plays an important role, with the purpose of predicting the action classes of videos.
We propose a Global Co-occurrence feature and Local Spatial feature learning model (GCLS), consisting of two branches, for skeleton-based action recognition.
Summary
In the field of computer vision, human action recognition plays an important role, with the purpose of predicting the action classes of videos. For the CFF-branch, we first analyze how the feature-fusion process of prevailing GCN-based methods differs from that of CNNs, thereby identifying the limitations of previous related work.