Abstract

Benefiting from the powerful representational ability of spatial-temporal Graph Convolutional Networks (ST-GCNs), skeleton-based human action recognition has achieved promising success. However, node interaction through message propagation does not always provide complementary information; instead, it may even introduce destructive noise and thus make the learned representations indistinguishable. Inevitably, the graph representation also becomes over-smoothed, especially when multiple GCN layers are stacked. This paper proposes spatial-temporal graph deconvolutional networks (ST-GDNs), a novel and flexible graph deconvolution technique, to alleviate this issue. At its core, the method provides better message aggregation by removing the embedding redundancy of the input graphs at the node-wise, frame-wise, or element-wise level at different network layers. Extensive experiments on three of the most challenging current benchmarks verify that ST-GDN consistently improves performance and largely reduces the model size on these datasets.
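The abstract does not spell out the deconvolution operator, but the idea of suppressing redundant, shared information before message aggregation can be illustrated with a small sketch. The function below is a hypothetical node-wise example (the function name, tensor shapes, and the projection-based decorrelation are illustrative assumptions, not the authors' operator): each joint embedding has the component it shares with the mean of all joints projected out, so subsequent aggregation propagates mostly complementary signal.

```python
import torch

def remove_nodewise_redundancy(x, eps=1e-6):
    """Illustrative node-wise redundancy removal (not the paper's exact operator).

    x: node embeddings of shape (N, V, C) -- batch, joints, channels.
    Subtract, for every joint, the component of its embedding that lies along
    the mean embedding over all joints, so shared (redundant) information is
    suppressed before the next round of message passing.
    """
    mean = x.mean(dim=1, keepdim=True)                                   # (N, 1, C) shared signal
    coeff = (x * mean).sum(-1, keepdim=True) / (mean.pow(2).sum(-1, keepdim=True) + eps)
    return x - coeff * mean                                              # decorrelated residual
```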

Highlights

  • In recent years, skeleton-based action recognition has attracted great attention, since skeleton data are more compact and more robust to complex backgrounds than RGB inputs [1]–[6]

  • Deep neural models, including both convolutional neural networks (CNNs) [1], [7]–[10] and recurrent neural networks (RNNs) [4], [11], [12], achieve promising results and have become mainstream methods, since they automatically learn more distinguishable features from data. However, as with other irregular data, conventional neural networks such as CNNs and RNNs are designed for Euclidean space, so skeleton-based action recognition does not benefit significantly from them

  • We propose a novel graph neural architecture, referred to as the spatial-temporal graph deconvolutional network (ST-GDN), to deal with the aforementioned issue


Summary

INTRODUCTION

In recent years, skeleton-based action recognition has attracted great attention, since skeleton data are more compact and more robust to complex backgrounds than RGB inputs [1]–[6]. Yan et al. first proposed using a spatial-temporal GCN for this task [14], and it has become one of the most common frameworks for skeleton-based action recognition. We evaluate our model on three of the most challenging current skeleton-based human action recognition benchmarks. We provide graph deconvolution operations at different levels. By combining them at different layers, we present a brand-new graph neural network, named ST-GDN, which largely reduces the model parameters while improving the representative ability of the features. We apply this model to skeleton-based human action recognition tasks. The results on three of the most challenging current datasets show that it achieves the best performance on every evaluation metric in an efficient fashion.
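For reference, a minimal sketch of the spatial-temporal graph convolution block introduced by Yan et al. [14], the framework ST-GDN builds on, is given below. The class and argument names are illustrative assumptions, and details such as learnable edge weights, partitioned adjacency matrices, and residual connections are omitted.

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """Minimal spatial-temporal graph convolution block (after Yan et al. [14]).

    Input:  x of shape (N, C_in, T, V) -- batch, channels, frames, joints.
    A:      normalized skeleton adjacency matrix of shape (V, V).
    """
    def __init__(self, in_channels, out_channels, A, temporal_kernel=9):
        super().__init__()
        self.register_buffer("A", A)                       # fixed skeleton graph
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        pad = (temporal_kernel - 1) // 2
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(temporal_kernel, 1),
                                  padding=(pad, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Spatial graph convolution: mix channels, then aggregate over joints via A.
        x = self.spatial(x)                                # (N, C_out, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A)       # message passing on the graph
        # Temporal convolution over consecutive frames of each joint.
        x = self.temporal(x)
        return self.relu(x)
```

A stack of such blocks followed by global pooling and a classifier forms the ST-GCN baseline; the deconvolution operations described above are combined with it at different layers.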

PROPOSED APPROACH
EXPERIMENTS
Experiment Settings
Visualization
Ablation Experiments
Comparison With the State-of-the-Art Methods
Findings
CONCLUSION
