Abstract
Recent research has shown that modeling the dynamic joint features of the human body by a graph convolutional network (GCN) is a groundbreaking approach for skeleton-based action recognition, especially for the recognition of the body-motion, human-object and human-human interactions. Nevertheless, how to model and utilize coherent skeleton information comprehensively is still an open problem. In order to capture the rich spatiotemporal information and utilize features more effectively, we introduce a spatial residual layer and a dense connection block enhanced spatial temporal graph convolutional network. More specifically, our work introduces three aspects. Firstly, we extend spatial graph convolution to spatial temporal graph convolution of cross-domain residual to extract more precise and informative spatiotemporal feature, and reduce the training complexity by feature fusion in the, so-called, spatial residual layer. Secondly, instead of simply superimposing multiple similar layers, we use dense connection to take full advantage of the global information. Thirdly, we combine the above mentioned two components to create a spatial temporal graph convolutional network (ST-GCN), referred to as SDGCN. The proposed graph representation has a new structure. We perform extensive experiments on two large datasets: Kinetics and NTU-RGB+D. Our method achieves a great improvement in performance compared to the mainstream methods. We evaluate our method quantitatively and qualitatively, thus proving its effectiveness.
Paper version not known (Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have