Abstract

Pose-based action recognition has drawn considerable attention recently. Existing methods exploit joint positions to extract body-part features from the activation maps of a backbone CNN to assist human action recognition. However, they suffer from two limitations: (1) the body-part features are used independently or simply concatenated into a representation, so prior knowledge about the structured correlations between body parts is not fully exploited; (2) the backbone CNN from which the body-part features are extracted is “lazy”: it contents itself with identifying patterns in the most discriminative areas of the input, so the features extracted from other areas carry little information. This hampers the subsequent aggregation process and makes the model easily misled by biases in the training data. To address these problems, we encode the body-part features into a human-based spatiotemporal graph and employ a lightweight graph convolutional module to explicitly model the dependencies between body parts. In addition, we introduce a novel intermediate dense supervision that encourages the backbone CNN to treat all regions equally; it is simple and effective and adds no extra parameters or computation. The proposed approach, the pose-based graph convolutional network (PGCN), is evaluated on three popular benchmarks, where it significantly outperforms state-of-the-art methods.
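
To illustrate the idea of propagating body-part features along a skeleton graph, the sketch below shows a minimal graph convolutional layer over body-part nodes. It is an illustrative sketch only, not the authors' module: the class name `BodyPartGCN`, the five-node star-shaped skeleton (torso connected to four limbs), and the feature dimensions are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class BodyPartGCN(nn.Module):
    """Minimal sketch of a lightweight graph convolution over body-part features.

    Nodes are body parts; edges follow the skeleton structure. This is an
    illustrative sketch under assumed shapes, not the paper's exact module.
    """

    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops, kept fixed here.
        adj = adjacency + torch.eye(adjacency.size(0))
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        self.register_buffer(
            "adj_norm", deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
        )
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, num_parts, in_dim) body-part features from the backbone CNN.
        # Propagate features along skeleton edges, then apply a shared linear map.
        return torch.relu(self.linear(torch.matmul(self.adj_norm, x)))


# Hypothetical 5-node graph: torso (node 0) connected to two arms and two legs.
adjacency = torch.zeros(5, 5)
for limb in range(1, 5):
    adjacency[0, limb] = adjacency[limb, 0] = 1.0

gcn = BodyPartGCN(in_dim=256, out_dim=128, adjacency=adjacency)
features = torch.randn(2, 5, 256)   # (batch, body parts, channels)
out = gcn(features)                 # -> shape (2, 5, 128)
print(out.shape)
```

After such a layer, each body-part node's feature already reflects its structurally related parts, so the final aggregation no longer treats the parts as independent.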
