Abstract
We focus on unsupervised representation learning for skeleton-based action recognition. Existing unsupervised approaches usually learn action representations through motion prediction, but they fail to fully capture the inherent semantic similarity among actions. In this paper, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP) to address this challenge. Unlike plain motion prediction, PCRP performs reverse motion prediction with an encoder-decoder structure to extract more discriminative temporal patterns, and derives action prototypes by clustering to explore the inherent action similarity within the action encodings. Specifically, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization (EM) task. PCRP iteratively runs (1) an E-step, which determines the distribution of action prototypes by clustering the action encodings from the encoder while estimating the concentration around each prototype, and (2) an M-step, which optimizes the model by minimizing the proposed ProtoMAE loss, simultaneously pulling each action encoding closer to its assigned prototype via contrastive learning and performing the reverse motion prediction task. Besides, sorting can also serve as a temporal task similar to reverse prediction in the proposed framework. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms mainstream unsupervised methods and even surpasses many supervised methods. The code is available at: https://github.com/LZUSIAT/PCRP.
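The EM loop described above can be sketched in a few lines. This is a minimal illustration under assumed names (not the authors' released code): the E-step clusters unit-normalized action encodings into prototypes with plain k-means and estimates a per-prototype concentration from intra-cluster spread; the M-step's prototypical contrastive term then pulls each encoding toward its assigned prototype, with tighter clusters yielding sharper (lower-temperature) contrast. The reverse motion prediction (decoder reconstruction) term of ProtoMAE is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (all names here are illustrative assumptions):
# z plays the role of action encodings produced by the encoder, shape (N, D).
N, D, K = 12, 4, 3
z = rng.normal(size=(N, D))
z /= np.linalg.norm(z, axis=1, keepdims=True)

def e_step(z, K, iters=10):
    """E-step sketch: cluster encodings into K prototypes via k-means on the
    unit sphere, and estimate a concentration value per prototype."""
    protos = z[rng.choice(len(z), K, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(z @ protos.T, axis=1)   # nearest prototype (cosine)
        for k in range(K):
            members = z[assign == k]
            if len(members):
                protos[k] = members.mean(0)
                protos[k] /= np.linalg.norm(protos[k])
    # Concentration estimate: tighter cluster -> smaller phi -> sharper softmax.
    phi = np.empty(K)
    for k in range(K):
        members = z[assign == k]
        phi[k] = (np.mean(np.linalg.norm(members - protos[k], axis=1)) + 1e-3
                  if len(members) else 1.0)
    return protos, assign, phi

def proto_contrast_loss(z, protos, assign, phi):
    """M-step contrastive term: cross-entropy that pulls each encoding toward
    its assigned prototype, using phi as a per-prototype temperature."""
    logits = (z @ protos.T) / phi                  # (N, K)
    logits -= logits.max(1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    # The full ProtoMAE loss would add the reverse-prediction reconstruction
    # error from the decoder here; that part is not modeled in this sketch.
    return -log_prob[np.arange(len(z)), assign].mean()

protos, assign, phi = e_step(z, K)
loss = proto_contrast_loss(z, protos, assign, phi)
```

In the actual framework the encoder parameters would be updated by gradient descent on this loss between E-steps; the sketch only shows one E-step/M-step evaluation on fixed encodings.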