Abstract

Self-supervised contrastive learning has been widely applied to skeleton-based action recognition because it learns discriminative features. However, directly applying existing contrastive learning frameworks to 3D skeleton learning is limited by their reliance on carefully designed augmentations and on simple decision-level fusion of multiple streams. To address these drawbacks, we propose a three-stream contrastive learning framework utilizing abundant information mining for self-supervised action representation (3s-AimCLR++). For single-stream contrastive learning, we first propose extreme augmentation, which introduces more movement patterns and thereby improves the universality of the learned representations. Because directly using extreme augmentation barely boosts performance owing to the drastic changes in original identity, we propose the Distributional Divergence Minimization (DDM) loss to exploit extreme augmentation more gently. Moreover, we propose Single-Stream Nearest Neighbors Mining (SNNM) to expand the set of positive samples and make the learning process more reasonable. For multi-stream learning, existing methods simply ensemble the results of the individual streams. Considering the complementarity of information between different streams, we instead propose a Multi-Stream Aggregation and Interaction (MSAI) strategy to better fuse multi-stream information. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets verify that 3s-AimCLR++ performs favorably against state-of-the-art methods under a variety of evaluation protocols. The code and models are available at https://github.com/Levigty/AimCLR-v2.
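
The abstract does not spell out the DDM loss, so the following is only a minimal sketch of one plausible reading: the similarity distribution of the normally augmented view over a bank of stored embeddings serves as a fixed target, and the extremely augmented view is pulled toward it through a cross-entropy term. The function names, the `bank` argument, and the `temperature` value are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_total = math.log(sum(math.exp(s - m) for s in scores))
    return [s - m - log_total for s in scores]


def similarity_log_distribution(embedding, bank, temperature):
    """Log-probabilities of an embedding's similarity to each bank entry."""
    return log_softmax([dot(embedding, key) / temperature for key in bank])


def ddm_loss(z_normal, z_extreme, bank, temperature=0.1):
    """Cross-entropy between the two views' similarity distributions.

    Assumption: the normally augmented view supplies the target
    distribution p, and the extremely augmented view supplies q.
    """
    log_p = similarity_log_distribution(z_normal, bank, temperature)
    log_q = similarity_log_distribution(z_extreme, bank, temperature)
    return -sum(math.exp(lp) * lq for lp, lq in zip(log_p, log_q))


# Hypothetical toy data: a three-entry bank of 2-D unit embeddings.
bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_normal = [1.0, 0.0]
loss_same = ddm_loss(z_normal, [1.0, 0.0], bank)  # extreme view agrees
loss_diff = ddm_loss(z_normal, [0.0, 1.0], bank)  # extreme view drifts away
```

In this sketch the loss is smallest when the extreme view induces the same similarity distribution as the normal view, which is one way to read "utilizing the extreme augmentation more gently": the extreme view is matched to a soft distribution instead of being forced to act as a hard positive.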
