Abstract

Semi-supervised Action Quality Assessment (AQA) using limited labeled and massive unlabeled samples to achieve high-quality assessment is an attractive but challenging task. The main challenge relies on how to exploit solid and consistent representations of action sequences for building a bridge between labeled and unlabeled samples in the semi-supervised AQA. To address the issue, we propose a Self-supervised sub-Action Parsing Network (SAP-Net) that employs a teacher-student network structure to learn consistent semantic representations between labeled and unlabeled samples for semi-supervised AQA. We perform actor-centric region detection and generate high-quality pseudo-labels in the teacher branch and assists the student branch in learning discriminative action features. We further design a self-supervised sub-action parsing solution to locate and parse fine-grained sub-action sequences. Then, we present the group contrastive learning with pseudo-labels to capture consistent motion-oriented action features in the two branches. We evaluate our proposed SAP-Net on four public datasets: the MTL-AQA, FineDiving, Rhythmic Gymnastics, and FineFS datasets. The experiment results show that our approach outperforms state-of-the-art semi-supervised methods by a significant margin.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.