Abstract

Human pose estimation has been widely studied, with most work focused on supervised learning. In real applications, however, a pretrained pose estimation model usually needs to be adapted to a novel domain with no labels or only sparse labels. Existing domain adaptation methods handle this setting poorly because poses have flexible topological structures and require fine-grained local features. Targeting these characteristics of human pose, we propose a novel domain adaptation method for multi-person pose estimation (MPPE) that alleviates the human-level shift. First, the training samples of human poses are clustered into groups according to posture similarity. Within the clustered space, we apply three adaptation modules: Cross-Attentive Feature Alignment (CAFA), Intra-domain Structure Adaptation (ISA), and Adaptive Human-Topology Adaptation (AHTA). CAFA adopts a bidirectional spatial attention mechanism to explore fine-grained local feature correlations between two humans and thereby adaptively aggregate consistent features for adaptation. ISA operates only in semi-supervised domain adaptation (SSDA), where it exploits the semantic relationships of corresponding keypoints to reduce intra-domain bias. Importantly, we propose AHTA to enrich human topological knowledge and reduce the inter-domain discrepancy. Specifically, the pose structure and the cross-instance topological relations are modeled via graph networks. This flexible topology learning benefits inference on occluded or extreme poses. Extensive experiments are conducted on two popular benchmarks and two additional challenging datasets. The results demonstrate that our method, which works in unsupervised or semi-supervised modes, is competitive with existing supervised approaches.
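
As an illustration of the first step only, grouping poses by posture similarity might be sketched as follows. The abstract does not state the similarity measure or the clustering algorithm, so everything here is an assumption: poses are taken to be (K, 2) keypoint arrays, similarity is Euclidean distance after removing translation and scale, and plain k-means with farthest-point initialization stands in for whatever the method actually uses. The names `normalize_pose` and `cluster_poses` are hypothetical.

```python
import numpy as np

def normalize_pose(keypoints):
    """Center a (K, 2) keypoint array and scale it to unit RMS radius."""
    kp = np.asarray(keypoints, dtype=float)
    centered = kp - kp.mean(axis=0)
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / max(scale, 1e-8)

def cluster_poses(poses, num_clusters, num_iters=20):
    """Assumed stand-in: k-means on flattened, normalized keypoints."""
    feats = np.stack([normalize_pose(p).ravel() for p in poses])
    # Farthest-point initialization: deterministic, avoids duplicate centers.
    centers = [feats[0]]
    for _ in range(1, num_clusters):
        d = np.min([((feats - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.stack(centers)
    for _ in range(num_iters):
        # Assign each pose to its nearest center, then recompute the centers.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = feats[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

# Toy data: two postures, each seen at a different position and scale.
upright = np.array([[0, 0], [0, 1], [0, 2], [-1, 1], [1, 1]], dtype=float)
sideways = np.array([[0, 0], [1, 0], [2, 0], [1, -1], [1, 1]], dtype=float)
poses = [upright, upright * 3 + 5, sideways, sideways * 0.5 - 2]
labels = cluster_poses(poses, num_clusters=2)
```

In this toy setup the two copies of each posture fall into the same group regardless of where they appear in the image or how large they are, which is the property a posture-based grouping needs before any per-group adaptation is applied.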
