Abstract

Single-stage models for multi-person pose estimation have attracted significant attention because they perform person localization and body structure perception in a single pass. Existing methods, however, process these two parts independently, which leads to suboptimal results, e.g., candidates that receive high person-localization confidence but have poor structure estimates. To address this, we propose a simple yet effective approach, Structure-guided Person Localization (SPL), which jointly leverages both aspects to solve multi-person pose estimation through two complementary contributions. First, we use body structure perception to guide person localization: Structure-guided Center Learning (SCL) unifies the quality of the body structure perception in the displacement map with the confidence of person existence in the center map, yielding more accurate keypoint localization even for extreme poses. Second, to facilitate end-to-end training of SPL, we propose an efficient Agency-based Scale-adaptive Learning (ASL) scheme. Specifically, we predict an agency map of the same size as the center map, which focuses on the foreground area and adaptively adjusts the scale of each central area according to the body structure perception confidence. Comprehensive experiments on the challenging COCO and CrowdPose benchmarks verify the effectiveness of our framework, which achieves new state-of-the-art results among single-stage multi-person pose estimation methods. In particular, SPL obtains 72.1 AP on COCO test-dev2017 and 69.5 AP on the CrowdPose test set.
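
The SCL idea of tying center-map confidence to structure quality can be illustrated with a small sketch. This is not the authors' implementation. It assumes that structure quality is measured by the COCO Object Keypoint Similarity (OKS) between the pose decoded from the displacement map at a center and the ground-truth pose, and that this quality scales the height of a Gaussian center-map target. Both assumptions, and the function names `oks` and `structure_guided_center_target`, are hypothetical; the ASL agency map is not modeled here.

```python
import numpy as np

# COCO per-keypoint OKS falloff constants for the 17 COCO keypoints.
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62,
                        1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks(pred, gt, visible, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt: (K, 2) arrays of (x, y); visible: (K,) bool mask; area: person area in px^2.
    """
    if not visible.any():
        return 0.0
    d2 = np.sum((pred - gt) ** 2, axis=1)          # squared keypoint distances
    var = (2 * sigmas) ** 2                        # per-keypoint variance, as in COCO eval
    e = d2 / var / (area + np.spacing(1)) / 2
    return float(np.mean(np.exp(-e[visible])))     # average over labeled keypoints only


def structure_guided_center_target(shape, center, displacements, gt_kpts,
                                   visible, area, radius_sigma):
    """Hypothetical SCL-style target: a Gaussian peak at `center` whose height is
    the structure quality (OKS) of the pose decoded from the displacement map."""
    H, W = shape
    cx, cy = center
    # Decode the pose predicted at the center cell: keypoint = center + displacement.
    pred = np.asarray(center, dtype=float)[None, :] + displacements
    q = oks(pred, gt_kpts, visible, area)
    ys, xs = np.mgrid[0:H, 0:W]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * radius_sigma ** 2))
    return q * g, q
```

With exact displacements the peak target is 1.0; as the decoded keypoints drift from the ground truth, the peak drops toward 0, so a center whose displacement-based pose is poor is no longer trained toward high person-existence confidence.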
