Contextual Instance Decoupling for Instance-Level Human Analysis

Dongkai Wang, Shiliang Zhang

doi:10.1109/tpami.2023.3243223

Abstract

One fundamental challenge of instance-level human analysis is to decouple instances in crowded scenes, where multiple persons overlap one another. This paper proposes Contextual Instance Decoupling (CID), a new pipeline for decoupling persons in multi-person instance-level analysis. Instead of relying on person bounding boxes to spatially differentiate persons, CID decouples the persons in an image into multiple instance-aware feature maps. Each of these feature maps is then used to infer instance-level cues for a specific person, e.g., keypoints, an instance mask, or part segmentation masks. Compared with bounding box detection, CID is differentiable and robust to detection errors. Decoupling persons into different feature maps also makes it possible to isolate distractions from other persons and to explore context cues at scales larger than the bounding box size. Extensive experiments on various tasks, including multi-person pose estimation, person foreground segmentation, and part segmentation, show that CID consistently outperforms previous methods in both accuracy and efficiency. For instance, it achieves 71.3% AP on CrowdPose in multi-person pose estimation, outperforming the recent single-stage DEKR by 5.6%, the bottom-up CenterAttention by 3.7%, and the top-down JC-SPPE by 5.3%. This advantage also holds on the multi-person segmentation and part segmentation tasks.

Full Text