End-to-End Instance-Level Human Parsing by Segmenting Persons

Zhuang Li, Hongbin Wang, Leilei Cao, Lihong Xu

doi:10.1109/tmm.2023.3260631

Abstract

Instance-level human parsing aims to partition the human body into semantic parts separately for each individual, and it remains challenging because of variation in human appearance and pose, occlusion, and complex backgrounds. Most state-of-the-art methods follow the “parsing-by-detection” paradigm, which relies on a trained detector to localize persons and then performs single-person parsing on each detected person in sequence. However, this paradigm depends heavily on the detector, and its runtime is proportional to the number of persons in an image. In this paper, we present a novel detection-free framework for end-to-end instance-level human parsing. We decompose instance-level human parsing into two subtasks handled by a unified network: 1) semantic segmentation, which classifies each pixel as a human part, and 2) instance segmentation, which classifies each mask as a person. The framework directly predicts, in parallel, the human-part semantic mask for all persons and a binary mask for each person instance. The parsing result for each person is obtained as the Hadamard product of the human-part semantic mask and that person's binary mask. Extensive experiments demonstrate that our proposed method performs favorably against state-of-the-art methods on the CIHP and MHP v2 datasets.
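The final combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `parse_instances`, the integer label encoding (0 for background, positive integers for human parts), and the array shapes are all assumptions made for illustration, and the actual network may combine per-class score maps rather than hard label maps.

```python
import numpy as np


def parse_instances(semantic_mask, person_masks):
    """Combine a shared part-label map with per-person binary masks.

    Hypothetical helper, not taken from the paper.
    semantic_mask: (H, W) integer array of part labels, 0 = background (assumed).
    person_masks:  (N, H, W) binary array, one mask per person (assumed).
    Returns an (N, H, W) array holding each person's part labels.
    """
    semantic_mask = np.asarray(semantic_mask)
    person_masks = np.asarray(person_masks).astype(semantic_mask.dtype)
    # Element-wise (Hadamard) product, broadcast over the N person masks:
    # pixels outside a person's mask become 0 (background).
    return person_masks * semantic_mask[None, :, :]


# Toy 2x4 image with two persons side by side and four part labels.
semantic = np.array([[1, 1, 2, 2],
                     [3, 3, 4, 4]])
persons = np.array([[[1, 1, 0, 0],
                     [1, 1, 0, 0]],
                    [[0, 0, 1, 1],
                     [0, 0, 1, 1]]])

result = parse_instances(semantic, persons)
print(result[0])  # part labels belonging to the first person only
print(result[1])  # part labels belonging to the second person only
```

Because the semantic mask is predicted once for the whole image, this step costs one element-wise multiplication per person and requires no per-person forward pass.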

Full Text