Multi-task model for human pose estimation and person detection

Daiwei Yu, Zhao Jin, Guanqun Li, Wenjin Zhang, Jun Zhang, Deze Zeng, Yi Xie, Wenbing Tao, Xudong Jiang

doi:10.1117/12.2644479

Abstract

Human pose estimation and person detection are two fundamental tasks in human behavior analysis. Since the advent of convolutional neural networks, both tasks have made remarkable progress, though separately. Recently, driven by the needs of practical applications, researchers have paid more attention to one-stage human pose estimation and person detection. However, few studies have addressed performing both tasks simultaneously in a single network. There are two main reasons: (1) it is challenging to design an effective mechanism that fully exploits the correlation and complementarity between the two tasks so that both improve together, especially in pose estimation accuracy; and (2) the evaluation bias caused by the difference in scale sensitivity between the two tasks remains unsolved. To address these problems, we propose a multi-task model for simultaneous human pose estimation and person detection, named PersonPD (person pose and person detection). It predicts keypoint heatmaps and regresses a 4D relative displacement vector (l, t, r, b), which encodes the person bounding box and also serves as a cue for grouping keypoints. We present a maximum-IoU matching algorithm, named IOU-grouping, that groups body joints into individual persons and at the same time produces accurate person detection results. With this simple but effective method, our model achieves competitive person detection and pose estimation performance on the COCO dataset¹.
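To make the grouping idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each detected joint at (x, y) regresses distances (l, t, r, b) to the left, top, right, and bottom edges of its person's box, that a set of candidate person boxes is already available, and that "maximum IoU matching" can be approximated by greedy one-to-one matching in descending IoU order with at most one joint of each type per person. The function names `decode_box`, `iou`, and `iou_grouping`, as well as the `min_iou` threshold, are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Joint = Tuple[str, float, float, Tuple[float, float, float, float]]  # (type, x, y, (l, t, r, b))


def decode_box(x: float, y: float, l: float, t: float, r: float, b: float) -> Box:
    """Turn a joint location plus its (l, t, r, b) displacements into a person box."""
    return (x - l, y - t, x + r, y + b)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_grouping(persons: List[Box], joints: List[Joint],
                 min_iou: float = 0.5) -> List[Dict[str, Tuple[float, float]]]:
    """Assign each joint to the person box its own decoded box overlaps most (greedy, hypothetical)."""
    groups: List[Dict[str, Tuple[float, float]]] = [dict() for _ in persons]
    # Score every (joint, person) pair by the IoU between the joint's decoded box and the person box.
    pairs = []
    for j_idx, (_, x, y, disp) in enumerate(joints):
        joint_box = decode_box(x, y, *disp)
        for p_idx, person_box in enumerate(persons):
            score = iou(joint_box, person_box)
            if score >= min_iou:
                pairs.append((score, j_idx, p_idx))
    # Highest-IoU pairs claim first; a joint is used once, and a person keeps one joint per type.
    pairs.sort(key=lambda p: p[0], reverse=True)
    used = set()
    for _, j_idx, p_idx in pairs:
        jtype, x, y, _ = joints[j_idx]
        if j_idx in used or jtype in groups[p_idx]:
            continue
        groups[p_idx][jtype] = (x, y)
        used.add(j_idx)
    return groups


if __name__ == "__main__":
    persons = [(0, 0, 10, 20), (20, 0, 30, 20)]
    joints = [
        ("nose", 5, 2, (5, 2, 5, 18)),
        ("left_ankle", 4, 19, (4, 19, 6, 1)),
        ("nose", 25, 3, (5, 3, 5, 17)),
        ("right_ankle", 26, 19, (5, 18, 4, 2)),  # slightly off box, IoU ~0.82 with person 1
    ]
    print(iou_grouping(persons, joints))
```

In this toy input, the two left-hand joints are grouped with the first person and the two right-hand joints with the second, because each joint's decoded box overlaps its own person box far more than the other one. The greedy order is a simplification chosen for readability; an optimal assignment (for example, Hungarian matching on the IoU matrix) is an equally plausible reading of "maximum IoU matching".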
