Abstract

Human instance segmentation is a core problem for human-centric scene understanding. Segmenting human instances poses a unique challenge to vision systems because of large intra-class variations in both appearance and shape, and because of complicated occlusion patterns. In this paper, we propose a new pose-aware human instance segmentation method. Unlike previous pose-aware methods, which first predict bottom-up poses and then estimate instance segmentation on top of the predicted poses, our method integrates both top-down and bottom-up cues for each instance: it adopts detection results as human proposals and jointly estimates human pose and instance segmentation for each proposal. We develop a modular recurrent deep network that uses pose estimation to refine instance segmentation iteratively. Our refinement modules exploit pose cues at two levels: as a coarse shape prior and as local part attention. We evaluate our approach on two public multi-person benchmarks: the OCHuman dataset and the COCOPersons dataset. The proposed method surpasses the state-of-the-art methods by 3.0 mAP on OCHuman and by 6.4 mAP on COCOPersons, demonstrating the effectiveness of our approach.
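The abstract describes pose cues acting as a coarse shape prior inside an iterative refinement loop. The following is a minimal NumPy sketch of that data flow only, not the authors' network. It assumes keypoints arrive as (x, y, visibility) rows in proposal-box pixel coordinates, renders them as Gaussian heatmaps, collapses the heatmaps into a single prior map, and stacks features, the previous mask, and the prior as input to a mask head on every iteration. `pose_head`, `mask_head`, and `sigma` are hypothetical stand-ins for learned modules and hyperparameters, and the local part attention branch is not shown.

```python
import numpy as np


def keypoint_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per visible keypoint.

    keypoints: sequence of (x, y, visibility) rows in proposal-box pixel
    coordinates (an assumed layout, not taken from the paper).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, (x, y, v) in enumerate(keypoints):
        if v > 0:  # invisible keypoints leave an all-zero map
            maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2))
    return maps


def coarse_shape_prior(keypoints, height, width, sigma=2.0):
    """Collapse per-keypoint heatmaps into one [0, 1] map (pixel-wise max)."""
    if len(keypoints) == 0:
        return np.zeros((height, width), dtype=np.float32)
    return keypoint_heatmaps(keypoints, height, width, sigma).max(axis=0)


def iterative_refinement(features, pose_head, mask_head, num_iters=3, sigma=2.0):
    """Alternate pose estimation and pose-conditioned mask refinement.

    features: (C, H, W) proposal features.
    pose_head(features, mask) -> (K, 3) keypoints   [hypothetical learned module]
    mask_head(stacked) -> (1, H, W) mask             [hypothetical learned module]
    """
    _, h, w = features.shape
    mask = np.full((1, h, w), 0.5, dtype=np.float32)  # uninformative initial mask
    keypoints = None
    for _ in range(num_iters):
        keypoints = pose_head(features, mask)
        prior = coarse_shape_prior(keypoints, h, w, sigma)[None]
        stacked = np.concatenate([features, mask, prior], axis=0)
        mask = mask_head(stacked)
    return mask, keypoints
```

With a toy `mask_head` that simply returns the prior channel (`lambda x: x[-1:]`), the output mask equals the rendered pose prior; a trained head would instead fuse that prior with image evidence and the previous mask.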

Highlights

  • Semantic instance segmentation aims to achieve pixel-level object category classification and object instance grouping simultaneously

  • The challenges of human segmentation mainly originate from two properties of the human category: first, human instances have large intra-class variations in both appearance and shape, due to their clothing and highly deformable bodies; second, as human individuals typically interact with other objects in their daily activities, they produce very complex visual patterns, in particular, occlusions between

  • We evaluate our approach on two public multi-person benchmarks, the OCHuman [1] and COCOPersons [12] datasets
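Results on these benchmarks are reported as mask mAP. The sketch below illustrates what that metric measures: mask IoU, greedy score-ordered matching, and AP averaged over IoU thresholds 0.50, 0.55, ..., 0.95 in the COCO style. It is a simplified single-image, single-class version written for illustration. The official COCO evaluation accumulates matches across the whole dataset, uses 101-point interpolated precision, and applies detection limits and crowd handling, so numbers from this sketch will not reproduce reported benchmark scores.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection-over-union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def average_precision(pred_masks, scores, gt_masks, iou_thr):
    """Single-image, single-class AP at one IoU threshold.

    Predictions are matched greedily in descending score order, each ground
    truth at most once; AP is the area under the precision envelope
    (all-point interpolation, a simplification of COCO's 101-point scheme).
    """
    if len(gt_masks) == 0:
        return 0.0
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    matched = set()
    tp = np.zeros(len(order))
    for rank, i in enumerate(order):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gt_masks):
            if j in matched:
                continue
            iou = mask_iou(pred_masks[i], gt)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    recall = np.concatenate([[0.0], cum_tp / len(gt_masks), [1.0]])
    precision = np.concatenate([[0.0], cum_tp / np.arange(1, len(tp) + 1), [0.0]])
    # Make precision non-increasing from right to left (the "envelope").
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    steps = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[steps + 1] - recall[steps]) * precision[steps + 1]))


def coco_style_map(pred_masks, scores, gt_masks):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(pred_masks, scores, gt_masks, t)
                          for t in thresholds]))
```

Averaging over strict thresholds rewards tight masks: a prediction with IoU 0.72 against its only ground truth counts as correct at 0.50 to 0.70 but not at 0.75 and above, giving an mAP of 0.5 for that image.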


Summary

Introduction

Semantic instance segmentation aims to achieve pixel-level object category classification and object instance grouping simultaneously. A key reason is that instance grouping is ambiguous given a box proposal when multiple object instances heavily overlap inside the box region. To tackle this problem, Zhang et al. [1] recently proposed a bottom-up approach that uses estimated human poses to generate instance proposals and performs human segmentation for each proposal. It leverages the grouping of sparse keypoints as an intermediate stage to mitigate the ambiguity in dense segmentation. On the OCHuman benchmark, the bottom-up pose estimator [10] achieves a recall rate of only 54.1%, while
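When instance proposals are derived from estimated poses, the pose estimator's recall caps how many people can be segmented at all. The sketch below illustrates that dependency under explicit assumptions: `box_from_keypoints`, its `margin` parameter, and the one-to-one IoU ≥ 0.5 matching rule in `instance_recall` are hypothetical choices for illustration, not the procedure of Zhang et al. [1] and not the protocol behind the 54.1% figure.

```python
import numpy as np


def box_from_keypoints(keypoints, margin=0.1):
    """Hypothetical pose-to-proposal step: a tight box around the visible
    keypoints, enlarged by `margin` of its width/height on each side.

    keypoints: (K, 3) rows of (x, y, visibility). Returns (x0, y0, x1, y1),
    or None when no keypoint is visible.
    """
    kp = np.asarray(keypoints, dtype=float)
    vis = kp[kp[:, 2] > 0]
    if len(vis) == 0:
        return None
    x0, y0 = vis[:, 0].min(), vis[:, 1].min()
    x1, y1 = vis[:, 0].max(), vis[:, 1].max()
    dx, dy = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)


def box_iou(a, b):
    """IoU of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def instance_recall(proposals, gt_boxes, iou_thr=0.5):
    """Fraction of ground-truth people matched one-to-one by some proposal
    with IoU >= iou_thr (an assumed matching rule)."""
    if len(gt_boxes) == 0:
        return 0.0
    used = set()
    hits = 0
    for g in gt_boxes:
        best_i, best_iou = -1, iou_thr
        for i, p in enumerate(proposals):
            if i in used:
                continue
            iou = box_iou(p, g)
            if iou >= best_iou:
                best_i, best_iou = i, iou
        if best_i >= 0:
            used.add(best_i)
            hits += 1
    return hits / len(gt_boxes)
```

In this sketch, a person whose keypoints the pose estimator misses produces no proposal, so no later segmentation stage can recover them; with two people in view and only one detected pose, recall is 0.5.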
