Abstract

Existing human matting methods cannot accurately estimate the alpha mattes of arbitrarily selected humans in a group photo. An alternative is to apply them to cropped image patches around each selected human; however, this still yields inaccurate alpha estimates because body parts of neighboring humans interfere within the crop. In addition, these methods are trained only on finely annotated synthetic data, so they perform poorly in real-world scenarios due to domain shift. To address these problems, we propose human selective matting (HSMatt), which performs matting for arbitrarily selected humans in a group photo given only a simple bounding box as guidance. Specifically, we design a global–local context network to extract both local and global semantic context features. A human-aware trimap network is then proposed to generate human-aware trimaps for the selected humans; it adopts stacked bidirectional inference modules with intermediate supervision to progressively refine the estimated trimap. Finally, a partially supervised matting network estimates the alpha matte, using a sample-varying loss to train on both finely annotated synthetic data and coarsely annotated real-world data, resulting in high accuracy and good generalization. To evaluate the proposed HSMatt, we construct the first human selective matting dataset, named HSM-200K, which contains over 200,000 human images with instance-level alpha matte annotations. Experimental results demonstrate that HSMatt outperforms state-of-the-art methods.
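
The sample-varying loss is the abstract's most concrete algorithmic element: fine synthetic samples can be supervised at every pixel, while coarse real samples should contribute gradients only where their labels are trusted. The sketch below illustrates one way such a loss could be written in PyTorch; the function name, the conf_mask/is_fine inputs, and the plain L1 formulation are all assumptions for illustration, not the paper's actual implementation.

# Minimal sketch of a sample-varying loss for mixed supervision.
# All names and the L1 weighting scheme are hypothetical assumptions.
import torch
import torch.nn.functional as F

def sample_varying_loss(pred_alpha, gt_alpha, conf_mask, is_fine):
    """Per-sample loss that adapts to annotation quality.

    pred_alpha, gt_alpha: (B, 1, H, W) tensors with values in [0, 1].
    conf_mask: (B, 1, H, W) binary mask of pixels whose labels are trusted
               (all ones for synthetic data; confident foreground/background
               regions only for coarsely annotated real data).
    is_fine:   (B,) boolean tensor, True for finely annotated synthetic samples.
    """
    per_pixel = F.l1_loss(pred_alpha, gt_alpha, reduction="none")
    losses = []
    for i in range(pred_alpha.size(0)):
        if is_fine[i]:
            # Fine synthetic annotation: supervise every pixel.
            losses.append(per_pixel[i].mean())
        else:
            # Coarse real annotation: supervise only trusted pixels,
            # so unreliable boundary labels contribute no gradient.
            m = conf_mask[i]
            losses.append((per_pixel[i] * m).sum() / m.sum().clamp(min=1.0))
    return torch.stack(losses).mean()

The key design point this illustrates is that both data sources share one network and one loss interface, with supervision simply masked where real-world annotations are unreliable, which is what lets mixed fine/coarse training improve generalization without corrupting boundary accuracy.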
