Abstract

Human matting refers to extracting human regions from natural images with high quality, including fine details such as hair, glasses, and hats. This technology plays an essential role in image composition and visual effects in the film industry. When a green screen is unavailable, existing human matting methods either require auxiliary inputs (such as a trimap or a background image) or rely on models with high computational cost and complex network structures, which makes human matting difficult to apply in practice. To alleviate these problems, we build on a segmentation network and use multiple branches to perform human segmentation, contour detail extraction, and information fusion. We also propose a foreground probability map module, which uses the feature maps of the segmentation network to pre-estimate the foreground probability of each pixel, yielding the Semantic Guided Matting Net. With only a single image as input, our approach performs human matting by fully exploiting the semantic information in the image. We validate our method on the P3M-10k dataset; compared with the benchmark, it achieves significant improvements on various evaluation metrics.
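To make the described architecture concrete, below is a minimal, hypothetical sketch of the three-branch design the abstract outlines: a shared segmentation encoder, a semantic branch, a contour-detail branch, a fusion branch, and a foreground probability map (FPM) module that turns segmentation features into per-pixel foreground probabilities guiding the detail branch. This is not the authors' released code; all module names, channel sizes, and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SemanticGuidedMattingNet(nn.Module):
    """Hypothetical sketch of the multi-branch design; not the paper's code."""
    def __init__(self):
        super().__init__()
        # Shared encoder standing in for the segmentation backbone.
        self.encoder = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        # Semantic branch: coarse human segmentation (fg / bg / unknown).
        self.semantic_head = nn.Conv2d(64, 3, 1)
        # Foreground probability map module: pre-estimates the foreground
        # probability of each pixel from the segmentation feature maps.
        self.fpm = nn.Sequential(conv_block(64, 32), nn.Conv2d(32, 1, 1))
        # Detail branch: extracts contour detail, guided by the FPM output.
        self.detail = nn.Sequential(conv_block(3 + 1, 32), nn.Conv2d(32, 1, 1))
        # Fusion branch: merges coarse semantics with fine contour details.
        self.fusion = nn.Sequential(conv_block(3 + 1 + 1, 32), nn.Conv2d(32, 1, 1))

    def forward(self, image):
        feats = self.encoder(image)
        semantic_logits = self.semantic_head(feats)                # coarse mask
        fg_prob = torch.sigmoid(self.fpm(feats))                   # FPM output
        detail = self.detail(torch.cat([image, fg_prob], dim=1))   # boundary detail
        coarse_fg = torch.softmax(semantic_logits, dim=1)[:, :1]   # fg channel
        alpha = torch.sigmoid(
            self.fusion(torch.cat([image, coarse_fg, detail], dim=1))
        )
        return alpha, semantic_logits, fg_prob

# Usage: a single RGB image is the only input required.
net = SemanticGuidedMattingNet()
alpha, seg, fg_prob = net(torch.randn(1, 3, 256, 256))
print(alpha.shape)  # torch.Size([1, 1, 256, 256])
```

The key design point this sketch illustrates is that the FPM output is computed from the segmentation features and fed into the detail branch, so the contour extraction is semantically guided rather than operating on raw pixels alone.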