Abstract

Learning-based matting methods have long been dominated by convolutional neural networks (CNNs). These methods mainly propagate the alpha matte according to the similarity between unknown and known regions. However, the correlations between pixels in unknown and known regions are limited by the insufficient receptive fields of common CNNs, which leads to inaccurate estimates for pixels in unknown regions that lie far from known regions. In this paper, we propose an Effective Local-Global Transformer for natural image matting (ELGT-Matting), which further expands receptive fields to establish wide-ranging correlations between unknown and known regions. The core component is the effective local-global transformer block, and each block consists of two modules: 1) a Window-Level Global MSA (Multi-head Self-Attention) module, which learns global context features among windows; and 2) a Local-Global Window MSA module, which combines coarse global context features with the corresponding fine local window features, helping local window self-attention capture both local and contextual information. Experiments demonstrate that ELGT-Matting performs favorably against other competitive approaches on the Composition-1K, Distinctions-646, and real-world AIM-500 datasets. In particular, we achieve a new state-of-the-art (SOTA) result on Composition-1K with an MSE of 0.00374.
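
To make the two-stage design concrete, below is a minimal PyTorch sketch of how such a local-global transformer block could be assembled. The module names, the mean pooling used to form per-window tokens, and the choice to prepend each window's global token to its local tokens are our assumptions for illustration; the paper's actual ELGT block may differ in these details.

# Illustrative sketch of the two attention stages described in the abstract.
# Shapes, pooling, and token injection are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into (B*nW, ws*ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


class LocalGlobalBlock(nn.Module):
    def __init__(self, dim, window_size=8, num_heads=4):
        super().__init__()
        self.ws = window_size
        # Stage 1: self-attention among pooled window tokens (coarse, global).
        self.global_msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stage 2: self-attention within each window, with the window's
        # global token prepended so local attention also sees context.
        self.local_msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        nW = (H // self.ws) * (W // self.ws)
        win = window_partition(x, self.ws)               # (B*nW, ws*ws, C)

        # Window-Level Global MSA: one pooled token per window attends to
        # all other windows, yielding coarse global context features.
        tokens = self.norm1(win.mean(dim=1).view(B, nW, C))
        g, _ = self.global_msa(tokens, tokens, tokens)   # (B, nW, C)

        # Local-Global Window MSA: prepend each window's global token to its
        # fine local tokens before in-window self-attention.
        g = g.reshape(B * nW, 1, C)
        seq = self.norm2(torch.cat([g, win], dim=1))     # (B*nW, 1+ws*ws, C)
        out, _ = self.local_msa(seq, seq, seq)
        win = win + out[:, 1:, :]                        # residual; drop global token

        # Merge windows back into the (B, H, W, C) layout.
        win = win.view(B, H // self.ws, W // self.ws, self.ws, self.ws, C)
        return win.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


# Quick shape check on a dummy feature map.
blk = LocalGlobalBlock(dim=64)
print(blk(torch.randn(2, 32, 32, 64)).shape)  # torch.Size([2, 32, 32, 64])

In this sketch, stage one lets coarse window-level tokens exchange information across the whole image, and stage two feeds that context back into fine-grained in-window attention, mirroring the Window-Level Global MSA and Local-Global Window MSA modules described above.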
