Abstract

Learning-based matting methods have long been dominated by convolutional neural networks (CNNs). These methods mainly propagate the alpha matte according to the similarity between unknown and known regions. However, the correlations between pixels in unknown and known regions are limited by the insufficient receptive fields of common CNNs, which leads to inaccurate estimates for pixels in unknown regions that lie far from known regions. In this paper, we propose an Effective Local-Global Transformer for natural image matting (ELGT-Matting), which further expands receptive fields to establish wide-ranging correlations between unknown and known regions. The core component is the effective local-global transformer block, and each block consists of two modules: 1) a Window-Level Global MSA (Multi-head Self-Attention) module, which learns global context features among windows; and 2) a Local-Global Window MSA module, which combines coarse global context features with the corresponding fine local window features, helping local window self-attention capture both local and contextual information. Experiments demonstrate that ELGT-Matting performs favorably against other competitive approaches on the Composition-1K, Distinctions-646, and real-world AIM-500 datasets. In particular, we achieve a new state-of-the-art (SOTA) result on Composition-1K with an MSE of 0.00374.
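
To make the two-stage design concrete, below is a minimal PyTorch sketch of how such a local-global transformer block could be assembled. The module names, the mean pooling used to form per-window tokens, and the choice to prepend each window's global token to its local tokens are our assumptions for illustration; the paper's actual ELGT block may differ in these details.

# Illustrative sketch of the two attention stages described in the abstract.
# Shapes, pooling, and token injection are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into (B*nW, ws*ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


class LocalGlobalBlock(nn.Module):
    def __init__(self, dim, window_size=8, num_heads=4):
        super().__init__()
        self.ws = window_size
        # Stage 1: self-attention among pooled window tokens (coarse, global).
        self.global_msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stage 2: self-attention within each window, with the window's
        # global token prepended so local attention also sees context.
        self.local_msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        nW = (H // self.ws) * (W // self.ws)
        win = window_partition(x, self.ws)               # (B*nW, ws*ws, C)

        # Window-Level Global MSA: one pooled token per window attends to
        # all other windows, yielding coarse global context features.
        tokens = self.norm1(win.mean(dim=1).view(B, nW, C))
        g, _ = self.global_msa(tokens, tokens, tokens)   # (B, nW, C)

        # Local-Global Window MSA: prepend each window's global token to its
        # fine local tokens before in-window self-attention.
        g = g.reshape(B * nW, 1, C)
        seq = self.norm2(torch.cat([g, win], dim=1))     # (B*nW, 1+ws*ws, C)
        out, _ = self.local_msa(seq, seq, seq)
        win = win + out[:, 1:, :]                        # residual; drop global token

        # Merge windows back into the (B, H, W, C) layout.
        win = win.view(B, H // self.ws, W // self.ws, self.ws, self.ws, C)
        return win.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


# Quick shape check on a dummy feature map.
blk = LocalGlobalBlock(dim=64)
print(blk(torch.randn(2, 32, 32, 64)).shape)  # torch.Size([2, 32, 32, 64])

In this sketch, stage one lets coarse window-level tokens exchange information across the whole image, and stage two feeds that context back into fine-grained in-window attention, mirroring the Window-Level Global MSA and Local-Global Window MSA modules described above.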
