Abstract

Low-light image enhancement plays a central role in various downstream computer vision tasks. Vision Transformers (ViTs) have recently been adapted to low-level image processing and have achieved promising performance. However, ViTs process images in a window- or patch-based manner, which compromises both their computational efficiency and their ability to capture long-range dependencies. In addition, existing ViTs operate on RGB images rather than the RAW data produced by sensors, which is sub-optimal for exploiting the rich information contained in RAW data. We propose a fully end-to-end Conv-Transformer-based model, RawFormer, that directly utilizes RAW data for low-light image enhancement. RawFormer adopts a U-Net-like structure and integrates a carefully designed Conv-Transformer Fusing (CTF) block. The CTF block combines local attention and transposed self-attention mechanisms in a single module and reduces computational overhead by adopting a transposed self-attention operation. Experiments demonstrate that RawFormer outperforms state-of-the-art models by a significant margin on low-light RAW image enhancement tasks.
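As a rough illustration of why a transposed self-attention operation can reduce computational overhead, the PyTorch-style sketch below computes attention over the channel dimension instead of over spatial positions, so the attention map grows with the number of channels rather than with image resolution. The class name, head count, and layer choices are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedSelfAttention(nn.Module):
    """Illustrative channel-wise ("transposed") self-attention.

    Attention is computed across channels rather than spatial positions,
    so its cost scales with C x C instead of (H*W) x (H*W).
    Names and details are assumptions, not the paper's exact design.
    """
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, spatial) so that
        # attention is taken over the channel dimension.
        def to_heads(t):
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)

        q, k, v = map(to_heads, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        # (C/heads x C/heads) attention map: independent of image resolution.
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)

        out = attn @ v                      # (b, heads, c/heads, h*w)
        out = out.reshape(b, c, h, w)
        return self.project_out(out)

# Quick shape check on a dummy feature map.
if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)
    print(TransposedSelfAttention(dim=32)(x).shape)  # torch.Size([1, 32, 64, 64])

In this sketch the softmax attention matrix has shape (C/heads) x (C/heads), so the quadratic term no longer depends on H x W, which is the usual argument for why channel-wise attention is cheaper than full spatial self-attention on high-resolution inputs.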
