Abstract

Low-light image enhancement plays a central role in various downstream computer vision tasks. Vision Transformers (ViTs) have recently been adapted to low-level image processing and have achieved promising performance. However, ViTs process images in a window- or patch-based manner, which compromises both their computational efficiency and their ability to capture long-range dependencies. In addition, existing ViTs operate on RGB images rather than the RAW data produced by sensors, which is sub-optimal for exploiting the rich information contained in RAW data. We propose a fully end-to-end Conv-Transformer-based model, RawFormer, that directly utilizes RAW data for low-light image enhancement. RawFormer adopts a U-Net-like structure and integrates a carefully designed Conv-Transformer Fusing (CTF) block. The CTF block combines local attention and transposed self-attention mechanisms in a single module and reduces computational overhead by adopting a transposed self-attention operation. Experiments demonstrate that RawFormer outperforms state-of-the-art models by a significant margin on low-light RAW image enhancement tasks.
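As a rough illustration of why a transposed self-attention operation can reduce computational overhead, the PyTorch-style sketch below computes attention over the channel dimension instead of over spatial positions, so the attention map grows with the number of channels rather than with image resolution. The class name, head count, and layer choices are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedSelfAttention(nn.Module):
    """Illustrative channel-wise ("transposed") self-attention.

    Attention is computed across channels rather than spatial positions,
    so its cost scales with C x C instead of (H*W) x (H*W).
    Names and details are assumptions, not the paper's exact design.
    """
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, spatial) so that
        # attention is taken over the channel dimension.
        def to_heads(t):
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)

        q, k, v = map(to_heads, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        # (C/heads x C/heads) attention map: independent of image resolution.
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)

        out = attn @ v                      # (b, heads, c/heads, h*w)
        out = out.reshape(b, c, h, w)
        return self.project_out(out)

# Quick shape check on a dummy feature map.
if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)
    print(TransposedSelfAttention(dim=32)(x).shape)  # torch.Size([1, 32, 64, 64])

In this sketch the softmax attention matrix has shape (C/heads) x (C/heads), so the quadratic term no longer depends on H x W, which is the usual argument for why channel-wise attention is cheaper than full spatial self-attention on high-resolution inputs.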
