In the past few years, convolutional neural networks (CNNs) have become the primary workhorse for image restoration tasks. However, their inability to model long-range dependencies, owing to the local computational property of convolution, greatly limits restoration performance. To overcome this limitation, we propose a novel multi-stage progressive convolutional Transformer, termed PCformer, that recursively restores degraded images and captures both local context and global dependencies at a modest computational cost. Specifically, each stage of PCformer is an asymmetric encoder–decoder network whose bottleneck is built upon a tailored Transformer block with convolution operations deployed to avoid losing local context. Both the encoder and decoder are convolution-based modules, allowing the network to explore rich contextualized information for image recovery. Feeding the low-resolution features produced by the encoder into the Transformer bottleneck as tokens ensures that long-range pixel interactions are captured while reducing the computational burden. Meanwhile, we apply a gated module to filter redundant information propagated between consecutive stages. In addition, long-range enhanced inpainting is further introduced to mine the ability of PCformer to exploit distant complementary features. Extensive experiments demonstrate superior results, in particular establishing new state-of-the-art performance on several image restoration tasks, including deraining (+0.37 dB on Rain13K), denoising (+0.11 dB on DND), dehazing (+0.56 dB on I-HAZE), enhancement (+0.72 dB on SICE), and shadow removal (0.65 lower RMSE on ISTD). The implementation code is available at https://github.com/Jeasco/PCformer.
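To make the described architecture concrete, the following is a minimal conceptual sketch in PyTorch of one encoder–Transformer–decoder stage with gated fusion between stages. It is not the authors' implementation; all module names, channel widths, and hyperparameters are illustrative assumptions, and the tailored convolution-augmented Transformer block is approximated here by a standard `nn.TransformerEncoder` applied to low-resolution feature tokens.

```python
# Conceptual sketch of a PCformer-style stage (hypothetical, not the authors' code):
# a conv encoder downsamples, a Transformer bottleneck models long-range
# dependencies on low-resolution tokens, and a conv decoder restores resolution.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU; stands in for the conv encoder/decoder units."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class PCformerStage(nn.Module):
    """One asymmetric encoder-Transformer-decoder stage (illustrative)."""
    def __init__(self, ch=32, depth=2, heads=4):
        super().__init__()
        self.enc = ConvBlock(3, ch)
        self.down = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)   # 1/2 resolution
        layer = nn.TransformerEncoderLayer(d_model=ch * 2, nhead=heads,
                                           dim_feedforward=ch * 4,
                                           batch_first=True)
        self.bottleneck = nn.TransformerEncoder(layer, num_layers=depth)
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)       # back to full res
        self.dec = ConvBlock(ch * 2, ch)
        self.out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        e = self.enc(x)                         # local context at full resolution
        lo = self.down(e)                       # low-res features -> cheap attention
        b, c, h, w = lo.shape
        tokens = lo.flatten(2).transpose(1, 2)  # (B, H*W, C): pixels as tokens
        tokens = self.bottleneck(tokens)        # global long-range interactions
        lo = tokens.transpose(1, 2).reshape(b, c, h, w)
        d = self.dec(torch.cat([self.up(lo), e], dim=1))  # skip connection
        return x + self.out(d)                  # residual restoration


class GatedFusion(nn.Module):
    """Gate filtering what one stage passes to the next (illustrative)."""
    def __init__(self, ch=3):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, prev_out, degraded):
        g = self.gate(torch.cat([prev_out, degraded], dim=1))
        return g * prev_out + (1 - g) * degraded


if __name__ == "__main__":
    stages = nn.ModuleList([PCformerStage() for _ in range(2)])
    fuse = GatedFusion()
    x = torch.randn(1, 3, 64, 64)                # degraded input
    out = stages[0](x)
    out = stages[1](fuse(out, x))                # progressive, gated refinement
    print(out.shape)                             # torch.Size([1, 3, 64, 64])
```

The sketch illustrates the key cost argument of the abstract: attention runs only on the downsampled feature map, so the quadratic token interaction is computed over a quarter of the pixels while full-resolution detail is recovered through the convolutional decoder and skip connection.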