Abstract

Recently, Transformer-based architectures have been introduced into single-image deraining due to their strength in modeling non-local information. However, existing approaches typically apply self-attention along a single spatial or channel dimension, neglecting feature fusion across dimensions and thus failing to fully exploit contextual information; this limits the effective receptive field of the network and makes it difficult to learn image degradation relationships. To fully explore the potential correlations between different dimensions of degraded images, we develop a Dual-branch Collaborative Transformer, called DCformer. Specifically, we employ a parallel multi-head self-attention (PMSA) block as the core component to extract long-range contextual relationships across both spatial and channel dimensions. In addition, a local perception block (LPB) equips the network with local information acquisition, complementing the global modeling ability of the parallel hybrid self-attention mechanism. Finally, we design a feature interaction block (FIB) to further enhance the interaction of features at different resolutions. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method.
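The abstract does not include code, but the core idea of running spatial and channel self-attention in parallel branches and fusing the results can be illustrated with a minimal PyTorch sketch. Everything below is an assumption made for illustration: the class name ParallelMSA, the concatenate-and-project fusion, and the channel-similarity formulation are hypothetical and do not reproduce the authors' actual PMSA implementation.

```python
import torch
import torch.nn as nn

class ParallelMSA(nn.Module):
    """Hypothetical sketch of parallel spatial/channel self-attention.

    Two branches run in parallel: one attends over spatial positions,
    the other over channels; their outputs are concatenated and
    projected back. Fusion choice and names are assumptions, not the
    paper's released architecture.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm_s = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)
        # Spatial branch: standard multi-head attention over tokens.
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Fuse the two branches by concatenation + linear projection.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) tokens flattened from an H x W feature map.
        s = self.norm_s(x)
        spatial, _ = self.spatial_attn(s, s, s)      # attend over spatial positions

        # Channel branch: treat channels as the sequence axis and
        # compute a (C x C) similarity over transposed tokens.
        c = self.norm_c(x).transpose(1, 2)           # (B, C, N)
        attn = torch.softmax(
            c @ c.transpose(1, 2) / c.shape[-1] ** 0.5, dim=-1
        )                                            # (B, C, C)
        channel = (attn @ c).transpose(1, 2)         # back to (B, N, C)

        # Residual connection around the fused dual-branch output.
        return x + self.fuse(torch.cat([spatial, channel], dim=-1))

if __name__ == "__main__":
    block = ParallelMSA(dim=64)
    tokens = torch.randn(2, 16 * 16, 64)  # batch of 2, 16x16 map, 64 channels
    print(block(tokens).shape)            # torch.Size([2, 256, 64])
```

In this sketch the spatial branch captures long-range dependencies across positions while the channel branch models inter-channel correlations, mirroring the dual-dimension modeling the abstract describes; how DCformer actually fuses the branches is specified in the paper, not here.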
