Abstract

Although convolutional neural networks and vision transformers have been successfully applied in various fields of computer vision, little work combines them into an efficient network model for image deraining. Convolutional neural networks extract features from each local region, while vision transformers capture the contextual relationships between those local features. Owing to limits on computational resources and processing time, vision transformers struggle to process high-resolution images, which hinders their deployment on devices with limited hardware. The purpose of this article is to exploit the advantages of both architectures to design a lightweight encoder-decoder network for real-time image deraining. First, a novel channel Transformer module is designed to obtain global contextual information, where depthwise separable convolution extracts multi-scale local features and a Transformer encoder is constructed by stacking these modules. Second, a fully convolutional decoder adopts mask attention and inverted bottleneck convolution to achieve progressive feature fusion and feature reconstruction, which significantly reduces computational complexity and memory requirements. Extensive experiments verify that the proposed method outperforms other state-of-the-art methods, while its computational cost and parameter count are much smaller than those of comparable methods.
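For readers unfamiliar with the two convolutional building blocks named above, the following is a minimal PyTorch sketch of a depthwise separable convolution (used here for local feature extraction) and an inverted bottleneck convolution (used in the decoder for feature reconstruction). The layer sizes, activation choice, and residual connection are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Per-channel depthwise conv followed by a 1x1 pointwise conv."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_ch, in_ch, kernel_size,
            padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


class InvertedBottleneck(nn.Module):
    """Expand channels, apply a depthwise conv, then project back."""

    def __init__(self, ch: int, expansion: int = 4):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),       # expand
            nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),        # depthwise
            nn.GELU(),
            nn.Conv2d(hidden, ch, 1, bias=False),        # project
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual connection (assumed)


# Shape check on a dummy feature map.
x = torch.randn(1, 32, 64, 64)
y = InvertedBottleneck(32)(DepthwiseSeparableConv(32, 32)(x))
print(y.shape)  # torch.Size([1, 32, 64, 64])
```

Both blocks replace a dense convolution with grouped and 1x1 convolutions, which is the main source of the parameter and FLOP savings the abstract claims.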
