Abstract
Image denoising is an important low-level computer vision task that aims to reconstruct a noise-free, high-quality image from a noisy input. With the development of deep learning, convolutional neural networks (CNNs) have been widely applied and have achieved great success in image denoising, image compression, image enhancement, and related tasks. Recently, the Transformer has become a popular technique for tackling computer vision tasks, yet few Transformer-based methods have been proposed for low-level vision. In this paper, we propose a Transformer-based image denoising network named DenSformer. DenSformer consists of three modules: a preprocessing module, a local-global feature extraction module, and a reconstruction module. Specifically, the local-global feature extraction module consists of several Sformer groups, each of which contains several ETransformer layers and a convolution layer, together with a residual connection. These Sformer groups are densely skip-connected to fuse features from different layers, and they jointly capture local and global information from the given noisy images. We evaluate DenSformer in comprehensive experiments. In synthetic noise removal, DenSformer outperforms other state-of-the-art methods by up to 0.06–0.28 dB on gray-scale images and 0.57–1.19 dB on color images. In real noise removal, DenSformer achieves comparable performance while reducing the number of parameters by up to 40%. Experimental results show that DenSformer improves on several state-of-the-art methods, for both synthetic and real noise data, in both objective and subjective evaluations.
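The paper's implementation is not reproduced here, but the described data flow (preprocessing, densely skip-connected Sformer groups with group-level residuals, and a reconstruction step with a global residual) can be sketched in a few lines. This is a minimal numpy sketch under stated assumptions: simple ReLU channel-mixing layers stand in for the real ETransformer and convolution layers, and all function names (`toy_layer`, `sformer_group`, `densformer_sketch`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_layer(x, w):
    # Stand-in for an ETransformer/convolution layer: a ReLU channel mix.
    return np.maximum(x @ w, 0)

def sformer_group(x, weights):
    # Several layers followed by a residual connection, mirroring the
    # described Sformer group structure.
    y = x
    for w in weights:
        y = toy_layer(y, w)
    return x + y  # group-level residual

def densformer_sketch(noisy, n_groups=3, n_layers=2, C=4):
    # Assume preprocessing has already lifted the input to C channels.
    feat = noisy
    group_outs = []
    for _ in range(n_groups):
        ws = [rng.standard_normal((C, C)) * 0.1 for _ in range(n_layers)]
        feat = sformer_group(feat, ws)
        group_outs.append(feat)  # keep every group output for dense fusion
    # Dense skip connection: concatenate all group outputs along the
    # channel axis, then fuse back down to C channels.
    fuse_w = rng.standard_normal((C * n_groups, C)) * 0.1
    fused = np.concatenate(group_outs, axis=-1) @ fuse_w
    return noisy + fused  # reconstruction with a global residual

x = rng.standard_normal((8, 8, 4))
out = densformer_sketch(x)
print(out.shape)  # (8, 8, 4)
```

The dense skip connections mean each group's features remain directly available to the fusion step, rather than being overwritten by later groups.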
Highlights
The acquisition of images and videos depends largely on digital devices, and the collection process is often affected by various degradation factors
We observe that the results of the Vanilla-C model are slightly lower than those of the Vanilla model, while the Enhanced LeWin model outperforms the other three models. This indicates that the convolution layer used to generate Q, K, and V, together with the LeFF layer in the feed-forward network, helps the Transformer better extract and exploit information, leading to strong performance across different denoising tasks
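The idea of generating Q, K, and V from a locally (convolution-) enhanced version of the input can be sketched as follows. This is a toy numpy illustration, not the paper's code: a fixed 3×3 averaging kernel stands in for the learned depthwise convolution, attention is single-head and unwindowed, and all names (`depthwise_conv3x3`, `conv_enhanced_attention`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv3x3(x):
    # Toy 3x3 depthwise convolution over an (H, W, C) feature map,
    # zero-padded so the spatial size is preserved. A fixed averaging
    # kernel stands in for the learned weights.
    H, W, C = x.shape
    k = np.ones((3, 3)) / 9.0
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3, :]          # (3, 3, C)
            out[i, j] = np.einsum('ijc,ij->c', patch, k)
    return out

def conv_enhanced_attention(x, w_q, w_k, w_v):
    # Single-head self-attention where Q, K, V are projected from a
    # locally enhanced (convolved) input, then flattened to tokens.
    H, W, C = x.shape
    local = depthwise_conv3x3(x)       # inject local context before projection
    tokens = local.reshape(H * W, C)   # (N, C) token sequence
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(C))  # (N, N) attention map
    return (attn @ v).reshape(H, W, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
w = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]
y = conv_enhanced_attention(x, *w)
print(y.shape)  # (8, 8, 4)
```

The convolution step gives each token a summary of its spatial neighborhood before global attention mixes all tokens, which is one plausible reading of why the conv-enhanced variants perform better in the ablation.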
We propose a new dense residual skip-connection network based on the Transformer (DenSformer) for effective image denoising
Summary
The acquisition of images and videos depends largely on digital devices, and the collection process is often affected by various degradation factors. We propose an end-to-end Transformer-based network for image denoising, in which both Transformer and convolutional layers are used to fuse local and global features. With the development of deep learning, many researchers have designed novel denoising models based on convolutional neural networks (CNNs), and most have achieved impressive gains in performance. Much of this work focuses on the design of CNN-based network architectures [13], including enlarging the receptive field and balancing performance against model size. These models learn the noise distribution well from the training data, but they are not well suited to real noise removal. Although recent models attempt to improve real-noise denoising performance, it remains necessary to balance the trade-off between model size and performance