Abstract

Image denoising is an important low-level computer vision task that aims to reconstruct a noise-free, high-quality image from a noisy input. With the development of deep learning, convolutional neural networks (CNNs) have been gradually applied to image denoising, image compression, image enhancement, etc., and have achieved great success. Recently, the Transformer has become a popular architecture and is widely used to tackle computer vision tasks; however, few Transformer-based methods have been proposed for low-level vision tasks. In this paper, we propose a Transformer-based image denoising network named DenSformer. DenSformer consists of three modules: a preprocessing module, a local-global feature extraction module, and a reconstruction module. Specifically, the local-global feature extraction module consists of several Sformer groups, each of which contains several ETransformer layers and a convolution layer, together with a residual connection. These Sformer groups are densely skip-connected to fuse the features of different layers, and they jointly capture the local and global information of the given noisy images. We conduct comprehensive experiments with our model. In synthetic noise removal, DenSformer outperforms other state-of-the-art methods by up to 0.06–0.28 dB on gray-scale images and 0.57–1.19 dB on color images. In real noise removal, DenSformer achieves comparable performance while reducing the number of parameters by up to 40%. Experimental results demonstrate that our DenSformer improves upon some state-of-the-art methods on both synthetic and real noise data, in both objective and subjective evaluations.
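The overall layout described above can be summarized in a short PyTorch sketch. This is a minimal illustration under our own assumptions: the names `SformerGroup` and `DenSformerSketch`, the channel width, the group count, and the placeholder layer bodies standing in for the paper's ETransformer layers are all hypothetical choices for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SformerGroup(nn.Module):
    """Several Transformer layers plus a convolution, wrapped in a residual.
    A simple channel-wise MLP stands in for the paper's ETransformer layers."""
    def __init__(self, dim, num_layers=4):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(dim, dim, 1), nn.GELU())
            for _ in range(num_layers)
        ])
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x):
        # Residual connection around the whole group
        return x + self.conv(self.layers(x))

class DenSformerSketch(nn.Module):
    """Preprocessing conv -> densely skip-connected Sformer groups -> reconstruction."""
    def __init__(self, dim=64, num_groups=3):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)   # preprocessing module
        self.groups = nn.ModuleList(SformerGroup(dim) for _ in range(num_groups))
        # Dense skip connections: group i also sees all earlier group outputs,
        # fused back to `dim` channels with a 1x1 convolution.
        self.fuse = nn.ModuleList(
            nn.Conv2d(dim * (i + 1), dim, 1) for i in range(num_groups)
        )
        self.tail = nn.Conv2d(dim, 3, 3, padding=1)   # reconstruction module

    def forward(self, noisy):
        outputs = [self.head(noisy)]
        for group, fuse in zip(self.groups, self.fuse):
            fused = fuse(torch.cat(outputs, dim=1))   # dense feature fusion
            outputs.append(group(fused))
        return noisy + self.tail(outputs[-1])         # global residual

x = torch.randn(1, 3, 64, 64)
print(DenSformerSketch()(x).shape)  # torch.Size([1, 3, 64, 64])
```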

Highlights

  • Image and video acquisition depends largely on digital devices, and the collection process is often affected by various degradation factors

  • We observe that the results of the Vanilla-C model are slightly lower than those of the Vanilla model, while the Enhanced LeWin model outperforms the other three models. This indicates that the convolution layers used to generate Q, K, and V, together with the LeFF layer in the feed-forward network, help the Transformer extract and exploit information more effectively, enabling it to perform well across different denoising tasks (see the sketch after this list)

  • We propose a new dense residual skip-connection network based on Transformer (DenSformer) for effective image denoising
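The second highlight names two components from the ablation: convolution in the Q, K, V projection and a LeFF layer in the feed-forward network. The sketch below shows common formulations of both, assuming a pointwise-plus-depth-wise convolution for the projections and the usual LeFF shape (projection, depth-wise convolution on the spatial grid, projection); the exact layer choices here are our assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class ConvQKV(nn.Module):
    """Project feature maps to Q, K, V with pointwise + depth-wise convolutions."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(dim, dim * 3, 1),
            nn.Conv2d(dim * 3, dim * 3, 3, padding=1, groups=dim * 3),  # depth-wise
        )

    def forward(self, x):              # x: (B, C, H, W)
        q, k, v = self.proj(x).chunk(3, dim=1)
        return q, k, v

class LeFF(nn.Module):
    """Locally-enhanced feed-forward: a depth-wise conv between two projections."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, hidden_dim, 1)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, 3,
                                padding=1, groups=hidden_dim)
        self.fc2 = nn.Conv2d(hidden_dim, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):              # x: (B, C, H, W)
        return self.fc2(self.act(self.dwconv(self.act(self.fc1(x)))))
```

Both pieces inject local spatial context into steps that are purely pointwise in a vanilla Transformer, which is consistent with the ablation finding that the Enhanced LeWin variant performs best.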


Summary

Introduction

The acquisition of images and videos depends largely on digital devices, and the collection process is often affected by various degradation factors. With the development of deep learning, many researchers have attempted to design novel denoising models based on convolutional neural networks (CNNs), and most of them have achieved impressive performance improvements. Much of this work focuses on the design of CNN-based network architectures [13], including enlarging the receptive field and balancing performance against model size. These models can learn the noise distribution well from the training data, but they are not well suited to real noise removal. Although such models attempt to improve real denoising performance, it remains necessary to balance the trade-off between model size and performance. We propose an end-to-end Transformer-based network for image denoising, in which both Transformer and convolutional layers are utilized to fuse local and global features.
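The "Enhanced LeWin" naming in the ablation suggests the Transformer layers build on window-based self-attention, which keeps attention affordable on full-resolution feature maps by attending only within local windows. The following is a minimal sketch of that mechanism; the window size and head count are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping win x win windows."""
    def __init__(self, dim, win=8, heads=4):
        super().__init__()
        self.win = win
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):              # x: (B, C, H, W), H and W divisible by win
        b, c, h, w = x.shape
        win = self.win
        # Partition the feature map into windows of win*win tokens
        t = x.view(b, c, h // win, win, w // win, win)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, c)
        out, _ = self.attn(t, t, t)    # attend within each window only
        # Merge the windows back into the (B, C, H, W) layout
        out = out.reshape(b, h // win, w // win, win, win, c)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

x = torch.randn(1, 64, 32, 32)
print(WindowAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```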

Vision Transformer
Proposed Work
Network Architecture
Sformer Block
Dense Residual Skip Connection
Experimental Settings
Synthetic Noisy Images
Method
Real Noisy Images
Ablation Study
Findings
Conclusions
