Dual Attention Enhanced Transformer for Image Defocus Deblurring

Abstract

Image defocus deblurring remains a challenging problem due to the uncertainty of the blurred region and the varying depth of field. Although convolutional neural networks (CNNs) have achieved promising results on this task, their limited receptive field and static weights hinder restoration performance. Transformer models can mitigate these weaknesses of CNNs. However, recent Transformer-based models for image defocus deblurring utilize self-attention along only the spatial or the channel dimension, neglecting cross-dimensional information essential for restoration. In this paper, we propose a novel Transformer model, the Dual Attention Enhanced Transformer (DAEformer), for image defocus deblurring. DAEformer combines self-attention from both spatial and channel dimensions while applying auxiliary enhanced attention modules. We present the Spatial Attention Enhanced Block (SAEB) and Channel Attention Enhanced Block (CAEB), which not only fuse spatial and channel information within blocks but also enhance details. Furthermore, we design a progressive hierarchical architecture that applies SAEB/CAEB at different levels to model distinct information and facilitate fusion across blocks. Experimental results demonstrate that DAEformer achieves state-of-the-art results on the dual-pixel dataset.
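The spatial/channel distinction the abstract draws can be illustrated with a minimal numpy sketch (single-head, no learned projections; this is illustrative, not the authors' DAEformer code): spatial self-attention builds an N×N map over pixel positions, while channel self-attention builds a C×C map over feature channels, making its cost linear in image size.

```python
import numpy as np

def softmax(s, axis=-1):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(x):
    # x: (N, C) with N = H*W pixel tokens; the attention map is (N, N),
    # so cost grows quadratically with the number of pixels
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores) @ x

def channel_self_attention(x):
    # the attention map is (C, C) over feature channels; cost is linear in N
    scores = x.T @ x / np.sqrt(x.shape[0])
    return x @ softmax(scores)
```

A dual-attention design interleaves or fuses the two paths so that spatial detail and channel statistics can inform each other, which is the cross-dimensional information the abstract refers to.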

Similar Papers
  • Research Article
  • 10.3390/app15063173
Dual-Branch CNN–Mamba Method for Image Defocus Deblurring
  • Mar 14, 2025
  • Applied Sciences
  • Wenqi Zhao + 3 more

Defocus deblurring is a challenging task in the fields of computer vision and image processing. The irregularity of defocus blur kernels, coupled with the limitations of computational resources, poses significant difficulties for defocused image restoration. Additionally, the varying degrees of blur across different regions of the image impose higher demands on feature capture. Insufficient fine-grained feature extraction can result in artifacts and the loss of details, while inadequate coarse-grained feature extraction can cause image distortion and unnatural transitions. To address these challenges, we propose a defocus image deblurring method based on a hybrid CNN–Mamba architecture. This approach employs a data-driven, network-based self-learning strategy for blur processing, eliminating the need for traditional blur kernel estimation. Furthermore, by designing parallel feature extraction modules, the method leverages the local feature extraction capabilities of CNNs to capture image details, effectively restoring texture and edge information. The Mamba module models long-range dependencies, ensuring global consistency in the image. On the real defocus blur dual-pixel image dataset DPDD, the proposed CMDDNet achieves a PSNR of 28.37 dB on the Indoor subset, surpassing Uformer-B (28.23 dB) while significantly reducing the parameter count to only 9.74 M, which is 80.9% less than Uformer-B (50.88 M). Although the PSNR on the Outdoor and Combined subsets is slightly lower, CMDDNet maintains competitive performance with a significantly reduced model size, demonstrating its efficiency and effectiveness in defocus deblurring. These results indicate that CMDDNet offers a promising trade-off between performance and computational efficiency, making it well suited for lightweight applications.

  • Research Article
  • 10.1038/s41598-025-07326-6
Real-world defocus deblurring via score-based diffusion models
  • Jul 2, 2025
  • Scientific Reports
  • Yuhao Li + 9 more

Defocus blur commonly arises from cameras' depth-of-field limitations. While deep learning shows promise for image restoration problems, defocus deblurring requires accurate training data comprising pairs of all-in-focus and defocused images, which can be difficult to collect in real-world scenarios. To address this problem, we propose a high-resolution iterative deblurring method for real scenes driven by a score-based diffusion model. The method trains a score network by learning the score function of focused images at different noise levels and reconstructs high-quality images through a reverse-time stochastic differential equation (SDE). A predictor-corrector (PC) framework corrects discretization errors in the reverse-time SDE to enhance robustness during reconstruction. The iterative nature of diffusion models enables gradual improvement in image quality, progressively enhancing details and refining the marginal distribution with each iteration, so that the distribution of generated images increasingly approximates that of sharply focused images. Unlike mainstream end-to-end approaches, this method does not require paired all-in-focus and defocus images to train the model; real-world datasets, including self-captured ones, were used for training. Additional testing was conducted on the RealBlur and DED datasets to evaluate the efficacy of the proposed method. Compared to DnCNN, FFDNet and CycleGAN, the proposed method achieved superior performance on real-world datasets, including self-captured scenarios, with experimental results showing improvements of approximately 13.4% in PSNR and 34.7% in SSIM. These results indicate that significant enhancement in the clarity of defocused images can be attained, effectively enabling high-resolution iterative defocus deblurring in real-world scenarios through the diffusion model.
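The predictor-corrector reverse-SDE loop described above can be sketched on a toy 1-D problem, assuming a variance-exploding SDE and substituting a closed-form Gaussian score for the learned score network (`mu`, `tau`, the noise schedule, and the step sizes are all illustrative choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: "sharp images" are scalars ~ N(mu, tau^2); its noisy score is
# known in closed form here, standing in for the trained score network.
mu, tau = 2.0, 0.1

def score(x, sigma):
    # score of the data distribution convolved with N(0, sigma^2) noise
    return (mu - x) / (tau**2 + sigma**2)

sigmas = np.linspace(1.0, 0.01, 200)          # decreasing noise schedule
x = rng.normal(0.0, sigmas[0], size=1000)     # start from pure noise

for i in range(len(sigmas) - 1):
    s, s_next = sigmas[i], sigmas[i + 1]
    # predictor: one Euler-Maruyama step of the reverse-time SDE
    step = s**2 - s_next**2
    x = x + step * score(x, s) + np.sqrt(step) * rng.normal(size=x.shape)
    # corrector: one Langevin MCMC step to damp discretization error
    eps = 0.1 * s_next**2
    x = x + eps * score(x, s_next) + np.sqrt(2 * eps) * rng.normal(size=x.shape)
```

After the loop, the samples concentrate around the sharp-data mean, mirroring how each predictor-corrector iteration nudges the generated distribution toward that of focused images.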

  • Research Article
  • Cited by 5
  • 10.1609/aaai.v37i3.25446
Learnable Blur Kernel for Single-Image Defocus Deblurring in the Wild
  • Jun 26, 2023
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Jucai Zhai + 4 more

Recent research has shown that the dual-pixel sensor enables great progress in defocus map estimation and image defocus deblurring. However, extracting dual-pixel views in real time is troublesome and complicates algorithm deployment. Moreover, deblurred images generated by defocus deblurring networks lack high-frequency details, which is unsatisfactory for human perception. To overcome these issues, we propose a novel defocus deblurring method that uses the guidance of the defocus map to perform image deblurring. The proposed method consists of a learnable blur kernel that estimates the defocus map in an unsupervised manner and, for the first time, a single-image defocus deblurring generative adversarial network (DefocusGAN). The proposed network can learn the deblurring of different regions and recover realistic details, and we propose a defocus adversarial loss to guide the training process. Competitive experimental results confirm that, with a learnable blur kernel, the generated defocus map achieves results comparable to supervised methods. In the single-image defocus deblurring task, the proposed method achieves state-of-the-art results, with especially significant improvements in perceptual quality: PSNR reaches 25.56 dB and LPIPS reaches 0.111.

  • Conference Article
  • Cited by 82
  • 10.1109/cvpr52688.2022.01582
Learning to Deblur using Light Field Generated and Real Defocus Images
  • Jun 1, 2022
  • Lingyan Ruan + 3 more

Defocus deblurring is a challenging task due to the spatially varying nature of defocus blur. While deep learning shows great promise in solving image restoration problems, defocus deblurring demands accurate training data consisting of all-in-focus and defocused image pairs, which are difficult to collect. Naive two-shot capture cannot achieve pixel-wise correspondence between the defocused and all-in-focus image pairs. Synthetic apertures of light fields offer a more reliable way to generate accurate image pairs. However, the defocus blur generated from light field data differs from that of images captured with a traditional digital camera. In this paper, we propose a novel deep defocus deblurring network that leverages the strengths and overcomes the shortcomings of light fields. We first train the network on a light field-generated dataset for its highly accurate image correspondence. Then, we fine-tune the network using a feature loss on another dataset collected by the two-shot method to alleviate the differences between the defocus blur in the two domains. This strategy proves highly effective, achieving state-of-the-art performance both quantitatively and qualitatively on multiple test sets. Extensive ablation studies analyze the effect of each network module on the final performance.

  • Conference Article
  • 10.1109/cisp-bmei.2016.7852704
All-in-focus image reconstruction with depth sensing
  • Oct 1, 2016
  • Zhong-Shan Sui + 5 more

To reconstruct an all-in-focus image from a conventional camera, a spatially varying defocus deblurring approach based on a blur map and TV/L2 regularization is proposed. First, the lenticular defocus model is studied and analyzed, and the principles, characteristics and applicability of the disk and Gaussian defocus models are summarized. Second, the local contrast prior is modified using edge properties and combined with the gradient of blurry edges to obtain a blur map; to reduce noise in the blur map and resolve ambiguous edges, a guided filtering step is also adopted. Finally, TV/L2 regularization, solved with an augmented Lagrangian method, is employed to deblur the defocused image, and the all-in-focus image is obtained through scale selection and image reconstruction. Experimental results on both synthesized and real images show that the proposed approach produces excellent all-in-focus images, outperforming state-of-the-art space-invariant and spatially varying defocus deblurring methods with better visual quality.

  • Research Article
  • Cited by 2
  • 10.1609/aaai.v39i7.32819
Intra and Inter Parser-Prompted Transformers for Effective Image Restoration
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Cong Wang + 3 more

We propose Intra and Inter Parser-Prompted Transformers (PPTformer) that explore useful features from visual foundation models for image restoration. Specifically, PPTformer contains two parts: an Image Restoration Network (IRNet) for restoring images from degraded observations and a Parser-Prompted Feature Generation Network (PPFGNet) for providing IRNet with reliable parser information to boost restoration. To enhance the integration of the parser within IRNet, we propose Intra Parser-Prompted Attention (IntraPPA) and Inter Parser-Prompted Attention (InterPPA) to implicitly and explicitly learn useful parser features to facilitate restoration. The IntraPPA reconsiders cross-attention between parser and restoration features, enabling implicit perception of the parser from a long-range and intra-layer perspective. Conversely, the InterPPA initially fuses restoration features with those of the parser, then formulates these fused features within an attention mechanism to explicitly perceive parser information. Further, we propose a parser-prompted feed-forward network to guide restoration with pixel-wise gating modulation. Experimental results show that PPTformer achieves state-of-the-art performance on image deraining, defocus deblurring, desnowing, and low-light enhancement.

  • Conference Article
  • Cited by 1905
  • 10.1109/cvpr52688.2022.01716
Uformer: A General U-Shaped Transformer for Image Restoration
  • Jun 1, 2022
  • Zhendong Wang + 5 more

In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. In Uformer, there are two core designs. First, we introduce a novel locally-enhanced window (LeWin) Transformer block, which performs non-overlapping window-based self-attention instead of global self-attention. It significantly reduces the computational complexity on high-resolution feature maps while capturing local context. Second, we propose a learnable multi-scale restoration modulator in the form of a multi-scale spatial bias to adjust features in multiple layers of the Uformer decoder. Our modulator demonstrates superior capability for restoring details for various image restoration tasks while introducing marginal extra parameters and computational cost. Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration. To evaluate our approach, extensive experiments are conducted on several image restoration tasks, including image denoising, motion deblurring, defocus deblurring and deraining. Without bells and whistles, our Uformer achieves superior or comparable performance compared with the state-of-the-art algorithms. The code and models are available at https://github.com/ZhendongWang6/Uformer.
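The complexity saving of window-based self-attention comes from restricting attention to non-overlapping windows; the partition step can be sketched in numpy (illustrative only, not the Uformer implementation):

```python
import numpy as np

def window_partition(x, w):
    # x: (H, W, C) feature map -> (num_windows, w*w, C) token groups.
    # Attention inside each w*w window costs O(H*W*w^2*C) in total,
    # versus O((H*W)^2 * C) for global self-attention.
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)
```

Each group of `w*w` tokens is then fed to an ordinary self-attention layer, and the inverse reshape restores the feature map.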

  • Research Article
  • Cited by 12
  • 10.1609/aaai.v37i2.25235
Learning Single Image Defocus Deblurring with Misaligned Training Pairs
  • Jun 26, 2023
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Yu Li + 3 more

By adopting the popular pixel-wise loss, existing methods for defocus deblurring rely heavily on well-aligned training image pairs. Although training pairs of ground-truth and blurry images are carefully collected, e.g., in the DPDD dataset, misalignment between training pairs is inevitable, so existing methods may suffer from deformation artifacts. In this paper, we propose a joint deblurring and reblurring learning (JDRL) framework for single image defocus deblurring with misaligned training pairs. JDRL consists of a deblurring module and a spatially invariant reblurring module, by which the deblurred result can be adaptively supervised by the ground-truth image to recover sharp textures while maintaining spatial consistency with the blurry image. First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between deblurred and ground-truth images. Second, in the reblurring module, the deblurred result is reblurred to be spatially aligned with the blurry image by predicting a set of isotropic blur kernels and weighting maps. Moreover, we establish a new single image defocus deblurring (SDD) dataset, further validating JDRL and benefiting future research. JDRL can be applied to boost defocus deblurring networks in terms of both quantitative metrics and visual quality on the DPDD, RealDOF and our SDD datasets.

  • Book Chapter
  • 10.3233/faia241382
Single Image Defocus Deblurring with Progressive Feature Attention Network
  • Dec 13, 2024
  • Frontiers in artificial intelligence and applications
  • Hongmin Zhan + 2 more

The goal of single-image defocus deblurring is to reconstruct a clear image from a defocused one. Although existing methods perform well in common blurry scenes, they still struggle with feature extraction in severely defocused areas. We therefore propose an innovative progressive multi-scale feature extraction module (PMFEM) and a feature attention module (FAM). The PMFEM gradually extracts and fuses multi-scale features through convolution kernels of different sizes: three parallel paths process the input, simplifying and optimizing feature extraction layer by layer to achieve progressive processing. The FAM dynamically enhances or suppresses features at specific scales by fusing multi-path feature information, thereby improving the model's feature expression ability. Experimental results show that our method achieves state-of-the-art performance on the Dual-Pixel Defocus Deblurring (DPDD) and Regression Tree Fields (RTF) datasets. Our code can be obtained at https://github.com/HMin-Z/PFANet.

  • Research Article
  • Cited by 5
  • 10.1007/s40747-025-01789-w
Swin-Diff: a single defocus image deblurring network based on diffusion model
  • Feb 17, 2025
  • Complex & Intelligent Systems
  • Hanyan Liang + 3 more

Single Image Defocus Deblurring (SIDD) remains challenging due to spatially varying blur kernels, particularly in processing high-resolution images where traditional methods often struggle with artifact generation, detail preservation, and computational efficiency. This paper presents Swin-Diff, a novel architecture integrating diffusion models with Transformer-based networks for robust defocus deblurring. Our approach employs a two-stage training strategy where a diffusion model generates prior information in a compact latent space, which is then hierarchically fused with intermediate features to guide the regression model. The architecture incorporates a dual-dimensional self-attention mechanism operating across channel and spatial domains, enhancing long-range modeling capabilities while maintaining linear computational complexity. Extensive experiments on three public datasets (DPDD, RealDOF, and RTF) demonstrate Swin-Diff’s superior performance, achieving average improvements of 1.37% in PSNR, 3.6% in SSIM, 2.3% in MAE, and 25.2% in LPIPS metrics compared to state-of-the-art methods. Our results validate the effectiveness of combining diffusion models with hierarchical attention mechanisms for high-quality defocus blur removal.

  • Research Article
  • Cited by 12
  • 10.1109/tpami.2024.3457856
Deep Single Image Defocus Deblurring via Gaussian Kernel Mixture Learning.
  • Dec 1, 2024
  • IEEE transactions on pattern analysis and machine intelligence
  • Yuhui Quan + 3 more

This paper proposes an end-to-end deep learning approach for removing defocus blur from a single defocused image. Defocus blur is a common issue in digital photography that poses a challenge due to its spatially-varying and large blurring effect. The proposed approach addresses this challenge by employing a pixel-wise Gaussian kernel mixture (GKM) model to accurately yet compactly parameterize spatially-varying defocus point spread functions (PSFs), motivated by the isotropy of defocus PSFs. We further propose a grouped GKM (GGKM) model that decouples the coefficients in GKM to improve modeling accuracy in an economical manner. A deep neural network called GGKMNet is then developed by unrolling a fixed-point iteration of GGKM-based image deblurring, which avoids the efficiency issues of existing unrolling DNNs. Using a lightweight scale-recurrent architecture with a coarse-to-fine estimation scheme to predict the coefficients in GGKM, GGKMNet can efficiently recover an all-in-focus image from a defocused one. These advantages are demonstrated with extensive experiments on five benchmark datasets, where GGKMNet outperforms existing defocus deblurring methods in restoration quality while showing advantages in model complexity and computational efficiency.
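The GKM idea, expressing each pixel's PSF as a convex combination of isotropic Gaussians, can be sketched as a forward blur model in numpy (the paper's GGKMNet predicts the per-pixel coefficients and unrolls the inverse problem, which is not shown here; kernel size and sigmas are illustrative):

```python
import numpy as np

def gaussian_kernel(sigma, size=9):
    # isotropic Gaussian kernel, normalized to sum 1
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def conv2_same(img, k):
    # direct 'same' correlation with edge padding (kernels here are symmetric,
    # so correlation equals convolution)
    H, W = img.shape
    s = k.shape[0]
    p = np.pad(img, s // 2, mode="edge")
    out = np.zeros((H, W))
    for i in range(s):
        for j in range(s):
            out += k[i, j] * p[i:i + H, j:j + W]
    return out

def gkm_blur(img, weights, sigmas):
    # weights: (K, H, W), summing to 1 at each pixel, so every pixel gets its
    # own PSF as a convex mixture of the K Gaussian basis kernels
    basis = np.stack([conv2_same(img, gaussian_kernel(s)) for s in sigmas])
    return (weights * basis).sum(axis=0)
```

Because the basis blurs are shared across the image and only the mixing weights vary per pixel, the spatially-varying PSF field is parameterized compactly, which is the efficiency the GKM model exploits.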

  • Research Article
  • Cited by 5
  • 10.1609/aaai.v38i6.28320
Prior and Prediction Inverse Kernel Transformer for Single Image Defocus Deblurring
  • Mar 24, 2024
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Peng Tang + 6 more

Defocus blur, with its spatially-varying sizes and shapes, is hard to remove. Existing methods either fail to handle irregular defocus blur effectively or generalize poorly to other datasets. In this work, we propose a divide-and-conquer approach to this issue, which gives rise to a novel end-to-end deep learning method, the prior-and-prediction inverse kernel transformer (P2IKT), for single image defocus deblurring. Since most defocus blur can be approximated as Gaussian blur or its variants, we construct an inverse Gaussian kernel module to enhance generalization ability. At the same time, an inverse kernel prediction module is introduced to flexibly address irregular blur that cannot be approximated by a Gaussian. We further design a scale-recurrent transformer that estimates mixing coefficients for adaptively combining the results from the two modules and runs a scale-recurrent "coarse-to-fine" procedure for progressive defocus deblurring. Extensive experimental results demonstrate that P2IKT outperforms previous methods in terms of PSNR on multiple defocus deblurring datasets.
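The notion of an inverse Gaussian kernel can be illustrated with a classical regularized Fourier-domain inverse, a Wiener-style stand-in rather than the paper's learned module (the regularization constant `eps` and the periodic-boundary assumption are illustrative):

```python
import numpy as np

def gaussian_psf(shape, sigma):
    # full-size centered Gaussian PSF, normalized to sum 1
    H, W = shape
    y = np.arange(H) - H // 2
    x = np.arange(W) - W // 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def wiener_inverse_gaussian(blurred, sigma, eps=1e-3):
    # regularized inverse of the Gaussian blur in the Fourier domain:
    # X = Y * conj(K) / (|K|^2 + eps); eps keeps tiny |K| from exploding noise
    K = np.fft.fft2(np.fft.ifftshift(gaussian_psf(blurred.shape, sigma)))
    X = np.fft.fft2(blurred) * np.conj(K) / (np.abs(K) ** 2 + eps)
    return np.real(np.fft.ifft2(X))
```

A fixed inverse like this only handles the Gaussian-like part of defocus blur, which is why the paper pairs it with a prediction module for the irregular residue.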

  • Conference Article
  • Cited by 155
  • 10.1109/cvpr46437.2021.00207
Iterative Filter Adaptive Network for Single Image Defocus Deblurring
  • Jun 1, 2021
  • Junyong Lee + 4 more

We propose a novel end-to-end learning-based approach for single image defocus deblurring. The proposed approach is equipped with a novel Iterative Filter Adaptive Network (IFAN) that is specifically designed to handle spatially-varying and large defocus blur. For adaptively handling spatially-varying blur, IFAN predicts pixel-wise deblurring filters, which are applied to defocused features of an input image to generate deblurred features. For effectively managing large blur, IFAN models deblurring filters as stacks of small-sized separable filters. Predicted separable deblurring filters are applied to defocused features using a novel Iterative Adaptive Convolution (IAC) layer. We also propose a training scheme based on defocus disparity estimation and reblurring, which significantly boosts the deblurring quality. We demonstrate that our method achieves state-of-the-art performance both quantitatively and qualitatively on real-world images.

  • Conference Article
  • Cited by 32
  • 10.1109/cvpr52729.2023.00557
Neumann Network with Recursive Kernels for Single Image Defocus Deblurring
  • Jun 1, 2023
  • Yuhui Quan + 2 more

Single image defocus deblurring (SIDD) refers to recovering an all-in-focus image from a defocused blurry one. It is a challenging recovery task due to the spatially-varying defocus blurring effects with significant size variation. Motivated by the strong correlation among defocus kernels of different sizes and the blob-type structure of defocus kernels, we propose a learnable recursive kernel representation (RKR) for defocus kernels that expresses a defocus kernel by a linear combination of recursive, separable and positive atom kernels, leading to a compact yet effective and physics-encoded parametrization of the spatially-varying defocus blurring process. Afterwards, a physics-driven and efficient deep model with a cross-scale fusion structure is presented for SIDD, with inspirations from the truncated Neumann series for approximating the matrix inversion of the RKR-based blurring operator. In addition, a reblurring loss is proposed to regularize the RKR learning. Extensive experiments show that, our proposed approach significantly outperforms existing ones, with a model size comparable to that of the top methods.
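The truncated Neumann series underlying the model is the classical identity A^{-1} b ≈ Σ_{k=0}^{K} (I − A)^k b, valid when the spectral radius of (I − A) is below 1; a generic numpy illustration (not the RKR-specific blurring operator):

```python
import numpy as np

def neumann_solve(A, b, K=100):
    # accumulate x_K = sum_{k=0}^{K} (I - A)^k b without forming matrix powers:
    # each iteration multiplies the running term by (I - A) and adds it
    x, term = b.astype(float).copy(), b.astype(float).copy()
    for _ in range(K):
        term = term - A @ term   # term <- (I - A) term
        x = x + term
    return x
```

In the paper's setting, A is the RKR-based blurring operator, so each series term is just another blur application, which is what makes the unrolled network efficient.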

  • Research Article
  • 10.1007/s11263-025-02522-3
Reblurring-Guided Single Image Defocus Deblurring: A Learning Framework with Misaligned Training Pairs
  • Jul 5, 2025
  • International Journal of Computer Vision
  • Dongwei Ren + 5 more

