Clouds and Haze Co-Removal Based on Weight-Tuned Overlap Refinement Diffusion Model for Remote Sensing Images

Abstract

Remote sensing image dehazing is an essential preprocessing step, but most methods overlook joint occlusion by clouds and haze. Furthermore, generative model-based restoration struggles with haze, weakening edge recovery for targets obscured by both. In this paper, we propose a new diffusion model based on weight-tuned overlap refinement for clouds and haze co-removal in remote sensing images. Firstly, we integrate the diffusion model into the clouds and haze co-removal task, offering a lightweight solution that remains effective even with limited samples. Secondly, we propose a weight-tuned overlap refinement method (WTOR) to guide the updating of noise estimates in overlaps during the backward-sampling process. It takes the reciprocal of the variance of each overlap as the weight factor and adjusts the backward-sampling update frequency accordingly, improving the model's ability to co-remove clouds and haze. Finally, we introduce the Gaussian filter feature extraction block (GFAB) to enhance the model's ability to learn structural information, detecting local changes and extracting features with precision. Experimental results demonstrate that the proposed method co-removes clouds and haze while preserving rich color and texture details.
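The reciprocal-of-variance weighting that WTOR applies to overlaps can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the normalization step, and the assumption that low-variance (flatter) overlaps receive larger weights are all illustrative.

```python
import numpy as np

def overlap_weights(overlaps, eps=1e-8):
    """Weight each overlap by the reciprocal of its variance.

    `overlaps` is a list of 2-D arrays, one per overlapping patch
    region visited during backward sampling. Under this sketch's
    assumption, flat (low-variance) overlaps get large weights, so
    their noise estimates would be refreshed more often; textured
    (high-variance) overlaps are updated less frequently.
    """
    w = np.array([1.0 / (patch.var() + eps) for patch in overlaps])
    return w / w.sum()  # normalize to a distribution over overlaps

# Example: a near-constant overlap outweighs a noisy one.
rng = np.random.default_rng(0)
flat = np.full((8, 8), 0.5) + 1e-3 * rng.standard_normal((8, 8))
busy = rng.standard_normal((8, 8))
w = overlap_weights([flat, busy])
```

A scheduler in the backward-sampling loop could then draw which overlap to refine in proportion to `w`, realizing the frequency adjustment the abstract describes.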

Similar Papers
  • Research Article
  • Citations: 9
  • 10.3390/rs17081348
Taming a Diffusion Model to Revitalize Remote Sensing Image Super-Resolution
  • Apr 10, 2025
  • Remote Sensing
  • Chao Zhu + 3 more

Conventional neural network-based approaches for single remote sensing image super-resolution (SRSISR) have made remarkable progress. However, the super-resolution outputs produced by these methods often fall short in terms of visual quality. Recent advances in diffusion models for image generation have demonstrated remarkable potential for enhancing the visual content of super-resolved images. Despite this promise, existing large diffusion models are predominantly trained on natural images, whose data distribution differs substantially from that of remote sensing images (RSIs), making these models hard to apply to RSIs directly. Moreover, while diffusion models possess powerful generative capabilities, their output must be carefully controlled to generate accurate details, since the objects in RSIs are small and blurry. In this paper, we introduce RSDiffSR, a novel SRSISR method based on a conditional diffusion model. This framework ensures the high-quality super-resolution of RSIs through three key contributions. First, it leverages a large diffusion model as a generative prior, which substantially enhances the visual quality of super-resolved RSIs. Second, it incorporates low-rank adaptation into the diffusion UNet and multi-stage training process to address the domain gap caused by differences in data distributions. Third, an enhanced control mechanism is designed to process the content and edge information of RSIs, providing effective guidance during the diffusion process. Experimental results demonstrate that the proposed RSDiffSR achieves state-of-the-art performance in both quantitative and qualitative evaluations across multiple benchmarks.

  • Research Article
  • Citations: 67
  • 10.3390/rs14194834
Diffusion Model with Detail Complement for Super-Resolution of Remote Sensing
  • Sep 28, 2022
  • Remote Sensing
  • Jinzhe Liu + 5 more

Remote sensing super-resolution (RSSR) aims to improve remote sensing (RS) image resolution while providing finer spatial details, which is of great significance for high-quality RS image interpretation. Traditional RSSR is based on optimization methods, which pay insufficient attention to small targets and lack image-understanding and detail-supplement capability. To alleviate these problems, we propose the generative Diffusion Model with Detail Complement (DMDC) for RS super-resolution. Firstly, unlike traditional optimization models with insufficient image understanding, we introduce the diffusion model as a generative model into RSSR tasks and regard low-resolution images as condition information to guide image generation. Next, considering that generative models may not be able to accurately recover specific small objects and complex scenes, we propose a detail supplement task to improve the recovery ability of DMDC. Finally, since the strong diversity of the diffusion model can be inappropriate for RSSR, we introduce a joint pixel-constraint loss and denoising loss to steer the direction of inverse diffusion. Extensive qualitative and quantitative experiments demonstrate the superiority of our method in RSSR with small and dense targets. Moreover, results from direct transfer to different datasets also prove the superior generalization ability of DMDC.

  • Research Article
  • Citations: 10
  • 10.3390/rs16214083
Remote Sensing Image Change Captioning Using Multi-Attentive Network with Diffusion Model
  • Nov 1, 2024
  • Remote Sensing
  • Yue Yang + 5 more

Remote sensing image change captioning (RSICC) has received considerable research interest due to its ability to automatically provide meaningful sentences describing the changes in remote sensing (RS) images. Existing RSICC methods mainly utilize networks pre-trained on natural image datasets to extract feature representations. This degrades performance, since aerial images possess distinctive characteristics compared to natural images. In addition, it is challenging to capture the data distribution and perceive contextual information between samples, resulting in limited robustness and generalization of the feature representations. Furthermore, directly aggregating all features is insufficient for focusing on the most change-aware discriminative information. To deal with these problems, a novel framework entitled Multi-Attentive network with Diffusion model for RSICC (MADiffCC) is proposed in this work. Specifically, we introduce a diffusion feature extractor, based on a diffusion model pre-trained on an RS image dataset, to capture the multi-level and multi-time-step feature representations of bitemporal RS images. The diffusion model is able to learn the training data distribution and contextual information of RS objects, from which more robust and generalized representations can be extracted for the downstream application of change captioning. Furthermore, a difference encoder based on a time-channel-spatial attention (TCSA) mechanism is designed to utilize the extracted diffusion features to obtain the discriminative information. A gated multi-head cross-attention (GMCA)-guided change captioning decoder is then proposed to select and fuse crucial hierarchical features for more precise change description generation. Experimental results on the publicly available LEVIR-CC, LEVIRCCD, and DUBAI-CC datasets verify that the developed approach achieves state-of-the-art (SOTA) performance.

  • Research Article
  • Citations: 14
  • 10.3390/rs16071219
Denoising Diffusion Probabilistic Model with Adversarial Learning for Remote Sensing Super-Resolution
  • Mar 30, 2024
  • Remote Sensing
  • Jialu Sui + 2 more

Single Image Super-Resolution (SISR) for image enhancement enables the generation of high spatial resolution in Remote Sensing (RS) images without incurring additional costs. This approach offers a practical solution for obtaining high-resolution RS images, addressing challenges posed by the expense of acquisition equipment and unpredictable weather conditions. To address the over-smoothing of previous SISR models, the diffusion model has been incorporated into RS SISR to generate Super-Resolution (SR) images with enhanced textural details. In this paper, we propose a Diffusion model with Adversarial Learning Strategy (DiffALS) to refine the generative capability of the diffusion model. DiffALS integrates an additional Noise Discriminator (ND) into the training process, employing an adversarial learning strategy on the data distribution learning. The ND guides noise prediction by considering the general correspondence between the noisy images at each step, thereby enhancing the diversity of generated data and the detailed texture prediction of the diffusion model. Furthermore, considering that the diffusion model may exhibit suboptimal performance on traditional pixel-level metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), we showcase the effectiveness of DiffALS through downstream semantic segmentation applications. Extensive experiments demonstrate that the proposed model achieves remarkable accuracy and notable visual enhancements. Compared to other state-of-the-art methods, our model establishes an improvement of 189 for Fréchet Inception Distance (FID) and 0.002 for Learned Perceptual Image Patch Similarity (LPIPS) on an SR dataset, namely Alsat, and achieves improvements of 0.4%, 0.3%, and 0.2% for F1 score, MIoU, and Accuracy, respectively, on a segmentation dataset, namely Vaihingen.

  • Research Article
  • 10.3390/rs17071144
CGD-CD: A Contrastive Learning-Guided Graph Diffusion Model for Change Detection in Remote Sensing Images
  • Mar 24, 2025
  • Remote Sensing
  • Yang Shang + 4 more

With the rapid development of remote sensing technology, the question of how to leverage large amounts of unlabeled remote sensing data to detect changes in multi-temporal images has become a significant challenge. Self-supervised learning (SSL) methods for remote sensing image change detection (CD) can effectively address the issue of limited labeled data. However, most SSL algorithms for CD in remote sensing images rely on convolutional neural networks with fixed receptive fields as their feature extraction backbones, which limits their ability to capture objects of varying scales and model global contextual information in complex scenes. Additionally, these methods fail to capture essential topological and structural information from remote sensing images, resulting in a high false positive rate. To address these issues, we introduce a graph diffusion model into the field of CD and propose a novel network architecture called CGD-CD Net, which is driven by a structure-sensitive SSL strategy based on contrastive learning. Specifically, a superpixel segmentation algorithm is applied to bi-temporal images to construct graph nodes, while the k-nearest neighbors algorithm is used to define edge connections. Subsequently, a diffusion model is employed to balance the states of nodes within the graph, enabling the co-evolution of adjacency relationships and feature information, thereby aggregating higher-order feature information to obtain superior feature embeddings. The network is trained with a carefully crafted contrastive loss function to effectively capture high-level structural information. Ultimately, high-quality difference images are generated from the extracted bi-temporal features, and thresholding analysis is then applied to obtain the final change map. The effectiveness and feasibility of the proposed method are confirmed by experimental results on three different datasets, which show that it outperforms several of the top SSL-CD methods.

  • Research Article
  • Citations: 5
  • 10.3390/rs16193716
MapGen-Diff: An End-to-End Remote Sensing Image to Map Generator via Denoising Diffusion Bridge Model
  • Oct 6, 2024
  • Remote Sensing
  • Jilong Tian + 3 more

Online maps are of great importance in modern life, especially in commuting, traveling and urban planning. The accessibility of remote sensing (RS) images has contributed to the widespread practice of generating online maps from RS images. Previous works leverage the idea of domain mapping to achieve end-to-end remote sensing image-to-map translation (RSMT). Although existing methods are effective and efficient for online map generation, the generated online maps still suffer from ground-feature distortion and boundary inaccuracy to a certain extent. Recently, the emergence of diffusion models has signaled a significant advance in high-fidelity image synthesis. Grounded in rigorous mathematical theory, denoising diffusion models offer controllable generation in the sampling process, which makes them very suitable for end-to-end RSMT. Therefore, we design a novel end-to-end diffusion model to generate online maps directly from remote sensing images, called MapGen-Diff. We leverage a strategy inspired by Brownian motion to make a trade-off between the diversity and the accuracy of the generation process. Meanwhile, an image compression module is proposed to map the raw images into the latent space for capturing more perceptual features. To enhance the geometric accuracy of ground features, a consistency regularization is designed, which allows the model to generate maps with clearer boundaries and colorization. Compared to several state-of-the-art methods, the proposed MapGen-Diff achieves outstanding performance, notably a 5% RMSE and 7% SSIM improvement on the Los Angeles and Toronto datasets. The visualization results also demonstrate more accurate local details and higher quality.

  • Research Article
  • Citations: 42
  • 10.3390/rs15133452
Enhancing Remote Sensing Image Super-Resolution with Efficient Hybrid Conditional Diffusion Model
  • Jul 7, 2023
  • Remote Sensing
  • Lintao Han + 6 more

Recently, optical remote-sensing images have been widely applied in fields such as environmental monitoring and land cover classification. However, due to limitations in imaging equipment and other factors, low-resolution images that are unfavorable for image analysis are often obtained. Although existing image super-resolution algorithms can enhance image resolution, these algorithms are not specifically designed for the characteristics of remote-sensing images and cannot effectively recover high-resolution images. Therefore, this paper proposes a novel remote-sensing image super-resolution algorithm based on an efficient hybrid conditional diffusion model (EHC-DMSR). The algorithm applies the theory of diffusion models to remote-sensing image super-resolution. Firstly, the comprehensive features of low-resolution images are extracted through a transformer network and CNN to serve as conditions for guiding image generation. Furthermore, to constrain the diffusion model and generate more high-frequency information, a Fourier high-frequency spatial constraint is proposed to emphasize high-frequency spatial loss and optimize the reverse diffusion direction. To address the time-consuming issue of the diffusion model during the reverse diffusion process, a feature-distillation-based method is proposed to reduce the computational load of U-Net, thereby shortening the inference time without affecting the super-resolution performance. Extensive experiments on multiple test datasets demonstrated that our proposed algorithm not only achieves excellent results in quantitative evaluation metrics but also generates sharper super-resolved images with rich detailed information.

  • Research Article
  • Citations: 41
  • 10.1007/s00521-024-10363-3
RSDiff: remote sensing image generation from text using diffusion model
  • Oct 5, 2024
  • Neural Computing and Applications
  • Ahmad Sebaq + 1 more

The generation and enhancement of satellite imagery are critical in remote sensing, requiring high-quality, detailed images for accurate analysis. This research introduces a two-stage diffusion model methodology for synthesizing high-resolution satellite images from textual prompts. The pipeline comprises a low-resolution diffusion model (LRDM) that generates initial images based on text inputs and a super-resolution diffusion model (SRDM) that refines these images into high-resolution outputs. The LRDM merges text and image embeddings within a shared latent space, capturing essential scene content and structure. The SRDM then enhances these images, focusing on spatial features and visual clarity. Experiments conducted using the Remote Sensing Image Captioning Dataset demonstrate that our method outperforms existing models, producing satellite images with accurate geographical details and improved spatial resolution.

  • Research Article
  • Citations: 1
  • 10.3390/rs17132241
SAR-DeCR: Latent Diffusion for SAR-Fused Thick Cloud Removal
  • Jun 30, 2025
  • Remote Sensing
  • Meilin Wang + 3 more

Current methods for removing thick clouds from remote-sensing images face significant limitations, including the integration of thick cloud images with synthetic aperture radar (SAR) ground information, the provision of meaningful guidance from SAR ground data, and the accurate reconstruction of textures in cloud-covered regions. To overcome these challenges, we introduce SAR-DeCR, a novel method for thick cloud removal in satellite remote-sensing images. SAR-DeCR utilizes a diffusion model combined with the transformer architecture to synthesize accurate texture details guided by SAR ground information. The method is structured into three distinct phases: coarse cloud removal (CCR), SAR-Fusion (SAR-F) and cloud-free diffusion (CF-D), aimed at enhancing the effectiveness of thick cloud removal. In CCR, we employ the transformer's capability for long-range information interaction, which significantly strengthens the cloud removal process. To overcome the problem of missing ground information after cloud removal and ensure that the produced ground information is consistent with SAR data, we introduce SAR-F, a module designed to incorporate the rich ground information in SAR into the output of CCR. Additionally, to achieve superior texture reconstruction, we introduce prior supervision based on the output of the coarse cloud removal, using a pre-trained visual-text diffusion model named cloud-free diffusion (CF-D). This diffusion model is encouraged to follow the visual prompts, thus producing a visually appealing, high-quality result. The effectiveness and superiority of SAR-DeCR are demonstrated through qualitative and quantitative experiments, comparing it with other state-of-the-art (SOTA) thick cloud removal methods on the large-scale SEN12MS-CR dataset.

  • Research Article
  • 10.1088/1742-6596/3055/1/012012
Typhoon Forecasting Based on Remote Sensing Satellite Data and Two-Stage Super-Resolution Diffusion (TSRD) Model
  • Jul 1, 2025
  • Journal of Physics: Conference Series
  • Yicheng Zhang + 3 more

In recent years, the frequency and intensity of extreme weather events worldwide have increased significantly, leading to profound impacts on human society, the economy, and ecological environments. The integration of remote sensing satellite technology with Numerical Weather Prediction (NWP) has notably enhanced monitoring and forecasting capabilities for extreme weather events, such as typhoons. However, the resolution of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 data, which is limited to 0.25° × 0.25°, restricts data precision, leading to deviations when using NWP to simulate mesoscale weather systems, such as changes in typhoon eyewalls. Furthermore, despite ongoing advancements in meteorological forecasting, NWP predictions tend to be blurry, often underestimating the severity of extreme weather events. To address these challenges, we propose a novel Two-Stage Super-Resolution Diffusion (TSRD) model to augment high-resolution data for more accurate extreme weather forecasting. By integrating the diffusion model with an anomaly detection model that captures finer atmospheric features of extreme weather, the diffusion model is fine-tuned via backpropagation of loss from the anomaly detection model in an online fashion. The TSRD model synthesizes rare high-resolution extreme weather data from low-resolution inputs, improving the prediction performance for extreme weather events at the later stage. This paper analyzes 17 significant typhoon landfall events along the coast of China from 2022 to 2024, comparing predicted paths and intensities. Evaluation results show that the TSRD model's path accuracy improved by 16.4% and its typhoon intensity prediction accuracy increased by 11.1%, significantly outperforming the NWP model. These results further validate the advantages of the TSRD model.

  • Research Article
  • Citations: 42
  • 10.3390/rs15092346
TESR: Two-Stage Approach for Enhancement and Super-Resolution of Remote Sensing Images
  • Apr 29, 2023
  • Remote Sensing
  • Anas M Ali + 4 more

Remote Sensing (RS) images are usually captured at resolutions lower than those required. Deep Learning (DL)-based super-resolution (SR) architectures are typically used to increase the resolution artificially. In this study, we designed a new architecture called TESR (Two-stage approach for Enhancement and Super-Resolution), leveraging the power of Vision Transformers (ViT) and the Diffusion Model (DM) to artificially increase the resolution of RS images. The first stage is the ViT-based model, which serves to increase resolution. The second stage is an iterative DM pre-trained on a larger dataset, which serves to increase image quality. Each stage is trained separately on its task using a separate dataset. The self-attention mechanism of the ViT helps the first stage generate global and contextual details. The iterative Diffusion Model helps the second stage enhance the image's quality and generate consistent and harmonic fine details. We found that TESR outperforms state-of-the-art architectures on super-resolution of remote sensing images on the UCMerced benchmark dataset. Considering the PSNR/SSIM metrics, TESR improves SR image quality over state-of-the-art techniques from 34.03/0.9301 to 35.367/0.9449 at scale ×2. At scale ×3, it improves from 29.92/0.8408 to 32.311/0.91143. At scale ×4, it improves from 27.77/0.7630 to 31.951/0.90456. We also found that the Charbonnier loss outperformed other loss functions in the training of both stages of TESR, with a margin of 21.5%/14.3% in PSNR/SSIM, respectively. The source code of TESR is open to the community.

  • Research Article
  • Citations: 1
  • 10.3390/s25051394
Transferable Contextual Network for Rural Road Extraction from UAV-Based Remote Sensing Images.
  • Feb 25, 2025
  • Sensors (Basel, Switzerland)
  • Jian Wang + 4 more

Road extraction from UAV-based remote sensing images in rural areas presents significant challenges due to the diverse and complex characteristics of rural roads. Additionally, acquiring UAV remote sensing data for rural areas is challenging due to the high cost of equipment, the lack of clear road boundaries requiring extensive manual annotation, and limited regional policy support for UAV operations. To address these challenges, we propose a transferable contextual network (TCNet), designed to enhance the transferability and accuracy of rural road extraction. We employ a Stable Diffusion model for data augmentation, generating diverse training samples and providing a new method for acquiring remote sensing images. TCNet integrates the clustered contextual Transformer (CCT) module, clustered cross-attention (CCA) module, and CBAM attention mechanism to ensure efficient model transferability across different geographical and climatic conditions. Moreover, we design a new loss function, the Dice-BCE-Lovasz loss (DBL loss), to accelerate convergence and improve segmentation performance in handling imbalanced data. Experimental results demonstrate that TCNet, with only 23.67 M parameters, performs excellently on the DeepGlobe and road datasets and shows outstanding transferability in zero-shot testing on rural remote sensing data. TCNet performs well on segmentation tasks without any fine-tuning for regions such as Burgundy, France, and Yunnan, China.

  • Research Article
  • 10.3390/rs17132254
Synthesizing Remote Sensing Images from Land Cover Annotations via Graph Prior Masked Diffusion
  • Jun 30, 2025
  • Remote Sensing
  • Kai Deng + 4 more

Semantic image synthesis (SIS) in remote sensing aims to generate high-fidelity satellite imagery from land use/land cover (LULC) labels, supporting applications such as map updating, data augmentation, and environmental monitoring. However, the existing methods typically focus on pixel-level semantic-to-image translation, neglecting the spatial and semantic relationships among land cover objects, which hinders accurate scene structure modeling. To address this challenge, we propose GMDiT, an enhanced conditional diffusion model that extends the masked DiT architecture with graph-prior modeling. By jointly incorporating relational graph structures and semantic labels, GMDiT explicitly captures the object-level spatial and semantic dependencies, thereby improving the contextual coherence and structural fidelity of the synthesized images. Specifically, to effectively capture inter-object dependencies, we first encode the semantics of each node using CLIP and then employ a simple yet effective graph transformer to model the spatial interactions among nodes. Additionally, we design a scene similarity sampling strategy for the reverse diffusion process, improving contextual alignment while maintaining generative diversity. Experiments on the OpenEarthMap dataset show that GMDiT achieves superior performance in terms of FID and other metrics, demonstrating its effectiveness and robustness in the generation of structured remote sensing images.

  • Conference Article
  • 10.1109/cisat66811.2025.11181898
Physics-based Diffusion Model for Joint Cloud and Haze Removal in Remote Sensing Images
  • Jul 11, 2025
  • Cheng Ma + 1 more

Remote sensing images are often affected by cloud and haze, which degrade surface observation accuracy. Existing deep learning methods focus on removing cloud or haze independently, lacking a unified approach for simultaneous processing. To address this limitation, we propose a physics-based diffusion model for joint cloud and haze removal from remote sensing images. Given the interrelated nature of cloud and haze and the challenges in decoupling them with existing physical models, our approach combines the atmospheric scattering model and cloud degradation model to construct a unified framework. A Transformer network learns the physical parameters for initial removal, and these parameters are integrated into a diffusion model, enhanced with a structure extraction module to refine spatial details. Experiments on remote sensing datasets show the effectiveness of our method. The code is available at https://github.com/Mccc1003/Decloud-main.

  • Research Article
  • 10.1109/tgrs.2025.3592738
AFLDiff: Adaptive Frequency-Aware Latent Diffusion Model for Road Extraction in Remote Sensing Images
  • Jan 1, 2025
  • IEEE Transactions on Geoscience and Remote Sensing
  • Ruiqi Liu + 6 more

Accurate road extraction from remote sensing images is of great importance, yet the complex backgrounds, significant scale differences, and detailed characteristics of road information pose a tremendous challenge to the semantic modeling capabilities of deep learning methods. Recent studies have shown impressive success with diffusion-based methods in semantic segmentation. However, when applied to road extraction tasks, the remarkable performance can be diminished by intricate terrains and unreasonable feature fusion. In this article, we reframe road extraction as a mask-generating task, progressively refining road mask predictions through a timestep-by-timestep denoising process. Specifically, we propose a novel adaptive frequency-aware latent diffusion framework named AFLDiff. The pretrained VQ-VAE encoder with coupled perceptual compression suppresses semantic features that are weakly associated with roads in remote sensing images, reconstructing the image into a low-dimensional latent space for training. The adaptive frequency-aware module filters out redundant noise from the noisy features and enhances useful high- and low-frequency components, bridging the gap between noisy and semantic features for better fusion. Extensive experimental results demonstrate that our method achieves state-of-the-art (SOTA) performance on the DeepGlobe, Massachusetts, and CHN6-CUG datasets, with cross-dataset validation further confirming its robustness and generalization capabilities.
