Deep Fusion of Visible and Near Infrared Images for Registration and Defogging Using Cross Modal Transformer

Abstract

According to Rayleigh scattering, the density of fog in an image is closely related to the distance of the target from the visible-light camera. The near-infrared (NIR) spectral band is robust to this scattering interference, but it yields only a single channel without color. In this paper, we propose deep fusion of visible (VIS) and near-infrared (NIR) images for registration and defogging using a cross-modal transformer. The proposed fusion network has two main features: 1) the cross-modal transformer handles the channel and spatial redundancy between the two source images while retaining their complementary information; 2) since the blue channel of the VIS image represents the amount of fog in the scene according to Rayleigh scattering, we normalize it to obtain a fog weight map. Synthesizing training data requires the distance between object and camera, so we use depth estimation to obtain it; we then assign a different amount of fog to each image to cover various weather conditions. First, we extract multi-scale features from the source images. Second, we use the cross-modal transformer to model cross-domain interaction between the source images in the spatial and channel domains. Third, we generate fusion weights for the source images according to the fog weight map and reconstruct a fused result from them. Experiments on widely used benchmark and real-world datasets show that the proposed method generates natural-looking defogging results with clear textures in distant regions while keeping the original tone in near regions. Moreover, the proposed method outperforms state-of-the-art fusion methods in terms of both visual quality and quantitative measurements.
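The two data-preparation steps described above — normalizing the blue channel into a fog weight map, and adding synthetic fog from an estimated depth map — can be sketched roughly as follows. This is a minimal illustration, not the paper's exact procedure: the function names and the min-max normalization are assumptions, and the standard atmospheric scattering model I = J·t + A·(1 − t) with transmission t = exp(−β·d) stands in for the authors' fog synthesis.

```python
import numpy as np

def fog_weight_map(vis_bgr):
    """Normalize the blue channel of a VIS image (BGR channel order
    assumed) to [0, 1] as a rough per-pixel fog weight."""
    blue = vis_bgr[..., 0].astype(np.float32)
    return (blue - blue.min()) / (blue.max() - blue.min() + 1e-8)

def add_synthetic_fog(img, depth, beta=1.0, airlight=0.9):
    """Apply the standard atmospheric scattering model
    I = J * t + A * (1 - t), with transmission t = exp(-beta * depth).
    Larger beta simulates denser fog; depth comes from a depth estimator."""
    t = np.exp(-beta * depth)[..., None]  # broadcast over color channels
    return img * t + airlight * (1.0 - t)
```

Varying `beta` per training image is one simple way to emulate the "different amounts of fog" mentioned in the abstract.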

Similar Papers
  • Research Article
  • 10.3390/diagnostics15192484
Optimising Multimodal Image Registration Techniques: A Comprehensive Study of Non-Rigid and Affine Methods for PET/CT Integration
  • Sep 28, 2025
  • Diagnostics
  • Babar Ali + 5 more

Background/Objective: Multimodal image registration plays a critical role in modern medical imaging, enabling the integration of complementary modalities such as positron emission tomography (PET) and computed tomography (CT). This study compares the performance of three widely used image registration techniques—Demons Image Registration with Modality Transformation, Free-Form Deformation using the Medical Image Registration Toolbox (MIRT), and MATLAB Intensity-Based Registration—in terms of improving PET/CT image alignment. Methods: A total of 100 matched PET/CT image slices from a clinical scanner were analysed. Preprocessing techniques, including histogram equalisation and contrast enhancement (via imadjust and adapthisteq), were applied to minimise intensity discrepancies. Each registration method was evaluated under varying parameter conditions with regard to sigma fluid (range 4–8), histogram bins (100 to 256), and interpolation methods (linear and cubic). Performance was assessed using quantitative metrics: root mean square error (RMSE), mean squared error (MSE), mean absolute error (MAE), the Pearson correlation coefficient (PCC), and standard deviation (STD). Results: Demons registration achieved optimal performance at a sigma fluid value of 6, with an RMSE of 0.1529, and demonstrated superior computational efficiency. The MIRT showed better adaptability to complex anatomical deformations, with an RMSE of 0.1725. MATLAB Intensity-Based Registration, when combined with contrast enhancement, yielded the highest accuracy (RMSE = 0.1317 at alpha = 6). Preprocessing improved registration accuracy, reducing the RMSE by up to 16%. Conclusions: Each registration technique has distinct advantages: the Demons algorithm is ideal for time-sensitive tasks, the MIRT is suited to precision-driven applications, and MATLAB-based methods offer flexible processing for large datasets. 
This study provides a foundational framework for optimising PET/CT image registration in both research and clinical environments.

  • Conference Article
  • Cited by 2
  • 10.1109/isbi.2016.7493461
Demon registration of OCT and histology images through edge orientation-weighted modality transformation
  • Apr 1, 2016
  • Smruti Rekka + 1 more

Registration between histology and ex-vivo OCT can help in many clinical applications, including identifying the surgical margin in tumour tissue during intra-operative pathological diagnosis, quantifying features in OCT for diagnosis, and reducing tissue processing time. In this paper, we present a framework for non-rigid registration between histology and ex-vivo full-field OCT images. The proposed framework is a two-stage registration process. The first stage corrects large-scale misalignment while also matching the OCT image to one of several histology tissue samples, using iterative closest point matching of prominent edge points. The second stage performs Demon registration based on modality transformation. We also suggest a modification to modality transformation [1] that preserves the sharpness of edges in the resulting transformation; this is seen to help the Demon registration converge faster.

  • Research Article
  • Cited by 17
  • 10.1016/j.jag.2022.102778
Feature fusion-based registration of satellite images to airborne LiDAR bathymetry in island area
  • May 1, 2022
  • International Journal of Applied Earth Observation and Geoinformation
  • Xue Ji + 4 more


  • Conference Article
  • Cited by 147
  • 10.1109/isbi.2009.5193214
MRI modality transformation in demon registration
  • Jun 1, 2009
  • Dirk-Jan Kroon + 1 more

Nonrigid local image registration plays an important role in medical imaging. In this paper, we focus on demon registration, which was introduced by Thirion [1] and is comparable to fluid registration. Because demon registration cannot deal with multiple MRI modalities, we introduce an MRI modality transformation that changes the representation of a T1 scan into a T2 scan using the peaks in a joint histogram. We compare the performance of demon registration with modality transformation, demon registration with gradient images, and Rueckert's [2] B-spline-based free-form deformation method combined with mutual information. For this test we use perfectly aligned T1 and T2 slices from the BrainWeb database [3], which we locally spherically distort. In conclusion, demon registration with modality transformation gives the smallest registration errors in the case of large local spherical distortions and small bias fields.
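The joint-histogram-peak idea described above can be sketched as follows: for each T1 intensity bin, look up the T2 intensity at which the joint histogram of the two (roughly aligned) images peaks, then remap the T1 image through that lookup table. This is a simplified illustration of the technique, not Kroon's exact procedure; the bin count and function name are assumptions.

```python
import numpy as np

def modality_transform(t1, t2, bins=64):
    """Remap a T1 image toward T2-like intensities using the per-row
    peaks of the joint histogram (simplified sketch)."""
    h, t1_edges, t2_edges = np.histogram2d(t1.ravel(), t2.ravel(), bins=bins)
    # For every T1 bin (row), pick the T2 bin edge where the joint count peaks.
    peak_t2 = t2_edges[np.argmax(h, axis=1)]
    # Assign each T1 pixel to its bin and look up the peak T2 intensity.
    idx = np.clip(np.digitize(t1, t1_edges) - 1, 0, bins - 1)
    return peak_t2[idx]
```

After this transformation the two images share one intensity representation, so a mono-modal similarity measure (as in demon registration) becomes applicable.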

  • Research Article
  • Cited by 63
  • 10.1109/lgrs.2017.2783879
PSOSAC: Particle Swarm Optimization Sample Consensus Algorithm for Remote Sensing Image Registration
  • Feb 1, 2018
  • IEEE Geoscience and Remote Sensing Letters
  • Yue Wu + 4 more

Image registration is an important preprocessing step for many remote sensing image processing applications, and its result affects the performance of follow-up procedures. Establishing reliable matches is a key issue in point-matching-based image registration. Due to the significant intensity mapping difference between remote sensing images, it may be difficult to find enough correct matches among the tentative matches. In this letter, a particle swarm optimization (PSO) sample consensus algorithm is proposed for remote sensing image registration. Unlike the random sample consensus (RANSAC) algorithm, the proposed method directly samples the model transformation parameters rather than randomly selecting tentative matches. The proposed method is therefore less sensitive to the correct-match rate than RANSAC and can handle lower correct rates and more matches. Meanwhile, PSO is utilized to optimize the parameters efficiently. The proposed method is tested on several multisensor remote sensing image pairs. The experimental results indicate that the proposed method yields better registration performance in terms of both the number of correct matches and aligning accuracy.
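The key contrast drawn above — scoring candidate transformation parameters directly instead of fitting transforms to randomly sampled match subsets — can be sketched as below. This is a hedged illustration: a similarity transform and plain random search stand in for the paper's transformation model and PSO optimizer, and all names, ranges, and tolerances are assumptions.

```python
import numpy as np

def consensus(params, src, dst, tol=2.0):
    """Inlier count for a similarity transform (scale, rotation, tx, ty)
    mapping src points onto dst points, with a pixel tolerance."""
    s, theta, tx, ty = params
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    pred = s * src @ R.T + np.array([tx, ty])
    return int(np.sum(np.linalg.norm(pred - dst, axis=1) < tol))

def best_params(src, dst, n_samples=2000, rng=None):
    """Sample transform parameters directly (here by random search,
    standing in for PSO) and keep the candidate with the most inliers."""
    if rng is None:
        rng = np.random.default_rng(0)
    best, best_score = None, -1
    for _ in range(n_samples):
        cand = (rng.uniform(0.5, 2.0), rng.uniform(-np.pi, np.pi),
                rng.uniform(-50, 50), rng.uniform(-50, 50))
        score = consensus(cand, src, dst)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

Because the consensus score is evaluated over all tentative matches at once, a low fraction of correct matches degrades the score but does not change the search space, which is the robustness property the abstract claims over RANSAC.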

  • Conference Article
  • Cited by 2
  • 10.1109/mlccim60412.2023.00020
An Infrared and Visible Image Registration Network Based on Modal Transformation
  • Jul 25, 2023
  • Lang Cheng + 3 more

Image registration is an important prerequisite for image fusion, and its accuracy affects the quality of the fused images. Visible images provide detailed textures, but their quality is heavily influenced by lighting conditions; infrared images are not affected by illumination and distance, which makes the two images complementary. However, the disparate imaging mechanisms lead to great differences between the two images, so it is difficult to register them. This paper introduces MTIVRNet, a network for registering infrared and visible images built on modal transformation. CycleGAN, a framework for learning to translate between two sets of images such as infrared and visible, is used here to convert visible images into pseudo-infrared images; the aim is to bridge the spectral difference so that the images appear similar in both modalities. Then, the OD-Dense module proposed in this paper is used for feature extraction; it is mainly composed of Omni-dimensional Dynamic Convolution (ODConv). By utilizing ODConv, the model benefits from enhanced feature extraction capability without significantly increasing the number of parameters. The experimental results show that the PCK evaluation metric of MTIVRNet on the FLIR infrared and visible image dataset is 9.48% higher than the base network when α=0.1, and the method is visually superior to other comparison methods.

  • Research Article
  • 10.3390/sym13060929
Regional Localization of Mouse Brain Slices Based on Unified Modal Transformation
  • May 24, 2021
  • Symmetry
  • Songwei Wang + 7 more

Brain science research often requires accurate localization and quantitative analysis of neuronal activity in different brain regions. The premise of such analysis is to determine the brain region of each site on the brain slice by referring to the Allen Reference Atlas (ARA), namely the regional localization of the brain slice. Image registration methodology can be used to solve the regional localization problem. However, conventional multi-modal image registration methods are not satisfactory because of the complex modality difference between the brain slice and the ARA. Inspired by the idea that people automatically ignore noise and establish correspondence based on key regions, we propose a novel method, the Joint Enhancement of Multimodal Information (JEMI) network, which is based on a symmetric encoder–decoder. With JEMI, the brain slice and the ARA are converted into segmentation maps with a unified modality, which greatly reduces the difficulty of registration. Furthermore, combined with a diffeomorphic registration algorithm, the existing topological structure is preserved. The results indicate that, compared with existing methods, the proposed method can effectively overcome the influence of non-unified modal images and achieve accurate and rapid localization of the brain slice.

  • Research Article
  • Cited by 5
  • 10.1007/s10489-025-06232-8
Cyclic deformable medical image registration with prompt: deep fusion of diffeomorphic and transformer methods
  • Jan 11, 2025
  • Applied Intelligence
  • Longhao Li + 5 more

