Articles published on Iterative Fusion
136 Search results
- Research Article
- 10.1016/j.neucom.2026.133351
- Jun 1, 2026
- Neurocomputing
- Licai Zhang + 5 more
FMIN: A flexible multimodal iterative fusion network with geometry-aware positive noise alignment for drug–target interaction prediction
- Research Article
- 10.3390/sym18050729
- Apr 24, 2026
- Symmetry
- Danfeng Zuo + 5 more
Ship target detection is a prerequisite for achieving automated monitoring in ship detection systems. To address the challenge of accurately detecting ship targets in complex water environments, this study proposes a ship target detection method based on an improved YOLOv11 framework. To enhance the model’s ability to perceive and fuse features across multiple scales and in complex backgrounds, an Iterative Attention Feature Fusion (iAFF) module and a Biformer module are integrated at the end of the backbone network. The iAFF module iteratively optimizes multi-scale features through a two-stage attention mechanism, effectively focusing on key target regions, thereby improving the model’s detection capability for small, medium-sized, and occluded ships. The Biformer module leverages its innovative Bi-level Routing Attention (BRA) mechanism to enhance the modeling of global semantic information while reducing computational complexity, mitigating false detections caused by occlusions among ship targets, and consequently improving detection precision. This study employs the Minimum Point Distance Intersection over Union (MPDIoU) loss function, which more comprehensively measures the similarity between predicted and ground-truth bounding boxes by optimizing the distances of their key geometric points, effectively enhancing the accuracy of bounding box regression. Experimental results show that the proposed model achieved 93.96% mAP, 92.93% recall, and 94.97% precision on a self-built ship dataset, surpassing mainstream detection algorithms including YOLOv11 in multiple metrics. The model has only 2.90 M parameters, achieving a good balance between accuracy and efficiency. This provides an accurate and efficient solution for intelligent ship supervision.
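The MPDIoU loss mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the commonly cited formulation: IoU minus the squared top-left and bottom-right corner distances, each normalized by the squared diagonal of the input image (img_w² + img_h²), with loss = 1 − MPDIoU. The function name `mpdiou_loss` is hypothetical.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """MPDIoU loss for axis-aligned boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection rectangle (zero area when the boxes do not overlap)
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distances between matching corners
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2  # top-left corners
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2  # bottom-right corners
    # Normalise by the squared diagonal of the input image (assumed convention)
    norm = img_w ** 2 + img_h ** 2
    return 1.0 - (iou - d1 / norm - d2 / norm)


# Usage: a prediction shifted 5 px right of a 10 x 10 ground-truth box in a 100 x 100 image
print(mpdiou_loss((5, 0, 15, 10), (0, 0, 10, 10), 100, 100))
```

Unlike plain IoU loss, the corner-distance terms still provide a signal when boxes do not overlap, so the loss can exceed 1 for distant predictions.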
- Research Article
- 10.3390/rs18071003
- Mar 27, 2026
- Remote Sensing
- Binjie Zhang + 4 more
Object detection in unmanned aerial vehicle (UAV) imagery remains a crucial yet challenging task due to complex backgrounds, large scale variations, and the prevalence of small objects. Visible-spectrum images lack robustness under all-weather and all-illumination conditions; by contrast, multispectral sensing provides complementary cues (e.g., thermal signatures) that improve detection robustness. However, existing multispectral solutions often incur high computational costs and are therefore difficult to deploy on resource-constrained UAV platforms. To address these issues, this study proposes SG-YOLO, a lightweight and efficient multispectral object detection framework that aims to balance accuracy and efficiency. First, a Spectral Gated Downsampling Stem (SGDS) is designed, in which grouped convolutions and a gating mechanism are employed at the early stage of the network to extract band-specific features, thereby maximizing spectral complementarity while minimizing redundancy. Second, a Spectral–Spatial Iterative Attention Fusion (SSIAF) module is introduced, in which spectral-wise (channel) attention and spatial-wise attention are iteratively coupled and cascaded in a multi-scale manner to jointly model cross-band dependencies and spatial saliency, thereby aggregating high-level semantic information while suppressing redundant spectral responses. Finally, a Spatial–Channel Synergistic Fusion (SCSF) module is designed to enhance multi-scale and cross-channel feature integration in the neck. Experiments on the MODA dataset show that SG-YOLO achieves 72.4% mAP50, outperforming the baseline by 3.2%. Moreover, compared with a range of mainstream one-stage detectors and multispectral detection methods, SG-YOLO delivers the best overall performance, providing an effective solution for UAV object detection while maintaining a favorable trade-off between model size and detection accuracy.
- Research Article
- 10.1016/j.inffus.2025.103824
- Mar 1, 2026
- Information Fusion
- Pengfei Wei + 5 more
How people read? Reading preference-inspired multimodal NER with heterogeneous mining and iterative fusion engine
- Research Article
- 10.1109/tmi.2026.3660270
- Feb 3, 2026
- IEEE transactions on medical imaging
- Qingsen Bao + 7 more
Integrating multimodal radiological images and clinical data is critical for survival prediction in rectal cancer. However, existing methods often give insufficient consideration to 1) modality heterogeneity (caused by rectal peristalsis, noise artifacts, and missing modalities) and 2) site heterogeneity (caused by different imaging protocols and patient populations). These factors hinder the model from capturing reliable cross-modal relationships and adapting to distribution shifts across clinical sites. In this work, we propose UICSurv, a novel multimodal Survival prediction framework built around Uncertainty-guided Iterative Contrastive fusion, to capture robust cross-site multimodal interactions while leveraging sample-level uncertainty to enhance fusion reliability. Specifically, UICSurv initializes a shared multimodal embedding and iteratively refines it by fusing each heterogeneous modality via cross-attention. In each iteration, a novel Survival Contrastive Learning (SCL) strategy progressively enhances both the cross-site alignment and the survival discriminability of the multimodal embedding space. Moreover, we design an EvidenceHit module, which employs temporally consistent evidential learning to jointly estimate survival probabilities and uncertainty. The estimated uncertainty further guides embedding alignment by reducing the interference of unreliable samples. All components operate synergistically within UICSurv to reinforce reliable survival prediction in rectal cancer. Extensive experiments on multimodal rectal cancer datasets collected from three sites demonstrate the superiority of our method in both survival prediction and uncertainty estimation. The code is open-source and available at https://github.com/ScorpioBao/UICSurv.
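The iterative refinement described above, in which a shared embedding repeatedly attends to each modality, can be illustrated with a minimal pure-Python sketch. Several simplifying assumptions are made here and none are taken from the paper:

* single-head scaled dot-product cross-attention with identity query/key/value projections;
* a residual update followed by L2 normalization;
* a missing modality is simply absent from the input and skipped.

The SCL and EvidenceHit components are not modeled, and all function names are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, tokens):
    """Single-head scaled dot-product attention with identity projections (an assumption)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]


def iterative_fusion(modalities, init, n_iters=2):
    """Refine a shared embedding by attending to each available modality in turn.

    modalities: dict mapping a modality name to a non-empty list of token vectors;
    a missing modality is simply absent from the dict and is therefore skipped.
    """
    emb = list(init)
    for _ in range(n_iters):
        for name in sorted(modalities):  # fixed order for reproducibility
            ctx = cross_attend(emb, modalities[name])
            emb = [e + c for e, c in zip(emb, ctx)]  # residual update
            norm = math.sqrt(sum(v * v for v in emb)) or 1.0
            emb = [v / norm for v in emb]  # keep the embedding on the unit sphere
    return emb


# Usage: two imaging modalities and one clinical vector, all embedded in 3-D (illustrative)
tokens = {
    "t2w": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "dwi": [[0.5, 0.5, 0.0]],
    "clinical": [[0.0, 0.0, 1.0]],
}
shared = iterative_fusion(tokens, init=[0.1, 0.1, 0.1], n_iters=3)
```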
- Research Article
- 10.3390/sym18020222
- Jan 25, 2026
- Symmetry
- Weipan Wang + 4 more
Industrial-grade printed circuit boards (PCBs) exhibit high structural order and inherent geometric symmetry, and minute surface defects essentially constitute symmetry-breaking anomalies that disrupt topological integrity. Detecting these anomalies is challenging because of scale variation and low contrast. Therefore, this paper proposes a symmetry-aware object detection framework, DAS-YOLO, based on an improved YOLOv11. The U-shaped adaptive feature extraction module (Def-UAD) reconstructs the C3K2 unit, overcoming the geometric limitations of standard convolutions through a deformation adaptation mechanism and thereby significantly enhancing feature extraction for irregular defect topologies. A semantic-aware module (SADRM) is introduced into the backbone and neck, and the lightweight, efficient ESSAttn improves the distinguishability of small or weak targets. At the same time, to address the information asymmetry between deep and shallow features, an iterative attention feature fusion module (IAFF) is designed; by dynamically weighting and calibrating feature biases, it achieves structured coordination and balanced multi-scale representation. To evaluate the proposed method, we conducted comprehensive experiments on publicly available PCB defect datasets. The results show that the Recall, mAP@50, and mAP@50-95 of DAS-YOLO reached 82.60%, 89.50%, and 46.60%, respectively, which are 3.7%, 1.8%, and 2.9% higher than those of the baseline model, YOLOv11n. Comparisons with mainstream detectors such as GD-YOLO and SRN further demonstrate a significant advantage in detection accuracy. These results confirm that the proposed framework balances accuracy and practicality in addressing the key challenges of PCB surface defect detection.
- Research Article
- 10.1109/tim.2026.3671930
- Jan 1, 2026
- IEEE Transactions on Instrumentation and Measurement
- Yun Tong + 4 more
With the increasing complexity of mechanical equipment, fault features often exhibit both slow-varying and fast-varying components simultaneously. These fault features are difficult to characterize synchronously, which degrades the precision of fault diagnosis. Hence, this paper presents a novel time-frequency analysis approach termed iterative self-estimated chirplet fusion bidirectional extracting transform (ISCF-BET). First, to improve the accuracy of the subsequent time-frequency representation (TFR), an iterative self-estimated chirplet fusion (ISCF) based on the chirplet transform is proposed. By estimating the time-frequency chirp-rate parameter point by point on the time-frequency plane and fusing multiple chirplet transform results, ISCF generates the initial TFR and mitigates amplitude distortion. Second, we develop a bidirectional extracting transform (BET) method for fine-grained processing of the two types of features. BET distinguishes slow-varying from fast-varying features in the initial TFR using the time-frequency chirp-rate parameter. It then performs targeted extraction in the time and frequency directions, respectively, significantly enhancing time-frequency energy concentration and resolution. Finally, simulation results and practical engineering applications demonstrate that the proposed method is a reliable solution for mechanical fault diagnosis.
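A simplified stand-in for the chirplet-fusion idea in ISCF can be sketched as follows. Chirplet transforms are computed for a small set of candidate chirp rates. At each time-frequency point, the rate yielding the largest magnitude is kept, which acts as a point-wise chirp-rate estimate.

This is an assumption-laden illustration, not the authors' ISCF. The Gaussian window, the fixed candidate rates, the max-magnitude fusion rule, and the absence of any iteration are all choices made here for brevity. Function names are hypothetical.

```python
import cmath
import math


def chirplet_tfr(x, chirp_rate, half_len=8, sigma=4.0, n_bins=32):
    """Magnitude of a discrete chirplet transform on a (time x frequency) grid.

    chirp_rate is in cycles/sample^2; frequencies span [0, 0.5) cycles/sample.
    """
    n = len(x)
    tfr = []
    for t0 in range(n):
        seg = []
        for k in range(-half_len, half_len + 1):
            idx = t0 + k
            sample = x[idx] if 0 <= idx < n else 0.0  # zero-pad at the edges
            window = math.exp(-0.5 * (k / sigma) ** 2)  # Gaussian window
            # De-chirp: cancel a linear frequency drift of chirp_rate around t0
            seg.append((k, sample * window * cmath.exp(-1j * math.pi * chirp_rate * k * k)))
        row = []
        for b in range(n_bins):
            f = b / (2 * n_bins)
            row.append(abs(sum(s * cmath.exp(-2j * math.pi * f * k) for k, s in seg)))
        tfr.append(row)
    return tfr


def fused_tfr(x, chirp_rates, **kwargs):
    """Point-wise fusion: keep, at each TF point, the chirp rate with the largest magnitude."""
    tfrs = [chirplet_tfr(x, c, **kwargs) for c in chirp_rates]
    n_t, n_f = len(tfrs[0]), len(tfrs[0][0])
    fused = [[0.0] * n_f for _ in range(n_t)]
    best_rate = [[0.0] * n_f for _ in range(n_t)]
    for i in range(n_t):
        for j in range(n_f):
            m = max(range(len(chirp_rates)), key=lambda r: tfrs[r][i][j])
            fused[i][j] = tfrs[m][i][j]
            best_rate[i][j] = chirp_rates[m]
    return fused, best_rate


# Usage: a complex linear chirp whose instantaneous frequency is 0.05 + 0.002 n cycles/sample
sig = [cmath.exp(2j * math.pi * (0.05 * n + 0.001 * n * n)) for n in range(64)]
tfr, rates = fused_tfr(sig, [-0.002, 0.0, 0.002])
```

The `rates` grid is a coarse map of the locally dominant chirp rate. In ISCF-BET, this kind of parameter is what BET uses to separate slow-varying from fast-varying components.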
- Research Article
- 10.1088/2040-8986/ae32a5
- Jan 1, 2026
- Journal of Optics
- Gaowei Sun + 3 more
Phase unwrapping is one of the key problems in fringe projection profilometry (FPP). In recent years, CNN-based and Transformer-based models have been widely used for FPP phase unwrapping. Unfortunately, the limited long-range modeling capability of CNNs prevents them from effectively extracting fine-grained features in wrapped phase images, while Transformer-based models struggle to handle long-range dependencies efficiently because of their local focus or computational demands. Recent studies demonstrate that Mamba, a novel selective state space model (SSM), achieves efficient modeling of long-range dependencies by dynamically adapting its parameters to the input context, particularly in tasks such as long-sequence modeling. Inspired by this, we propose a hybrid deep learning model integrating Res-UNet with Mamba (CNN-Mamba network, CMNet) for phase unwrapping of single-frame fringe patterns. First, a new Mamba-based dual-branch skip connection module, named convolution weighted feature fusion SSM (Conv_WFF-SSM), is proposed. It integrates Mamba's selective state space mechanism with multi-scale convolutional feature weighting to simultaneously address long-range interaction modeling and hierarchical feature preservation during phase unwrapping. Second, a parameter-free attention module (PFAM) is introduced into the encoder and decoder to reduce the information loss caused by downsampling and upsampling without increasing the number of network parameters. Finally, for the first time, the iterative attentional feature fusion (IAFF) module is integrated into the residual block in place of the ordinary summation operation. Experiments demonstrate the validity and robustness of the proposed technique.
- Research Article
- 10.1109/tmm.2026.3668556
- Jan 1, 2026
- IEEE Transactions on Multimedia
- Jintao Huang + 2 more
Multi-instance partial multi-label learning (MIPML) addresses a challenging scenario in which each training sample is a bag of multiple instances associated with a candidate label set that contains both several true labels and noisy labels. Current MIPML methods typically neglect the essential correlations between labels and instances at both the instance and bag levels, which limits their disambiguation ability and predictive accuracy. To address these limitations, we present Correlation-Fusion MIPML (CF-MIPML), a framework that integrates Label Confidence Generation (LCG) and Candidate Label Disambiguation (CLD). The LCG module systematically constructs a robust label confidence matrix by capturing correlation structures within and across bags, providing a foundation for precise label disambiguation. The CLD module uses this confidence matrix to further refine label predictions through an optimized iterative fusion loss function that combines a partial loss and an interaction loss. This joint-loss strategy continually refines label confidence during training, improving the robustness and accuracy of predictions. Comprehensive experiments on various benchmark and real-world datasets demonstrate that CF-MIPML outperforms existing state-of-the-art methods, better handling complex label ambiguity and improving overall model generalization in practical MIPML scenarios.
- Research Article
- 10.3390/s25237375
- Dec 4, 2025
- Sensors (Basel, Switzerland)
- Jaemyung Kim + 1 more
Recently, the convergence of advanced sensor technologies with innovations in artificial intelligence and robotics has highlighted facial emotion recognition (FER) as an essential component of human-computer interaction (HCI). Traditional FER studies based on handcrafted features and shallow machine learning have shown limited performance, while convolutional neural networks (CNNs) have improved nonlinear emotion pattern analysis but are constrained by local feature extraction. Vision transformers (ViTs) address this by leveraging global correlations, yet both CNN- and ViT-based single networks often suffer from overfitting, single-network dependency, and information loss in ensemble operations. To overcome these limitations, we propose ArecaNet, an assembled residual enhanced cross-attention network that integrates multiple feature streams without information loss. The framework comprises (i) channel and spatial feature extraction via SCSESResNet, (ii) landmark feature extraction from specialized sub-networks, (iii) iterative fusion through residual enhanced cross-attention, and (iv) final emotion classification from the fused representation. Our approach integrates pre-trained sub-networks specialized in facial recognition with an attention mechanism and a purpose-designed main network optimized for size reduction and efficient feature extraction. The extracted features are fused through an iterative residual enhanced cross-attention mechanism, which minimizes information loss and preserves complementary representations across networks. This strategy overcomes the limitations of conventional ensemble methods, enabling seamless feature integration and robust recognition.
The experimental results show that ArecaNet achieved accuracies of 97.0% and 97.8% on the public FER-2013 and RAF-DB databases, respectively. These are 4.5% and 2.75% higher, respectively, than those of the previous state-of-the-art method, PAtt-Lite, setting new state-of-the-art accuracies on both databases.
- Research Article
- 10.1109/tgcn.2025.3543476
- Dec 1, 2025
- IEEE Transactions on Green Communications and Networking
- Jiatong Bai + 6 more
Most existing DOA estimation methods assume ideal source incident angles with minimal noise. Moreover, directly using pre-estimated angles to calculate weighting coefficients can lead to performance loss. Thus, a green multi-modal (MM) fusion DOA framework is proposed to achieve more practical, low-cost, and time-efficient DOA estimation for an H2AD array. First, two more efficient clustering methods, global maximum cos_similarity clustering (GMaxCS) and global minimum distance clustering (GMinD), are presented to infer more precise true solutions from the candidate solution sets. Based on these, an iteration weighted fusion (IWF)-based method is introduced to iteratively update the weighted fusion coefficients and the clustering center of the true-solution classes using the estimated values. In particular, the coarse DOA calculated by the fully digital (FD) subarray serves as the initial cluster center. This process yields two methods, MM-IWF-GMaxCS and MM-IWF-GMinD. To provide still higher-accuracy DOA estimation, a fusion network (fusionNet) is proposed to aggregate the two inferred sets of true angles, yielding two further approaches, MM-fusionNet-GMaxCS and MM-fusionNet-GMinD. Simulation results show that the four proposed approaches can achieve ideal DOA performance and attain the CRLB. Meanwhile, MM-fusionNet-GMaxCS and MM-fusionNet-GMinD exhibit superior DOA performance compared with MM-IWF-GMaxCS and MM-IWF-GMinD, especially in the extremely low SNR range.
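The abstract does not give the exact GMinD or IWF update rules, so the following is only a hypothetical sketch of the general idea:

1. Use the coarse FD estimate as the initial cluster center.
2. Pick the candidate in each ambiguous solution set that lies closest to the center (minimum-distance selection).
3. Weight the picks inversely to their squared deviation from the center.
4. Recompute the center as the weighted mean, and repeat until it stops moving.

The weighting rule, the `eps` regularizer, and the function name `iwf_estimate` are all assumptions.

```python
def iwf_estimate(candidate_sets, coarse_doa, n_iters=20, eps=1e-6, tol=1e-9):
    """Hypothetical iteration-weighted fusion of DOA candidates (angles in degrees).

    candidate_sets: one list of ambiguous candidate angles per subarray.
    coarse_doa: initial cluster centre, e.g. the coarse fully digital (FD) estimate.
    """
    center = coarse_doa
    picked = []
    for _ in range(n_iters):
        # Minimum-distance clustering: keep the candidate in each set nearest the centre
        picked = [min(cands, key=lambda a: abs(a - center)) for cands in candidate_sets]
        # Fusion weights: inverse squared deviation from the current centre (assumed rule)
        weights = [1.0 / ((a - center) ** 2 + eps) for a in picked]
        new_center = sum(w * a for w, a in zip(weights, picked)) / sum(weights)
        converged = abs(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center, picked


# Usage: three subarrays, each producing one true and one spurious candidate (illustrative)
sets = [[10.2, -40.0], [9.8, 35.0], [10.1, 70.0]]
center, picked = iwf_estimate(sets, coarse_doa=12.0)
```

Because the weights grow as a pick approaches the center, this estimate tends to lock onto the most self-consistent candidate. A practical version would more likely derive weights from per-subarray noise or SNR estimates.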
- Research Article
- 10.1088/1361-6501/ae1858
- Nov 7, 2025
- Measurement Science and Technology
- Zhiqin Zhang + 5 more
Insulators are critical components of transmission lines. Common defects, such as structural loss of the insulator caused by spontaneous rupture, breakage, and fouling, can lead to short circuits and tripping faults, posing serious threats to power grid stability and the safety of the power supply. However, in practical applications, insulator defect detection faces several challenges, including small target sizes, insufficient representation of multiscale features, complex backgrounds, and imbalanced datasets with a limited number of defective samples. Traditional detection methods often miss small targets and lack robustness in scenarios with large scale variations and complex environments. To address these issues, this paper proposes an enhanced detection model based on YOLOv8s. The model introduces an Iterative Attentional Feature Fusion (iAFF) module to optimize multiscale feature representation and incorporates a Generalized Dynamic Feature Pyramid Network (GDFPN) to improve feature retention for small target detection, thereby enhancing robustness in complex backgrounds. Additionally, to mitigate the scarcity of defective samples, the Stable Diffusion generative model is used to augment the dataset, effectively improving detection performance in small-sample scenarios. Experimental results demonstrate that the proposed method significantly outperforms the original YOLOv8s model in terms of recall, accuracy, and precision on the insulator defect dataset. The model exhibits strong detection capability and generalization performance, making it well suited to real-world challenges such as small targets, multiscale variation, and complex backgrounds.
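The iAFF module referenced in this and several other abstracts can be sketched in pure Python. The sketch assumes the two-stage formulation from the original Attentional Feature Fusion work (Dai et al.):

* Stage 1: a multi-scale channel attention (MS-CAM) map M is computed from the element-wise sum X + Y to form an initial fused feature.
* Stage 2: a second attention map, computed from that fused feature, produces the output M ⊗ X + (1 − M) ⊗ Y.

Random weights stand in for learned 1×1 convolutions and batch normalization is omitted, so this shows the data flow only, not the configuration used in the insulator-detection model. All names are illustrative.

```python
import math
import random


def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


class MSCAM:
    """Simplified multi-scale channel attention: global (pooled) + local (per-pixel) branches.

    Features are lists of C channels, each a flat list of P pixel values.
    Random weights stand in for learned 1x1 convolutions; batch norm is omitted.
    """

    def __init__(self, channels, reduction=2, seed=0):
        rng = random.Random(seed)
        mid = max(1, channels // reduction)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.g1, self.g2 = init(mid, channels), init(channels, mid)
        self.l1, self.l2 = init(mid, channels), init(channels, mid)

    def __call__(self, feat):
        n_c, n_p = len(feat), len(feat[0])
        pooled = [sum(ch) / n_p for ch in feat]             # global average pooling
        g = matvec(self.g2, relu(matvec(self.g1, pooled)))  # global channel context
        att = [[0.0] * n_p for _ in range(n_c)]
        for p in range(n_p):
            pix = [feat[c][p] for c in range(n_c)]
            loc = matvec(self.l2, relu(matvec(self.l1, pix)))  # local point-wise context
            for c in range(n_c):
                att[c][p] = 1.0 / (1.0 + math.exp(-(g[c] + loc[c])))  # sigmoid
        return att


def blend(m, x, y):
    """Soft selection m * x + (1 - m) * y, element-wise."""
    return [[mi * xi + (1.0 - mi) * yi for mi, xi, yi in zip(mr, xr, yr)]
            for mr, xr, yr in zip(m, x, y)]


def iaff(x, y, att_stage1, att_stage2):
    summed = [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(x, y)]
    z0 = blend(att_stage1(summed), x, y)  # stage 1: attentional initial integration
    return blend(att_stage2(z0), x, y)    # stage 2: attention recomputed from the fused feature


# Usage: fuse two 4-channel feature maps of 6 pixels each (e.g. shallow and deep features)
rng = random.Random(1)
x = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
y = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
z = iaff(x, y, MSCAM(4, seed=2), MSCAM(4, seed=3))
```

Because each output element is a convex combination of the corresponding input elements, iAFF can only reweight the two inputs, never amplify them. This is one reason it is a drop-in replacement for plain summation.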
- Research Article
- 10.1016/j.knosys.2025.114313
- Nov 1, 2025
- Knowledge-Based Systems
- Chenyoukang Lin + 3 more
CTIUFuse: A CNN-Transformer-based iterative feature universal fusion algorithm for multimodal images
- Research Article
- 10.1038/s41598-025-15274-4
- Oct 15, 2025
- Scientific reports
- Depeng Wang + 3 more
Owing to the scalability issues of transformers and their lack of the inductive biases typical of CNNs, the application of these architectures in a wider range of fields is somewhat restricted. Therefore, hybrid network architectures that combine the advantages of convolution and transformers are becoming an increasingly active direction of research and application. This article proposes an enhanced dual encoder network (EDE-Net) that integrates convolution and pyramid transformers for medical image segmentation. Specifically, we apply convolutional kernels and pyramid transformer structures in parallel in the encoder to extract features, ensuring that the network captures both local details and global semantic information. To efficiently fuse local detail information and global features at each downsampling stage, we introduce the phase-based iterative feature fusion module (PIFF). The PIFF module first combines local details and global features and then assigns a distinct weight coefficient to each, reflecting their importance for foreground pixel classification. By effectively balancing the significance of local details and global features, the PIFF module enhances the network's ability to delineate fine lesion edges. Experimental results on the GlaS and MoNuSeg datasets validate the effectiveness of this approach: on these two publicly available datasets, EDE-Net significantly outperforms previous CNN-based (e.g., UNet) and transformer-based (e.g., Swin-UNet) algorithms.
- Research Article
- 10.1016/j.comcom.2025.108245
- Sep 1, 2025
- Computer Communications
- Beiming Yan + 6 more
Crowd counting with WiFi sensing based on iterative attentional feature fusion
- Research Article
- 10.1109/tmi.2024.3494271
- Sep 1, 2025
- IEEE transactions on medical imaging
- Shuo Han + 7 more
Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but it typically leads to severe image degradation, which motivates improved reconstruction techniques. In this paper, we propose a novel physics-informed score-based diffusion model (PSDM) for limited-angle reconstruction of cardiac CT. At sampling time, we combine a data prior from a diffusion model with a model prior obtained via an iterative algorithm and Fourier fusion to further enhance image quality. Specifically, our approach integrates the primal-dual hybrid gradient (PDHG) algorithm with score-based diffusion models, enabling the reconstruction of high-quality cardiac CT images from limited-angle data. Numerical simulations and real-data experiments confirm the effectiveness of the proposed approach.
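The PDHG algorithm mentioned above is most often used in its Chambolle-Pock form. The following minimal sketch applies it to a much simpler problem than limited-angle CT: 1-D total-variation denoising, min_x lam*||D x||_1 + 0.5*||x - f||^2. Here the forward operator is the identity and no diffusion prior is involved. The test signal, noise level, lam, and step sizes are illustrative assumptions.

```python
import random


def pdhg_tv_denoise(f, lam, n_iters=2000, tau=0.45, sigma=0.45):
    """Chambolle-Pock PDHG for  min_x  lam * ||D x||_1 + 0.5 * ||x - f||^2  (1-D signal).

    D is the forward-difference operator; tau * sigma * ||D||^2 < 1 because ||D||^2 <= 4.
    """
    n = len(f)
    x, x_bar = list(f), list(f)
    y = [0.0] * (n - 1)  # dual variable, one entry per difference
    for _ in range(n_iters):
        # Dual step: gradient ascent, then projection onto the box |y_i| <= lam
        y = [min(lam, max(-lam, yi + sigma * (x_bar[i + 1] - x_bar[i]))) for i, yi in enumerate(y)]
        # Adjoint of the forward difference applied to y
        dty = [0.0] * n
        for i, yi in enumerate(y):
            dty[i] -= yi
            dty[i + 1] += yi
        # Primal step: proximal operator of the quadratic data term
        x_new = [(xi - tau * di + tau * fi) / (1.0 + tau) for xi, di, fi in zip(x, dty, f)]
        # Extrapolation with theta = 1
        x_bar = [2.0 * xn - xo for xn, xo in zip(x_new, x)]
        x = x_new
    return x


# Usage: recover a noisy piecewise-constant signal
rng = random.Random(0)
clean = [0.0] * 50 + [1.0] * 50 + [0.0] * 50
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
denoised = pdhg_tv_denoise(noisy, lam=1.0)
```

In a CT setting, the identity would be replaced by the (limited-angle) projection operator, and the primal step would no longer have this closed form.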
- Research Article
- 10.1177/09544054251359468
- Aug 26, 2025
- Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture
- Peng Zhao + 2 more
In the automatic assembly of hole-shaft interference-fit structures by the temperature-differential method, the large temperature difference between the low-temperature and room-temperature environments makes the surfaces of hole-shaft parts prone to frost formation, which seriously degrades the accuracy of visual assembly-pose measurement. To address this problem, this paper draws on research into image dehazing and proposes a Single Image Edge-Enhanced Defrosting network (SIEED) based on an encoder-decoder structure to achieve efficient defrosting at the image level. SIEED comprises the following key modules:

- the Edge-Enhanced Convolution Module (EECM), which leverages the sensitivity of convolution operators to edge features to enhance edge information extraction;
- the Spatial-Guided Attention Module (SGAM), which employs local sensing techniques to address the non-uniform frost distribution through regional differentiation;
- the Weight-Based Iterative Fusion Module (WIFM), which dynamically fuses shallow and deep features to mitigate the loss of low-frequency features induced by deep convolution;
- the Adversarial Discrimination Module (ADM), which incorporates global and local discriminators to balance the realism of localized defrosting with overall coherence, using adversarial generation to produce defrosted images closer to reality.

In addition, this paper proposes an automated acquisition method for a real dataset of hole-shaft images to ensure the reliability and practicality of model training. The experimental results show that SIEED performs excellently on the frost-covered image defrosting task, and the pose measurements from its reconstructed images are very close to those from clean images, which verifies the validity and reliability of the method in practical applications.
- Research Article
- 10.1088/1361-6501/adf90e
- Aug 14, 2025
- Measurement Science and Technology
- Zeyu Jiang + 5 more
Rotating machinery is a crucial component of modern industrial production, and advanced fault diagnosis technologies are vital for ensuring its safe and reliable operation. Most existing data-driven fault diagnosis frameworks for rotating machinery are designed under balanced-data conditions. However, in practical applications, much less data is collected under fault conditions than under normal operating conditions, which poses substantial challenges for accurate fault diagnosis. This paper presents a diffusion-assisted framework to address highly imbalanced data, aiming to improve fault diagnosis accuracy and reliability in real-world industrial applications. First, a novel diffusion model-assisted signal generation model is proposed to augment the data in faulty states. This model employs a cooperative modulation strategy and signal filtering techniques to improve the quality of the generated signals. Subsequently, an enhanced pure convolutional network, termed IConvNeXt, incorporates pyramidal feature integration for robust classification based on the generated virtual data. IConvNeXt employs depthwise separable convolutions and introduces an iterative attention-based feature fusion module to adaptively fuse features from different stages of the network. Finally, extensive experiments on two rotating machinery datasets validate the performance of the proposed diagnostic framework under highly imbalanced data conditions. The results demonstrate that the proposed method significantly enhances discriminative capability for minority classes, leading to notable improvements in both diagnostic accuracy and F1-score.
- Research Article
- 10.1007/s11517-025-03426-7
- Aug 13, 2025
- Medical & biological engineering & computing
- Shuang Liu + 3 more
Accurate segmentation of hard exudates in fundus images is crucial for the early diagnosis of retinal diseases. However, hard exudate segmentation remains challenging because small lesions must be detected accurately and the boundaries of ambiguous lesions located precisely. In this paper, a longitudinal multi-scale fusion network (LMSF-Net) is proposed for accurate hard exudate segmentation in fundus images. In this network, an adjacent complementary correction module (ACCM) is proposed on the encoding path for complementary fusion of adjacent encoding features, and a progressive iterative fusion module (PIFM) is designed on the decoding path for fusion of adjacent decoding features. Furthermore, a spatial awareness fusion module (SAFM) is proposed at the end of the decoding path to calibrate and aggregate the two decoding outputs. The proposed method improves segmentation of hard exudates of different scales and shapes. The experimental results confirm the superiority of the proposed method for hard exudate segmentation, with AUPR values of 0.6954, 0.9017, and 0.6745 on the DDR, IDRID, and E-Ophtha EX datasets, respectively.
- Research Article
- 10.1080/17538947.2025.2543568
- Aug 13, 2025
- International Journal of Digital Earth
- Qixin Hu + 2 more
Satellite-derived bathymetry (SDB) is increasingly recognized as a cost-effective technique for acquiring extensive, high-resolution bathymetric data through the inversion of passive optical remote sensing imagery. This study introduces a novel SDB method based on iterative fusion of active lidar photons and passive pseudo-photons (SDB-IFAP), which seeks to enhance bathymetric retrieval by integrating active lidar data with passive imagery, thereby enriching studies on the interactions between these systems. The effectiveness of SDB-IFAP was evaluated by comparison with in situ data from four areas. This comparison indicated that SDB-IFAP achieved a root mean square error (RMSE) of 0.59–0.84 m and a mean absolute error (MAE) of 0.45–0.66 m in detecting active lidar photons. Furthermore, the bathymetry retrieved through SDB-IFAP was notably accurate, with an RMSE of 0.85–1.40 m and an MAE of 0.60–1.06 m. The success of SDB-IFAP is attributed to the calibration derived from active lidar and the contribution of passive imagery, suggesting that advances in spaceborne sensor technology will further enhance the capabilities of SDB.
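The RMSE and MAE figures reported above follow the standard definitions: RMSE = sqrt(mean((predicted − reference)²)) and MAE = mean(|predicted − reference|). The short sketch below computes both. The depth values are made up for illustration and are not data from the study.

```python
import math


def rmse(pred, ref):
    """Root mean square error between predicted and reference depths (same units)."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def mae(pred, ref):
    """Mean absolute error between predicted and reference depths (same units)."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


# Hypothetical retrieved vs. in situ depths in metres (illustrative values only)
retrieved = [2.1, 4.8, 7.5, 10.9]
in_situ = [2.0, 5.5, 7.0, 12.0]
print(f"RMSE = {rmse(retrieved, in_situ):.2f} m, MAE = {mae(retrieved, in_situ):.2f} m")
```

RMSE weights large errors more heavily, so it is never smaller than MAE on the same data. This is consistent with each reported RMSE range lying above the corresponding MAE range.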