Semi-supervised segmentation of forest fires from UAV remote sensing images via panoramic feature fusion and pixel contrastive learning

Abstract

Introduction
Wildfire detection and segmentation play a critical role in environmental monitoring and disaster prevention. However, existing deep learning-based segmentation models often struggle to identify wildfire boundaries accurately due to complex image features and limited annotated data.

Methods
We propose a novel segmentation network called PPCNet, which integrates three key modules: a Panoramic Feature Fusion (PFF) module for multi-scale feature extraction, a Dense Feature Fusion Encoder (DFFE) to capture contextual details, and a Local Detail Compensation (LDC) loss function to enhance boundary accuracy. Additionally, we design a pseudo-label optimization framework to leverage unlabeled data effectively.

Results
Experiments were conducted on multiple wildfire datasets, and the results show that PPCNet achieves superior performance compared with state-of-the-art methods. Our model demonstrates significant improvements in segmentation accuracy and boundary localization, validated through quantitative metrics and visual comparisons.

Discussion
The integration of the PFF, DFFE, and LDC components enables PPCNet to generalize well across different wildfire scenarios. The use of pseudo-labeling further enhances performance without requiring additional labeled data, making it suitable for real-world deployment in wildfire monitoring systems.
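
The abstract gives no implementation details, so the following is only a minimal sketch of the pixel contrastive learning idea named in the title: an InfoNCE-style loss computed over confident pseudo-labeled pixels. The function name, confidence threshold, and sampling scheme are all assumptions for illustration, not the paper's actual method.

```python
# Minimal sketch of pixel contrastive learning over pseudo-labels (assumed
# design, not PPCNet's actual loss). Confident predictions act as
# pseudo-labels; pixels of the same pseudo-class are pulled together.
import torch
import torch.nn.functional as F

def pixel_contrast_loss(embeddings, logits, temperature=0.1,
                        conf_thresh=0.9, max_pixels=256):
    """embeddings: (B, D, H, W) per-pixel projections; logits: (B, C, H, W)."""
    probs = logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)                   # (B, H, W)
    mask = conf > conf_thresh                         # keep confident pixels only

    feats = embeddings.permute(0, 2, 3, 1)[mask]      # (N, D)
    labels = pseudo[mask]                             # (N,)
    if feats.shape[0] < 2:
        return embeddings.sum() * 0.0                 # nothing confident yet

    # Subsample to bound the N x N similarity matrix.
    keep = torch.randperm(feats.shape[0], device=feats.device)[:max_pixels]
    feats = F.normalize(feats[keep], dim=1)
    labels = labels[keep]

    sim = feats @ feats.t() / temperature             # cosine similarities
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)  # same pseudo-class
    pos.fill_diagonal_(False)
    valid = pos.any(dim=1)
    if not valid.any():
        return embeddings.sum() * 0.0

    exp_sim = sim.exp()
    denom = (exp_sim.sum(dim=1) - exp_sim.diagonal()).clamp_min(1e-8)
    log_prob = sim - denom.log().unsqueeze(1)         # InfoNCE log-probabilities
    loss = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp_min(1)
    return loss[valid].mean()
```

In a semi-supervised loop, a term like this would be added to the supervised segmentation loss on labeled images, with pseudo-labels refreshed as training progresses.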

Similar Papers
  • Research Article
  • Cited by 8
  • 10.1080/01431161.2023.2261153
MPFFNet: LULC classification model for high-resolution remote sensing images with multi-path feature fusion
  • Oct 2, 2023
  • International Journal of Remote Sensing
  • Hao Yuan + 5 more

Land Use/Land Cover (LULC) classification has become increasingly important in various fields, including ecological and environmental protection, urban planning, and geological disaster monitoring. With the development of high-resolution remote sensing satellite technology, there is a growing focus on achieving precise LULC classification. However, the accuracy of fine-grained LULC classification is challenged by the high intra-class diversity and low inter-class separability inherent in high-resolution remote sensing images. To address this challenge, this paper proposes a novel multi-path feature fusion semantic segmentation model, called MPFFNet, which combines the segmentation results of convolutional neural networks with traditional filtering processes to achieve finer LULC classification. MPFFNet consists of three modules: the Improved Encoder Module (IEM) extracts contextual and spatial detail information through the backbone network, DASPP, and MFEAM; the Improved Decoder Module (IDM) utilizes the Cascade Feature Fusion (CFF) module to effectively merge shallow and deep information; and the Feature Fusion Module (FAM) enables dual-path feature fusion using a convolutional neural network and Gabor Filter. Experimental results on the large-scale classification set and the fine land-cover classification set of the Gaofen Image Dataset (GID) demonstrate the effectiveness of the proposed method, achieving mIoU scores of 81.02% and 77.83%, respectively. These scores outperform U-Net by 7.95% and 3.28%, respectively. Therefore, we believe that our model can deliver superior results in the task of LULC classification.
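
The distinctive element above is the dual-path fusion of CNN features with a traditional Gabor filter. Below is a minimal sketch of that idea, assuming fixed multi-orientation Gabor kernels applied to a grayscale input and fused by concatenation; the module name, kernel parameters, and wiring are illustrative, not MPFFNet's actual FAM design.

```python
# Illustrative dual-path fusion: learned CNN features concatenated with fixed
# Gabor-filter responses. All names and hyperparameters are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def gabor_kernel(size=11, theta=0.0, sigma=3.0, lambd=6.0, gamma=0.5, psi=0.0):
    """Build one 2D Gabor kernel as a (1, 1, size, size) tensor."""
    half = size // 2
    y, x = torch.meshgrid(
        torch.arange(-half, half + 1, dtype=torch.float32),
        torch.arange(-half, half + 1, dtype=torch.float32),
        indexing="ij",
    )
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    g = torch.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    g = g * torch.cos(2 * math.pi * xr / lambd + psi)
    return g.view(1, 1, size, size)

class DualPathFusion(nn.Module):
    """Concatenate CNN features with multi-orientation Gabor responses."""
    def __init__(self, cnn_channels, n_orientations=4):
        super().__init__()
        kernels = torch.cat(
            [gabor_kernel(theta=i * math.pi / n_orientations)
             for i in range(n_orientations)]
        )  # (n_orientations, 1, k, k)
        self.register_buffer("gabor", kernels)
        self.proj = nn.Conv2d(cnn_channels + n_orientations, cnn_channels, 1)

    def forward(self, cnn_feat, gray_image):
        # gray_image: (B, 1, H, W); responses are resized to the feature size.
        resp = F.conv2d(gray_image, self.gabor,
                        padding=self.gabor.shape[-1] // 2)
        resp = F.interpolate(resp, size=cnn_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.proj(torch.cat([cnn_feat, resp], dim=1))
```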

  • Research Article
  • 10.3390/s24144545
SiamDCFF: Dynamic Cascade Feature Fusion for Vision Tracking.
  • Jul 13, 2024
  • Sensors (Basel, Switzerland)
  • Jinbo Lu + 2 more

Establishing an accurate and robust feature fusion mechanism is key to enhancing the tracking performance of single-object trackers based on a Siamese network. However, the output features of the depth-wise cross-correlation feature fusion module in fully convolutional trackers based on Siamese networks cannot establish global dependencies on the feature maps of a search area. This paper proposes a dynamic cascade feature fusion (DCFF) module that introduces a local feature guidance (LFG) module and dynamic attention modules (DAMs) after the depth-wise cross-correlation module to enhance global dependency modeling during feature fusion. First, we design a set of verification experiments to investigate whether establishing global dependencies for the features output by the depth-wise cross-correlation operation can significantly improve the performance of fully convolutional Siamese trackers, providing experimental support for the rational design of the DCFF module's structure. Second, we integrate the DCFF module into a Siamese-network-based tracking framework, propose SiamDCFF, and evaluate it on public datasets. Compared with the baseline model, SiamDCFF demonstrates significant improvements.
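
Depth-wise cross-correlation, the operation the DCFF module builds on, is compact enough to sketch directly. This is the standard formulation used in fully convolutional Siamese trackers, not code from the paper:

```python
# Standard depth-wise cross-correlation: each search-region channel is
# correlated with the matching template channel.
import torch
import torch.nn.functional as F

def depthwise_xcorr(search, kernel):
    """search: (B, C, Hs, Ws) search-area features;
    kernel: (B, C, Hk, Wk) template (exemplar) features;
    returns (B, C, Hs-Hk+1, Ws-Wk+1) response maps."""
    b, c, hk, wk = kernel.shape
    search = search.reshape(1, b * c, *search.shape[-2:])
    kernel = kernel.reshape(b * c, 1, hk, wk)
    out = F.conv2d(search, kernel, groups=b * c)  # one filter per channel
    return out.reshape(b, c, *out.shape[-2:])
```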

  • Research Article
  • 10.1016/j.compbiomed.2025.110658
A novel recursive transformer-based U-Net architecture for enhanced multi-scale medical image segmentation.
  • Sep 1, 2025
  • Computers in biology and medicine
  • Shanshan Li + 3 more

  • Research Article
  • 10.21037/qims-24-683
Multiple myeloma segmentation net (MMNet): an encoder-decoder-based deep multiscale feature fusion model for multiple myeloma segmentation in magnetic resonance imaging.
  • Oct 1, 2024
  • Quantitative imaging in medicine and surgery
  • Xin Zhao + 4 more

Patients with multiple myeloma (MM), a malignant disease involving bone marrow plasma cells, show significant susceptibility to bone degradation, impairing normal hematopoietic function. The accurate and effective segmentation of MM lesion areas is crucial for the early detection and diagnosis of myeloma. However, the presence of complex shape variations, boundary ambiguities, and multiscale lesion areas, ranging from punctate lesions to extensive bone damage, presents a formidable challenge to precise segmentation. This study thus aimed to develop a more accurate and robust segmentation method for MM lesions by extracting rich multiscale features. In this paper, we propose a novel multiscale feature fusion encoding-decoding model architecture specifically designed for MM segmentation. In the encoding stage, our proposed multiscale feature extraction module, dilated dense connected net (DCNet), is employed to systematically extract multiscale features, thereby enlarging the model's receptive field. In the decoding stage, we propose the CBAM-atrous spatial pyramid pooling (CASPP) module to enhance the extraction of multiscale features, enabling the model to dynamically prioritize both channel and spatial information. Subsequently, these features are concatenated with the final output feature map to optimize segmentation outcomes. At the feature fusion bottleneck layer, we incorporate the dynamic feature fusion (DyCat) module into the skip connection to dynamically adjust feature extraction parameters and fusion processes. We assessed the efficacy of our approach using a proprietary MM dataset, yielding notable advancements. Our dataset comprised 753 magnetic resonance imaging (MRI) two-dimensional (2D) slice images of the spinal regions from 45 patients with MM, along with their corresponding ground truth labels. These images were primarily obtained from three sequences: T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and short tau inversion recovery (STIR). Using image augmentation techniques, we expanded the dataset to 3,000 images, which were employed for both model training and prediction. Among these, 2,400 images were allocated for training, while 600 images were reserved for validation and testing. Our method increased the intersection over union (IoU) and Dice coefficients by 7.9 and 6.7 percentage points, respectively, compared with the baseline model. Furthermore, we performed comparisons with alternative image segmentation methodologies, which confirmed the sophistication and efficacy of our proposed model. Our proposed multiple myeloma segmentation net (MMNet) can effectively extract multiscale features from images and enhance the correlation between channel and spatial information. Furthermore, a systematic evaluation of the proposed network architecture was conducted on a self-constructed, limited dataset. This endeavor holds promise for offering valuable insights into the development of algorithms for future clinical applications.
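
As a loose illustration of combining atrous spatial pyramid pooling with CBAM-style channel attention, in the spirit of the CASPP module described above, here is a minimal sketch; the branch dilations, reduction ratio, and class names are assumptions rather than MMNet's exact design.

```python
# Illustrative ASPP with channel attention (assumed layout, not MMNet's CASPP).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM-style channel gate from average- and max-pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return x * torch.sigmoid(avg + mx)[:, :, None, None]

class AttentiveASPP(nn.Module):
    """Parallel dilated branches, concatenated, gated, then projected."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.attn = ChannelAttention(out_ch * len(rates))
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(self.attn(y))
```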

  • Research Article
  • Cited by 101
  • 10.1109/tiv.2022.3164899
MTANet: Multitask-Aware Network With Hierarchical Multimodal Fusion for RGB-T Urban Scene Understanding
  • Jan 1, 2023
  • IEEE Transactions on Intelligent Vehicles
  • Wujie Zhou + 3 more

Understanding urban scenes is a fundamental ability requirement for assisted driving and autonomous vehicles. Most of the available urban scene understanding methods use red-green-blue (RGB) images; however, their segmentation performances are prone to degradation under adverse lighting conditions. Recently, many effective artificial neural networks have been presented for urban scene understanding and have shown that incorporating RGB and thermal (RGB-T) images can improve segmentation accuracy even under unsatisfactory lighting conditions. However, the potential of multimodal feature fusion has not been fully exploited because operations such as simply concatenating the RGB and thermal features or averaging their maps have been adopted. To improve the fusion of multimodal features and the segmentation accuracy, we propose a multitask-aware network (MTANet) with hierarchical multimodal fusion (multiscale fusion strategy) for RGB-T urban scene understanding. We developed a hierarchical multimodal fusion module to enhance feature fusion and built a high-level semantic module to extract semantic information for merging with coarse features at various abstraction levels. Using the multilevel fusion module, we exploited low-, mid-, and high-level fusion to improve segmentation accuracy. The multitask module uses boundary, binary, and semantic supervision to optimize the MTANet parameters. Extensive experiments were performed on two benchmark RGB-T datasets to verify the improved performance of the proposed MTANet compared with state-of-the-art methods.
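
The multitask supervision described above (boundary, binary, and semantic heads) can be sketched as one combined loss. The loss weights, head shapes, and the way boundary targets are derived from the semantic ground truth are assumptions for illustration, not MTANet's published formulation.

```python
# Illustrative multitask loss with semantic, binary, and boundary supervision.
# Weights and target derivations are assumed, not the paper's exact recipe.
import torch
import torch.nn.functional as F

def multitask_loss(sem_logits, bin_logits, bnd_logits, sem_gt, w=(1.0, 0.5, 0.5)):
    """sem_gt: (B, H, W) integer class map; binary and boundary targets are
    derived from it. bin_logits and bnd_logits are assumed (B, 1, H, W)."""
    bin_gt = (sem_gt > 0).float().unsqueeze(1)          # foreground vs background
    # Boundary target: pixels whose 3x3 neighborhood spans more than one class.
    pad = F.pad(sem_gt.float().unsqueeze(1), (1, 1, 1, 1), mode="replicate")
    local_max = F.max_pool2d(pad, 3, stride=1)
    local_min = -F.max_pool2d(-pad, 3, stride=1)
    bnd_gt = (local_max != local_min).float()

    loss = w[0] * F.cross_entropy(sem_logits, sem_gt)
    loss = loss + w[1] * F.binary_cross_entropy_with_logits(bin_logits, bin_gt)
    loss = loss + w[2] * F.binary_cross_entropy_with_logits(bnd_logits, bnd_gt)
    return loss
```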

  • Research Article
  • 10.1038/s41598-024-85081-w
A lightweight multi scale fusion network for IGBT ultrasonic tomography image segmentation
  • Jan 6, 2025
  • Scientific Reports
  • Meng Song + 5 more

The Insulated Gate Bipolar Transistor (IGBT) is a crucial power semiconductor device, and the integrity of its internal structure directly influences both its electrical performance and long-term reliability. However, the precise semantic segmentation of IGBT ultrasonic tomographic images poses several challenges, primarily due to high-density noise interference and visual distortion caused by target warping. To address these challenges, this paper constructs a dedicated IGBT ultrasonic tomography (IUT) dataset using Scanning Acoustic Microscopy (SAM) and proposes a lightweight Multi-Scale Fusion Network (LMFNet) aimed at improving segmentation accuracy and processing efficiency in ultrasonic image analysis. LMFNet adopts a deep U-shaped encoder-decoder architecture, with the backbone designed using inverted residual blocks to optimize feature transmission while maintaining model compactness. Additionally, we introduce two flexible, plug-and-play modules: the Context Feature Fusion (CFF) module, which effectively integrates multi-scale contextual information at skip connection layers, and the Multi-Scale Perception Aggregation (MPA) module, which focuses on extracting and fusing multi-scale features at bottleneck layers. Experimental results demonstrate that LMFNet performs exceptionally well on the IUT dataset, significantly outperforming existing methods in terms of segmentation accuracy and model compactness.
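
The inverted residual block named as LMFNet's backbone unit is a standard construction in the MobileNetV2 style; the sketch below shows the general pattern, with the expansion ratio and activation chosen for illustration.

```python
# Standard inverted residual block (expand -> depthwise -> project), shown for
# the stride-1, equal-width case where the skip connection applies.
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),           # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),                 # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),           # project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection
```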

  • Research Article
  • 10.1145/3736768
Echo Depth Estimation via Attention-based Hierarchical Multi-scale Feature Fusion Network
  • Aug 12, 2025
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Wenjie Zhang + 6 more

In environments where vision-based depth estimation systems, such as those utilizing infrared or imaging technologies, encounter limitations—particularly in low-light conditions—alternative approaches become essential. Echo depth estimation emerges as a compelling solution by leveraging the time delay of echoes to map the geometric structure of the surrounding environment. This method offers distinct advantages in specific scenarios, providing reliable data for accurate scene understanding and 3D reconstruction. Traditional echo depth estimation techniques primarily depend on spatial information captured by the encoder and depth predictions made by the decoder. However, these methods often fail to fully exploit the rich depth features present at different simultaneous frequencies. To address this challenge, we propose an echo depth estimation method via Attention-based Hierarchical Multi-scale Feature Fusion Network (AHMF-Net). This network is designed to extract spatial depth information from echo spectrograms across multiple scales and hierarchical levels, while fusing the most relevant information using an attention mechanism. AHMF-Net introduces two key modules in hierarchical levels: the Intra-layer Multi-scale Attention Feature Fusion (IMAF) module, which functions as the encoder to capture multi-scale features across varying granularities, and the Inter-layer Multi-Scale Detail Feature Fusion (IMDF) module, which integrates features from all encoding layers into the decoder to enable effective inter-layer multi-scale fusion. Additionally, the encoder incorporates an attention mechanism that enhances depth-related features by capturing channel dependencies at multiple scales. We evaluated AHMF-Net on the Replica, Matterport3D, and BatVision datasets, where it consistently outperformed state-of-the-art models in echo-based depth estimation, demonstrating superior accuracy and robustness. The source code is publicly available at https://github.com/wjzhang-ai/AHMF-Net .
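
As a sketch of the inter-layer fusion idea above (features from all encoding layers integrated into the decoder), the module below resizes every encoder level to a common resolution and fuses them by concatenation; the channel reductions and class name are assumptions, not IMDF's actual wiring.

```python
# Illustrative inter-layer fusion: reduce, resize, concatenate, project.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterLayerFusion(nn.Module):
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.reduce = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels_list
        )
        self.fuse = nn.Conv2d(out_channels * len(in_channels_list),
                              out_channels, 3, padding=1)

    def forward(self, feats):
        target = feats[0].shape[-2:]  # fuse at the highest resolution
        resized = [
            F.interpolate(r(f), size=target, mode="bilinear",
                          align_corners=False)
            for r, f in zip(self.reduce, feats)
        ]
        return self.fuse(torch.cat(resized, dim=1))
```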

  • Research Article
  • Cited by 7
  • 10.3390/app12010107
Autonomous Multiple Tramp Materials Detection in Raw Coal Using Single-Shot Feature Fusion Detector
  • Dec 23, 2021
  • Applied Sciences
  • Dongjun Li + 3 more

In the coal mining process, various types of tramp materials are mixed into the raw coal, which affects the quality of the coal and endangers the normal operation of the equipment. Automatic detection of tramp materials objects is an important process and basis for efficient coal sorting. However, previous research has focused on the detection of gangue, ignoring the detection of other types of tramp materials, especially small targets. Because the initial Single Shot MultiBox Detector (SSD) lacks the efficient use of feature maps, it is difficult to obtain stable results when detecting tramp materials objects. In this article, an object detection algorithm based on feature fusion and dense convolutional network is proposed, called tramp materials in raw coal single-shot detector (TMRC-SSD), to detect five types of tramp materials: gangue, bolt, stick, iron sheet, and iron chain. In this algorithm, a modified DenseNet is first designed and a four-stage feature extractor is used to down-sample the feature map stably. After that, we use dilated convolution and a multi-branch structure to enrich the receptive field. Finally, in the feature fusion module, we designed cross-layer feature fusion and attention fusion modules to realize the semantic interaction of feature maps. The experiments show that the modules we designed are effective, and the method outperforms existing models. When the input image is 300 × 300 pixels, it reaches 96.12% mAP at 24 FPS. In particular, for the detection of small objects, the detection accuracy has increased by 4.1 percentage points to 95.57%. The experimental results show that this method can be applied to the actual detection of tramp materials objects in raw coal.
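
The cross-layer feature fusion with attention described above can be sketched as a gated blend of a shallow map with an upsampled deep map; the specific gating design here is an assumption for illustration, not TMRC-SSD's exact module.

```python
# Illustrative cross-layer attention fusion: a learned pixel gate blends a
# shallow feature map with an upsampled deep one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerAttentionFusion(nn.Module):
    def __init__(self, shallow_ch, deep_ch):
        super().__init__()
        self.lat = nn.Conv2d(deep_ch, shallow_ch, 1)  # match channel widths
        self.gate = nn.Sequential(
            nn.Conv2d(2 * shallow_ch, 1, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, shallow, deep):
        up = F.interpolate(self.lat(deep), size=shallow.shape[-2:],
                           mode="bilinear", align_corners=False)
        w = self.gate(torch.cat([shallow, up], dim=1))  # (B, 1, H, W) in [0, 1]
        return w * shallow + (1.0 - w) * up
```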

  • Research Article
  • 10.1088/1742-6596/2999/1/012008
Improved YOLOv8-Based Obstacle Detection Algorithm for Waterborne Transportation
  • Apr 1, 2025
  • Journal of Physics: Conference Series
  • Haozhi Zhang + 5 more

Waterborne obstacle detection plays a critical role in the environmental perception of intelligent cargo ship transportation, directly influencing both the efficiency and safety of the transportation process. However, current approaches still encounter challenges regarding detection accuracy, real-time performance, and model lightweighting. This paper therefore proposes a novel lightweight waterborne obstacle detection algorithm based on an improved YOLOv8 framework. The algorithm integrates an advanced Multi-Branch Fusion (MBF) and Attentional Feature (AF) module with the Cross Stage Feature Fusion (C2f) module to form a new C2f-PPAC module, enhancing small-object feature representation while minimizing computational overhead. A feature fusion module, C2f-REPC, is introduced based on structural re-parameterization techniques, which reduces feature redundancy and improves detection accuracy. Additionally, a 160×160 detection head is added to enhance small object detection in challenging backgrounds. Experimental results on a ship-port logistics waterborne obstacle dataset show that the proposed algorithm outperforms the original YOLOv8, achieving a 3.4% increase in mAP50 and a 3.6% increase in mAP50-95. The computational cost is reduced to 7.9 GFLOPs, and the model size is decreased by 2 MB, allowing for real-time and precise detection of waterborne obstacles. The results highlight the effectiveness and advantages of the proposed approach.
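
Structural re-parameterization, the technique behind C2f-REPC, folds parallel training-time branches into a single inference-time convolution. The sketch below shows the standard 3x3 + 1x1 case under assumed conditions (stride 1, no groups, matching padding); it is not the paper's code.

```python
# Fold a training-time 3x3 + 1x1 conv pair into one inference-time 3x3 conv.
# Assumes stride 1, groups 1, conv3 padding 1, conv1 padding 0.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def fuse_rep_branches(conv3: nn.Conv2d, conv1: nn.Conv2d) -> nn.Conv2d:
    """Return a single conv whose output equals conv3(x) + conv1(x)."""
    fused = nn.Conv2d(conv3.in_channels, conv3.out_channels, 3,
                      padding=1, bias=True).to(conv3.weight.device)
    w1 = F.pad(conv1.weight, (1, 1, 1, 1))   # embed the 1x1 kernel in a 3x3
    fused.weight.copy_(conv3.weight + w1)
    zeros = torch.zeros(conv3.out_channels, device=conv3.weight.device)
    b3 = conv3.bias if conv3.bias is not None else zeros
    b1 = conv1.bias if conv1.bias is not None else zeros
    fused.bias.copy_(b3 + b1)
    return fused
```

A quick sanity check is that `fused(x)` matches `conv3(x) + conv1(x)` on a random input via `torch.allclose`.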

  • Research Article
  • 10.1038/s41598-025-18049-z
Liver CT image segmentation network based on multi-scale feature fusion
  • Sep 26, 2025
  • Scientific Reports
  • Dong Zhu + 5 more

Accurate localization and segmentation of the liver based on CT images are crucial for the early diagnosis and staging of liver cancer. However, during CT scanning, the liver's position may vary due to periodic respiratory motion, and its boundaries are often unclear due to its close proximity to surrounding organs, which increases the difficulty of segmentation. To address these challenges, this paper proposes an end-to-end liver image segmentation network, EDNet, based on a residual structure. The model introduces an automated feature fusion module (ECAdd) and a residual structure, enhancing the network's ability to extract multi-scale features from liver CT images. Additionally, a Deep Feature Enhancement (DFE) attention module is incorporated during the decoding phase to improve the network's ability to capture fine-grained details, thereby ensuring an effective improvement in segmentation accuracy. EDNet was validated on the LiTS2017 and 3D-IRCADb-01 datasets, achieving Dice scores of 0.9651 and 0.9683 and IoU scores of 0.9330 and 0.9385, respectively. Experimental results show that EDNet not only exhibits significant advantages in segmentation performance but also demonstrates high robustness across different datasets, providing a reliable and effective solution for liver CT image segmentation tasks.
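
The abstract does not detail the ECAdd module; one plausible reading, sketched below, is an additive skip-decoder fusion gated by efficient channel attention (ECA). Both this interpretation and all names here are assumptions.

```python
# Hypothetical ECA-gated additive fusion; the paper's design may differ.
import torch
import torch.nn as nn

class ECAGate(nn.Module):
    """Efficient-channel-attention weights from global average pooling."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        y = x.mean(dim=(2, 3)).unsqueeze(1)       # (B, 1, C) channel descriptor
        w = torch.sigmoid(self.conv(y))           # (B, 1, C)
        return w.transpose(1, 2).unsqueeze(-1)    # (B, C, 1, 1)

class ECAdd(nn.Module):
    """Add a skip feature and a decoder feature, reweighted per channel."""
    def __init__(self):
        super().__init__()
        self.gate = ECAGate()

    def forward(self, skip, decoder):
        w = self.gate(skip + decoder)
        return w * skip + (1 - w) * decoder
```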

  • Research Article
  • Cited by 2
  • 10.1007/s11548-024-03319-4
G-SET-DCL: a guided sequential episodic training with dual contrastive learning approach for colon segmentation.
  • Jan 9, 2025
  • International journal of computer assisted radiology and surgery
  • Samir Farag Harb + 4 more

This article introduces a novel deep learning approach to substantially improve the accuracy of colon segmentation even with limited data annotation, which enhances the overall effectiveness of the CT colonography pipeline in clinical settings. The proposed approach integrates 3D contextual information via guided sequential episodic training, in which a query CT slice is segmented by exploiting its previous labeled CT slice (i.e., support). Segmentation starts by detecting the rectum using a Markov Random Field-based algorithm. Then, supervised sequential episodic training is applied to the remaining slices, while contrastive learning is employed to enhance feature discriminability, thereby improving segmentation accuracy. The proposed method, evaluated on 98 abdominal scans of prepped patients, achieved a Dice coefficient of 97.3% and a polyp information preservation accuracy of 98.28%. Statistical analysis, including 95% confidence intervals, underscores the method's robustness and reliability. Clinically, this high level of accuracy is vital for ensuring the preservation of critical polyp details, which are essential for accurate automatic diagnostic evaluation. The proposed method performs reliably in scenarios with limited annotated data, as demonstrated by a Dice coefficient of 97.15% when the model was trained on far fewer annotated CT scans (e.g., 10 scans) than the testing dataset (e.g., 88 scans). The proposed sequential segmentation approach achieves promising results in colon segmentation. A key strength of the method is its ability to generalize effectively even with limited annotated datasets, a common challenge in medical imaging.
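
The guided sequential idea (each query slice segmented with its previous labeled slice as support) can be sketched as an inference loop; the two-channel input convention and the 0.5 threshold are assumptions, not the paper's exact pipeline.

```python
# Illustrative inference loop for guided sequential segmentation: the previous
# slice's mask rides along as a second input channel. Not the paper's code.
import torch
import torch.nn as nn

@torch.no_grad()
def sequential_volume_segment(model: nn.Module, volume: torch.Tensor,
                              seed_mask: torch.Tensor) -> torch.Tensor:
    """volume: (S, 1, H, W) CT slices ordered from the rectum upward;
    seed_mask: (1, 1, H, W) initial mask (e.g., from the rectum detector);
    model maps (B, 2, H, W) -> (B, 1, H, W) logits."""
    masks, prev = [], seed_mask
    for s in range(volume.shape[0]):
        x = torch.cat([volume[s : s + 1], prev], dim=1)  # slice + support mask
        prev = (model(x).sigmoid() > 0.5).float()        # becomes next support
        masks.append(prev)
    return torch.cat(masks, dim=0)                       # (S, 1, H, W)
```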

  • Research Article
  • 10.1186/s12880-025-01737-7
Multi-spatial-attention U-Net: a novel framework for automated gallbladder segmentation on CT images
  • May 30, 2025
  • BMC Medical Imaging
  • Henan Lou + 12 more

Objective: This study aimed to construct a novel model, Multi-Spatial Attention U-Net (MSAU-Net), by incorporating our proposed Multi-Spatial Attention (MSA) block into the U-Net for the automated segmentation of the gallbladder on CT images. Methods: The gallbladder dataset consists of CT images of 152 retrospectively collected liver cancer patients and corresponding ground truth delineated by experienced physicians. Our proposed MSAU-Net model was built in two versions: V1 (with one Multi-Scale Feature Extraction and Fusion (MSFEF) module in each MSA block) and V2 (with two parallel MSFEF modules in each MSA block). The performances of V1 and V2 were evaluated and compared with four other derivatives of U-Net or state-of-the-art models, quantitatively using seven commonly used metrics and qualitatively by comparison against experienced physicians' assessment. Results: The MSAU-Net V1 and V2 models both outperformed the comparative models across most quantitative metrics, with better segmentation accuracy and boundary delineation. The optimal number of MSA blocks was three for V1 and two for V2. Qualitative evaluations confirmed that they produced results closer to physicians' annotations. External validation revealed that MSAU-Net V2 exhibited better generalization capability. Conclusion: MSAU-Net V1 and V2 both exhibited outstanding performance in gallbladder segmentation, demonstrating strong potential for clinical application. The MSA block enhances spatial information capture, improving the model's ability to segment small and complex structures with greater precision. These advantages position MSAU-Net V1 and V2 as valuable tools for broader clinical adoption.
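
The paper's MSA block is more elaborate than this, but its spatial-attention core can be illustrated with the standard CBAM-style block below; the kernel size and pooling choices are assumptions.

```python
# Standard CBAM-style spatial attention (illustrative; the MSA block with
# MSFEF modules is more elaborate).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)     # channel-average map (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)      # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                       # reweight every spatial location
```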

  • Research Article
  • Cited by 2
  • 10.1016/j.asoc.2024.112379
Feature fusion and context interaction for RGB-D indoor semantic segmentation
  • Oct 29, 2024
  • Applied Soft Computing
  • Heng Liu + 2 more

  • Research Article
  • Cited by 2
  • 10.1007/s12539-025-00748-w
An Adaptive Multi-Stage and Adjacent-Level Feature Integration Network for Brain Tumor Image Segmentation.
  • Aug 14, 2025
  • Interdisciplinary sciences, computational life sciences
  • Jiwen Zhou + 3 more

The segmentation of brain tumor magnetic resonance imaging (MRI) plays a crucial role in assisting diagnosis, treatment planning, and disease progression evaluation. Convolutional neural networks (CNNs) and transformer-based methods have achieved significant progress due to their local and global feature extraction capabilities. However, similar to other medical image segmentation tasks, challenges remain in addressing issues such as blurred boundaries, small lesion volumes, and interwoven regions. General CNN and transformer approaches struggle to effectively resolve these issues. Therefore, a new multi-stage and adjacent-level feature integration network (MAI-Net) is introduced to overcome these challenges, thereby improving the overall segmentation accuracy. MAI-Net consists of dual-branch, multi-level structures and three innovative modules. The stage-level multi-scale feature extraction (SMFE) module focuses on capturing feature details from fine to coarse scales, improving detection of blurred edges and small lesions. The adjacent-level feature fusion (AFF) module facilitates information exchange across different levels, enhancing segmentation accuracy in complex regions as well as small volume lesions. Finally, the multi-stage feature fusion (MFF) module further integrates features from various levels to improve segmentation performance in complex regions. Extensive experiments on BraTS2020 and BraTS2021 datasets demonstrate that MAI-Net significantly outperforms existing methods in Dice and HD95 metrics. Furthermore, generalization experiments on a public ischemic stroke dataset confirm its robustness across different segmentation tasks. These results highlight the significant advantages of MAI-Net in addressing domain-specific challenges while maintaining strong generalization capabilities.
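
The adjacent-level fusion idea (each level exchanging information only with its immediate neighbors) can be sketched as follows; equal channel widths across levels and concat-then-project fusion are simplifying assumptions, not MAI-Net's exact AFF design.

```python
# Illustrative adjacent-level fusion: bring the finer and coarser neighbors to
# the current scale, concatenate, and project back to the original width.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentLevelFusion(nn.Module):
    """Fuse each level with its immediate finer and coarser neighbors."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, finer, current, coarser):
        size = current.shape[-2:]
        f = F.adaptive_avg_pool2d(finer, size)          # finer -> current scale
        c = F.interpolate(coarser, size=size,
                          mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([f, current, c], dim=1))
```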

  • Research Article
  • 10.1177/17298806241279609
A lightweight color and geometry feature extraction and fusion module for end-to-end 6D pose estimation
  • Sep 1, 2024
  • International Journal of Advanced Robotic Systems
  • Guoyu Zuo + 2 more

Despite advancements in red–green–blue-depth (RGB-D)-based six degree-of-freedom (6D) pose estimation methods, severe occlusion remains challenging. To address this issue, we propose a novel feature fusion module that can efficiently leverage the color and geometry information in RGB-D images. Unlike prior fusion methods, our method employs a two-stage fusion process. Initially, we extract color features from RGB images and integrate them into a point cloud. Subsequently, an anisotropic separable set abstraction network-like network is utilized to process the fused point cloud, extracting both local and global features, which are then combined to generate the final fusion features. Furthermore, we introduce a lightweight color feature extraction network to reduce model complexity. Extensive experiments conducted on the LineMOD, Occlusion LineMOD, and YCB-Video datasets conclusively demonstrate that our method significantly enhances prediction accuracy, reduces training time, and exhibits robustness to occlusion. Further experiments show that our model is significantly smaller than the latest popular 6D pose estimation models, which indicates that our model is easier to deploy on mobile platforms.
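
The first fusion stage described above, attaching per-pixel color features to the points they project onto, can be sketched with a simple gather; the integer pixel-coordinate convention and tensor layout are assumptions.

```python
# Illustrative first-stage fusion: sample per-pixel color features at the
# pixels each 3D point projects to and append them to the point coordinates.
import torch

def attach_color_to_points(color_feat, points_uv, points_xyz):
    """color_feat: (B, C, H, W); points_uv: (B, N, 2) integer pixel coords
    (u, v); points_xyz: (B, N, 3). Returns fused (B, N, 3 + C) features."""
    b, c, h, w = color_feat.shape
    flat = color_feat.flatten(2)                              # (B, C, H*W)
    idx = (points_uv[..., 1] * w + points_uv[..., 0]).long()  # v * W + u
    idx = idx.unsqueeze(1).expand(-1, c, -1)                  # (B, C, N)
    sampled = torch.gather(flat, 2, idx).transpose(1, 2)      # (B, N, C)
    return torch.cat([points_xyz, sampled], dim=-1)
```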
