SDMFFN: a novel specular detection median filtering fusion network for specular reflection removal in endoscopic images
Objective. Endoscopic imaging is vital in Minimally Invasive Surgery (MIS), but its utility is often compromised by specular reflections that obscure important details and hinder diagnostic accuracy. Existing methods to address these reflections face limitations, particularly those relying on color-based thresholding, and deep learning remains underused for highlight detection. Approach. To tackle these challenges, we propose the Specular Detection Median Filtering Fusion Network (SDMFFN), a novel framework designed to detect and remove specular reflections in endoscopic images. SDMFFN employs a two-stage process: detection and removal. In the detection phase, an enhanced Specular Transformer Unet (S-TransUnet) integrates Atrous Spatial Pyramid Pooling (ASPP), an Information Bottleneck (IB) and the Convolutional Block Attention Module (CBAM) to optimize multi-scale feature extraction and achieve accurate highlight detection. In the removal phase, an improved median filter smooths reflective areas, and color information is fused for a natural restoration. Main results. Experimental results show that the proposed SDMFFN outperforms competing methods. By delivering high-quality, reflection-free endoscopic images, it improves visual clarity and diagnostic precision, ultimately enhancing surgical outcomes and reducing the risk of misdiagnosis. Significance. The robust performance of SDMFFN suggests its adaptability to other medical imaging modalities, paving the way for broader clinical and research applications in robotic surgery, diagnostic endoscopy and telemedicine. To promote further progress in this research, the code will be made publicly available at: https://github.com/jize123457/SDMFFN.
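As a rough illustration of the two-stage detect-then-remove pipeline described above, the sketch below replaces masked highlight pixels with the median of their non-specular neighbours. The learned S-TransUnet detector is substituted here by a simple brightness threshold purely for demonstration; the threshold, radius, and helper names are assumptions, not the paper's method.

```python
import numpy as np

def detect_specular_mask(gray, threshold=0.9):
    """Flag pixels brighter than a threshold as specular.

    Stands in for the paper's learned S-TransUnet detector; a plain
    threshold is used only to keep the sketch self-contained.
    """
    return gray > threshold

def median_fill(gray, mask, radius=2):
    """Replace masked (specular) pixels with the median of the
    non-specular pixels in a local window, smoothing reflective areas."""
    out = gray.copy()
    h, w = gray.shape
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch = gray[y0:y1, x0:x1]
        valid = patch[~mask[y0:y1, x0:x1]]   # neighbours outside the highlight
        if valid.size:
            out[y, x] = np.median(valid)
    return out

# Toy image: uniform tissue (0.4) with a small saturated highlight (1.0).
img = np.full((8, 8), 0.4)
img[3:5, 3:5] = 1.0
mask = detect_specular_mask(img)
restored = median_fill(img, mask)
```

On this toy image the four highlight pixels are pulled back to the surrounding tissue intensity, while unmasked pixels are untouched.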
- Research Article
- 10.3390/s21237949
- Nov 28, 2021
- Sensors (Basel, Switzerland)
Attention mechanisms have demonstrated great potential for improving the performance of deep convolutional neural networks (CNNs). However, many existing methods are devoted to developing channel or spatial attention modules with large numbers of parameters, and such complex attention modules inevitably affect the performance of CNNs. In our experiments embedding the Convolutional Block Attention Module (CBAM) in the lightweight model YOLOv5s, CBAM slowed inference and increased model complexity while reducing average precision, but the Squeeze-and-Excitation (SE) component of CBAM had a positive impact on the model. To replace the spatial attention module in CBAM and offer a suitable scheme for combining channel and spatial attention modules, this paper proposes a Spatio-temporal Sharpening Attention Mechanism (SSAM), which sequentially infers intermediate maps along a channel attention module and a Sharpening Spatial Attention (SSA) module. By introducing a sharpening filter into the spatial attention module, we obtain an SSA module with low complexity. We then seek the combination of our SSA module with the SE module or the Efficient Channel Attention (ECA) module that yields the best improvement in models such as YOLOv5s and YOLOv3-tiny. After various replacement experiments, the best scheme is to embed channel attention modules in the backbone and neck of the model and integrate SSAM into the YOLO head. We verify the positive effect of SSAM on two general object detection datasets, VOC2012 and MS COCO2017: one for obtaining a suitable scheme and the other for proving the versatility of our method in complex scenes. Experimental results on the two datasets show clear gains in average precision and detection performance, demonstrating the usefulness of SSAM in lightweight YOLO models. Furthermore, visualization results show that SSAM enhances positioning ability.
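The sharpening-filter idea behind SSA can be sketched minimally: pool the channels to one map, sharpen it with a fixed Laplacian-style kernel to emphasise edges, and squash to an attention map. The exact kernel, pooling, and normalisation used in the paper are not given in the abstract, so everything below is an assumption for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sharpening_spatial_attention(feat):
    """Low-complexity spatial attention in the spirit of SSA.

    feat: (C, H, W) feature map. Channels are mean-pooled to a single
    map, a fixed 3x3 sharpening kernel emphasises edges, and a sigmoid
    turns the result into a (0, 1) attention map broadcast over channels.
    """
    pooled = feat.mean(axis=0)                        # (H, W)
    k = np.array([[0, -1, 0],
                  [-1, 5, -1],
                  [0, -1, 0]], dtype=float)           # sharpening (assumed) kernel
    pad = np.pad(pooled, 1, mode="edge")
    h, w = pooled.shape
    sharp = np.zeros_like(pooled)
    for i in range(h):
        for j in range(w):
            sharp[i, j] = (pad[i:i + 3, j:j + 3] * k).sum()
    attn = sigmoid(sharp)                             # spatial attention map
    return feat * attn                                # reweight every channel

x = np.random.rand(4, 6, 6)
y = sharpening_spatial_attention(x)
```

Because the kernel is fixed rather than learned, the module adds no parameters, which matches the low-complexity motivation stated in the abstract.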
- Research Article
- 10.1016/j.oregeorev.2023.105861
- Dec 31, 2023
- Ore Geology Reviews
3D mineral prospectivity modeling in the Sanshandao goldfield, China using the convolutional neural network with attention mechanism
- Research Article
- 10.3390/app151910790
- Oct 7, 2025
- Applied Sciences
To address the challenges of poor model generalization and suboptimal recognition accuracy stemming from limited and imbalanced sample sizes in tomato leaf disease identification, this study proposes a novel recognition strategy. This approach synergistically combines an enhanced image augmentation method based on generative adversarial networks with a lightweight deep learning model. Initially, an Atrous Spatial Pyramid Pooling (ASPP) module is integrated into the CycleGAN framework. This integration enhances the generator’s capacity to model multi-scale pathological lesion features, thereby significantly improving the diversity and realism of synthesized images. Subsequently, the Convolutional Block Attention Module (CBAM), incorporating both channel and spatial attention mechanisms, is embedded into the MobileNetV4 architecture. This enhancement boosts the model’s ability to focus on critical disease regions. Experimental results demonstrate that the proposed ASPP-CycleGAN significantly outperforms the original CycleGAN across multiple disease image generation tasks. Furthermore, the developed CBAM-MobileNetV4 model achieves a remarkable average recognition accuracy exceeding 97% for common tomato diseases, including early blight, late blight, and mosaic disease, representing a 1.86% improvement over the baseline MobileNetV4. The findings indicate that the proposed method offers exceptional data augmentation capabilities and classification performance under small-sample learning conditions, providing an effective technical foundation for the intelligent identification and control of tomato leaf diseases.
- Research Article
- 10.1142/s0219519424400621
- Nov 1, 2024
- Journal of Mechanics in Medicine and Biology
Coronary artery disease is a prominent contributor to cardiovascular death. Automatic segmentation of Coronary Angiography (CAG) images is crucial for early diagnosis and treatment and holds significant importance in clinical diagnosis, surgical planning, and treatment evaluation. However, owing to challenges such as poor image quality, complex backgrounds, and fine vessel structures, automatic segmentation has always been difficult. This study aims to improve the accuracy and robustness of automatic CAG segmentation. We therefore propose a UNet model improved with the Convolutional Block Attention Module (CBAM), which combines the Channel Attention Module (CAM) and the Spatial Attention Module (SAM) to strengthen the model's focus on important feature areas, making the identified vessels more complete and accurate. Specifically, the CBAM module is integrated into each convolutional layer to enhance feature representation along both the channel and spatial dimensions. We evaluate CBAM-UNet on the X-ray Angiography Coronary Artery Disease (XCAD) dataset. The experimental results indicate that the improved model's accuracy increased to 97% and its precision reached 89.95%. Across various metrics, the improved model surpasses traditional methods, with significant gains in resisting background noise and handling complex vascular structures.
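Since CBAM recurs throughout this listing, a minimal forward pass may help: channel attention (global avg/max pooling through a shared MLP) followed by spatial attention. The MLP weights here are random only to make the sketch runnable, and the paper's 7x7 spatial convolution is collapsed to a mean of the pooled maps; both are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat, reduction=2):
    """Sequential channel-then-spatial attention in the style of CBAM.

    feat: (C, H, W). In a trained network the shared-MLP weights are
    learned; random weights are used here purely for illustration.
    """
    c = feat.shape[0]
    # --- Channel Attention Module (CAM) ---
    avg = feat.mean(axis=(1, 2))                  # (C,) avg-pooled descriptor
    mx = feat.max(axis=(1, 2))                    # (C,) max-pooled descriptor
    w1 = rng.standard_normal((c // reduction, c))
    w2 = rng.standard_normal((c, c // reduction))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)    # shared bottleneck MLP (ReLU)
    ca = sigmoid(mlp(avg) + mlp(mx))              # (C,) channel weights
    feat = feat * ca[:, None, None]
    # --- Spatial Attention Module (SAM) ---
    avg_map = feat.mean(axis=0)                   # (H, W)
    max_map = feat.max(axis=0)                    # (H, W)
    sa = sigmoid((avg_map + max_map) / 2.0)       # stand-in for the 7x7 conv
    return feat * sa[None, :, :]

x = np.abs(rng.standard_normal((4, 5, 5)))        # non-negative toy features
y = cbam(x)
```

Both attention maps lie in (0, 1), so the module only rescales features; in the CBAM-UNet above one such block follows each convolutional layer.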
- Research Article
- 10.3390/rs15020379
- Jan 7, 2023
- Remote Sensing
Remote image semantic segmentation technology is one of the core research elements in the field of computer vision and has a wide range of applications in production and daily life. Most remote image semantic segmentation methods are based on CNNs. Recently, Transformers have provided a way to model long-distance dependencies in images. In this paper, we propose RCCT-ASPPNet, which includes the dual-encoder structure of Residual Multiscale Channel Cross-Fusion with Transformer (RCCT) and Atrous Spatial Pyramid Pooling (ASPP). RCCT uses a Transformer to cross-fuse global multiscale semantic information; a residual structure then connects the inputs and outputs. The CNN-based ASPP extracts contextual information of high-level semantics from different perspectives and uses the Convolutional Block Attention Module (CBAM) to extract spatial and channel information, further improving the model's segmentation ability. The experimental results show that the mIoU of our method is 94.14% and 61.30% on the Farmland and AeroScapes datasets, respectively, and that the mPA is 97.12% and 84.36%, respectively, both outperforming DeepLabV3+ and UCTransNet.
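The ASPP idea shared by this and several other entries can be sketched as the same kernel applied at several dilation rates and the responses stacked. Real ASPP learns a separate kernel per branch and adds image-level pooling; the averaging kernel and rate set below are assumptions chosen only to keep the example short and runnable.

```python
import numpy as np

def dilated_conv(img, kernel, rate):
    """3x3 convolution whose taps are spaced `rate` pixels apart,
    enlarging the receptive field without adding parameters."""
    h, w = img.shape
    p = np.pad(img, rate, mode="constant")        # zero-pad by the rate
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            for ki in range(3):
                for kj in range(3):
                    out[i, j] += kernel[ki, kj] * p[i + ki * rate, j + kj * rate]
    return out

def aspp(img, rates=(1, 6, 12)):
    """Minimal ASPP sketch: one shared kernel run at several dilation
    rates, capturing context at multiple scales."""
    kernel = np.full((3, 3), 1.0 / 9.0)           # averaging kernel, for demo
    return np.stack([dilated_conv(img, kernel, r) for r in rates])

img = np.ones((16, 16))
feats = aspp(img)                                 # (3, 16, 16) multi-scale stack
```

On a constant image the rate-1 branch returns 1.0 in the interior, while the rate-12 branch sees mostly zero padding at the centre and returns only the centre tap's contribution, showing how the rates trade locality for context.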
- Research Article
- 10.1117/1.jrs.16.022206
- Mar 29, 2022
- Journal of Applied Remote Sensing
Urban forestry is regarded as the green infrastructure of cities, and its analysis is of great significance for statistical research on greening, carbon sinks, etc., on the basis of remote sensing detection. However, current urban forestry detection algorithms rely on conventional machine learning, whose results are limited by manually selected features. Therefore, an innovative detection algorithm for analyzing urban forestry is proposed. First, the simple linear iterative clustering algorithm is adopted to segment the dataset, preserving the edge information of objects as much as possible. Second, a new network structure is constructed on the basis of Mask-RCNN to refine the detection results: (1) dilated convolution replaces the normal convolution of C1 in the backbone (ResNet101) to enlarge the receptive field; (2) a convolutional block attention module is introduced in C2 to give more weight to shallow features such as edges and colors; and (3) atrous spatial pyramid pooling (ASPP) is added to the C3 branch, enhancing multiscale fusion. Comparing the effect of ASPP at different backbone positions shows that the deeper ASPP is placed, the more pronounced the negative correlation between network convergence speed and detection accuracy becomes (C3: 89.2%, C4: 84.3%, C5: 81.6%). The accuracies of random forest and support vector machines are also compared, reaching 69.4% and 71.2%, respectively. In addition, transfer learning to unmanned aerial vehicle data is applied, achieving an accuracy of 81.6%.
The results show that the improved segmentation model learns shallow features better than the original and can accurately extract urban forestry from remote sensing images, providing a reference for further research on urban forestry detection and mapping, as well as multisource remote sensing data fusion.
- Conference Article
- 10.1109/ccis57298.2022.10016433
- Nov 26, 2022
Deeplabv3+ is a standard semantic segmentation model that adds a decoding structure to recover spatial information of the image and uses the Atrous Spatial Pyramid Pooling (ASPP) module to address the multi-scale problem. However, the Deeplabv3+ model has drawbacks in restoring details. Therefore, we propose the CB_Deeplabv3+ model. In its encoding structure, ASPP modules arranged in a parallel cascade extend the network and enable the model to capture richer context information by increasing the information interaction between channels. At the same time, CB_Deeplabv3+ introduces the Convolutional Block Attention Module (CBAM) to address the long-distance dependence problem in the encoder-decoder structure. Experimental evaluation on the Part_VOC dataset shows that CB_Deeplabv3+ achieves excellent semantic segmentation performance.
- Research Article
- 10.3390/rs15153711
- Jul 25, 2023
- Remote Sensing
This paper focuses on the problems of omission, misclassification, and inter-adhesion caused by overly dense distribution, intraclass diversity, and interclass variability when extracting winter wheat (WW) from high-resolution images. It proposes RAunet, a deeply supervised network with multi-scale features that incorporates a dual-attention mechanism into an improved U-Net backbone. The model mainly consists of a pyramid input layer, a modified U-Net backbone network, and a side output layer. Firstly, the pyramid input layer fuses feature information of winter wheat at different scales by constructing multiple input paths. Secondly, the Atrous Spatial Pyramid Pooling (ASPP) residual module and the Convolutional Block Attention Module (CBAM) dual-attention mechanism are added to the U-Net model to form the backbone, enhancing the model's ability to extract winter wheat information. Finally, the side output layer consists of multiple classifiers that supervise the outputs at different scales. Using the RAunet model to extract the spatial distribution of WW from GF-2 imagery, the mIoU of the recognition results reached 92.48%, an improvement of 2.66%, 4.15%, 1.42%, 2.35%, 3.76%, and 0.47% over FCN, U-Net, DeepLabv3, SegNet, ResUNet, and UNet++, respectively. The results verify the superiority of the RAunet model for WW extraction from high-resolution images, effectively improving the accuracy of WW spatial distribution extraction.
- Research Article
- 10.3390/s24123724
- Jun 7, 2024
- Sensors (Basel, Switzerland)
Oil spills are a major threat to marine and coastal environments. Their unique radar backscatter intensity can be captured by synthetic aperture radar (SAR), resulting in dark regions in the images. However, many marine phenomena can lead to erroneous detections of oil spills. In addition, SAR images of the ocean include multiple targets, such as sea surface, land, ships, and oil spills and their look-alikes. The training of a multi-category classifier will encounter significant challenges due to the inherent class imbalance. Addressing this issue requires extracting target features more effectively. In this study, a lightweight U-Net-based model, Full-Scale Aggregated MobileUNet (FA-MobileUNet), was proposed to improve the detection performance for oil spills using SAR images. First, a lightweight MobileNetv3 model was used as the backbone of the U-Net encoder for feature extraction. Next, atrous spatial pyramid pooling (ASPP) and a convolutional block attention module (CBAM) were used to improve the capacity of the network to extract multi-scale features and to increase the speed of module calculation. Finally, full-scale features from the encoder were aggregated to enhance the network's competence in extracting features. The proposed modified network enhanced the extraction and integration of features at different scales to improve the accuracy of detecting diverse marine targets. The experimental results showed that the mean intersection over union (mIoU) of the proposed model reached more than 80% for the detection of five types of marine targets including sea surface, land, ships, and oil spills and their look-alikes. In addition, the IoU of the proposed model reached 75.85 and 72.67% for oil spill and look-alike detection, which was 18.94% and 25.55% higher than that of the original U-Net model, respectively. 
Compared with other segmentation models, the proposed network can more accurately classify the black regions in SAR images into oil spills and their look-alikes. Furthermore, the detection performance and computational efficiency of the proposed model were also validated against other semantic segmentation models.
- Research Article
- 10.3390/agronomy14071565
- Jul 18, 2024
- Agronomy
Excessive fertilizer use has led to environmental pollution and reduced crop yields, underscoring the importance of research into variable-rate fertilization (VRF) based on digital image technology in precision agriculture. Current methods, which rely on spectral sensors for monitoring and prescription mapping, face significant technical challenges, high costs, and operational complexities, limiting their widespread adoption. This study presents an automated, intelligent, and precise approach to maize canopy image segmentation using a multi-scale attention U-Net model to enhance VRF decision making, reduce fertilization costs, and improve accuracy. A dataset of maize canopy images under various lighting and growth conditions was collected and subjected to data augmentation and normalization preprocessing. The MCAC-Unet model, built upon the MobilenetV3 backbone network and integrating the convolutional block attention module (CBAM), atrous spatial pyramid pooling (ASPP) multi-scale feature fusion, and content-aware reassembly of features (CARAFE) adaptive upsampling modules, achieved a mean intersection over union (mIoU) of 87.51% and a mean pixel accuracy (mPA) of 93.85% in maize canopy image segmentation. Coverage measurements at a height of 1.1 m indicated a relative error ranging from 3.12% to 6.82%, averaging 4.43%, with a determination coefficient of 0.911, meeting practical requirements. The proposed model and measurement system effectively address the challenges in maize canopy segmentation and coverage assessment, providing robust support for crop monitoring and VRF decision making in complex environments.
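Since mIoU and mPA are the headline metrics across these entries, a compact reference computation from a confusion matrix may be useful; the two helper names below are our own, not from any of the papers.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes):
    """Accumulate a per-class confusion matrix; rows are true labels,
    columns are predictions."""
    m = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(pred.ravel(), label.ravel()):
        m[t, p] += 1
    return m

def miou_mpa(pred, label, num_classes):
    """Mean intersection-over-union and mean pixel accuracy:
    IoU_c = TP_c / (TP_c + FP_c + FN_c), PA_c = TP_c / (TP_c + FN_c),
    averaged over classes."""
    m = confusion_matrix(pred, label, num_classes)
    tp = np.diag(m).astype(float)
    union = m.sum(axis=0) + m.sum(axis=1) - tp    # TP + FP + FN per class
    iou = tp / np.maximum(union, 1)
    pa = tp / np.maximum(m.sum(axis=1), 1)
    return iou.mean(), pa.mean()

pred  = np.array([[0, 0, 1, 1],
                  [0, 1, 1, 1]])
label = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1]])
miou, mpa = miou_mpa(pred, label, 2)              # one class-0 pixel misread as 1
```

For this toy prediction the class IoUs are 0.75 and 0.80 (mIoU 0.775) and the per-class accuracies 0.75 and 1.0 (mPA 0.875), matching the definitions used in the segmentation results above.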
- Research Article
- 10.3390/diagnostics15070854
- Mar 27, 2025
- Diagnostics (Basel, Switzerland)
Background: Examining chest radiograph images (CXR) is an intricate and time-consuming process, sometimes requiring the identification of many anomalies at the same time. Lung segmentation is key to overcoming this challenge through different deep learning (DL) techniques. Many researchers are working to improve the performance and efficiency of lung segmentation models. This article presents a DL-based approach to accurately identify the lung mask region in CXR images to assist radiologists in recognizing early signs of high-risk lung diseases. Methods: This paper proposes a novel technique, Lightweight Residual U-Net, combining the strengths of the convolutional block attention module (CBAM), the Atrous Spatial Pyramid Pooling (ASPP) block, and the attention module, which consists of only 3.24 million trainable parameters. Furthermore, the proposed model has been trained using both the RELU and LeakyReLU activation functions, with LeakyReLU yielding superior performance. The study indicates that the Dice loss function is more effective in achieving better results. Results: The proposed model is evaluated on three benchmark datasets: JSRT, SZ, and MC, achieving a Dice score of 98.72%, 97.49%, and 99.08%, respectively, outperforming the state-of-the-art models. Conclusions: Using the capabilities of DL and cutting-edge attention processes, the proposed model improves current efforts to enhance lung segmentation for the early identification of many serious lung diseases.
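The Dice loss and LeakyReLU choices reported above can be stated concretely. The sketch below is a generic soft Dice implementation, not the authors' exact code; the epsilon smoothing term is a common convention we assume here.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|). `pred` is a probability
    map, `target` a binary mask. Overlap-based losses such as Dice cope
    better with small-foreground class imbalance than pixelwise losses,
    consistent with the paper's finding that Dice works best."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def leaky_relu(x, slope=0.01):
    """LeakyReLU keeps a small gradient for negative inputs, the
    activation the authors found superior to plain ReLU."""
    return np.where(x > 0, x, slope * x)

target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0        # a 2x2 foreground "lung" region
perfect = target.copy()       # exact prediction -> loss 0
half = target * 0.5           # under-confident prediction -> loss 1/3
```

A perfect prediction drives the loss to zero, while halving the foreground probabilities yields a loss of 1/3, illustrating how the loss rewards overlap rather than raw pixel counts.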
- Research Article
- 10.3390/rs17213563
- Oct 28, 2025
- Remote Sensing
Wheat and corn are two major food crops in Xinjiang. However, the spectral similarity between these crop types and the complexity of their spatial distribution have posed significant challenges to accurate crop identification. To this end, the study aimed to improve the accuracy of crop distribution identification in complex environments in three ways. First, by analysing the kNDVI and EVI time series, the optimal identification window was determined to be days 156–176—a period when wheat is in the grain-filling to milk-ripening phase and maize is in the jointing to tillering phase—during which the spectral differences between the two crops are strongest. Second, principal component analysis (PCA) was applied to Sentinel-2 data. The top three principal components were extracted to construct the input dataset, effectively integrating visible and near-infrared band information. This approach suppressed redundancy and noise while replacing traditional RGB datasets. Finally, the Convolutional Block Attention Module (CBAM) was integrated into the U-Net model to enhance feature focusing on key crop areas. An improved Atrous Spatial Pyramid Pooling (ASPP) module based on depthwise separable convolutions was adopted to reduce the computational load while boosting multi-scale context awareness. The experimental results showed the following: (1) Wheat and corn exhibit obvious phenological differences between the 156th and 176th days of the year, which can be used as the optimal time window for identifying their spatial distributions. (2) The method proposed by this research had the best performance, with its mIoU, mPA, F1-score, and overall accuracy (OA) reaching 83.03%, 91.34%, 90.73%, and 90.91%, respectively. Compared to DeeplabV3+, PSPnet, HRnet, Segformer, and U-Net, the OA improved by 5.97%, 4.55%, 2.03%, 8.99%, and 1.5%, respectively. The recognition accuracy of the PCA dataset improved by approximately 2% compared to the RGB dataset. 
(3) This strategy still had high accuracy when predicting wheat and corn yields in Qitai County, Xinjiang, and had a certain degree of generalisability. In summary, the improved strategy proposed in this study holds considerable application potential for identifying the spatial distribution of wheat and corn in arid regions.
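The PCA step above — compressing many spectral bands into a 3-channel input that replaces plain RGB — can be sketched directly. The band count and data here are synthetic placeholders; only the centring-eigendecomposition-projection pipeline reflects the described approach.

```python
import numpy as np

def top_pcs(pixels, k=3):
    """Project band vectors onto their top-k principal components.

    pixels: (N, B) array of N pixels with B spectral bands. The data are
    centred, the band covariance is eigendecomposed, and pixels are
    projected onto the k highest-variance directions, mirroring the use
    of the top three Sentinel-2 components as network input."""
    centred = pixels - pixels.mean(axis=0)
    cov = np.cov(centred, rowvar=False)           # (B, B) band covariance
    vals, vecs = np.linalg.eigh(cov)              # eigenvalues ascending
    order = np.argsort(vals)[::-1][:k]            # top-k by variance
    return centred @ vecs[:, order]               # (N, k) compressed pixels

rng = np.random.default_rng(1)
bands = rng.standard_normal((100, 10))            # 100 pixels, 10 synthetic bands
pcs = top_pcs(bands)
```

The resulting columns are ordered by explained variance, so the first three channels retain the most informative band combinations while discarding redundancy and noise.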
- Research Article
- 10.1109/jstars.2022.3198517
- Jan 1, 2022
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Change detection methods play an indispensable role in remote sensing. Some change detection methods have obtained a fairly good performance by introducing attention mechanism on the basis of the convolutional neural network (CNN), but identifying intricate changes remains difficult. In response to these problems, this article proposes a new model for detecting changes in remote sensing, namely, MTCNet, which combines the advantages of multi-scale Transformer with the convolutional block attention module (CBAM) to improve the detection quality of different remote sensing images. On the basis of traditional convolutions, the Transformer module is introduced to extract bitemporal image features by modeling contextual information. Based on the Transformer module, a multi-scale module is designed to form a multi-scale Transformer, which can obtain features at different scales in bitemporal images, thereby identifying the changes we are interested in. Based on the multiscale Transformer module, the CBAM is introduced. The CBAM is split into a spatial attention module (SAM) and a channel attention module (CAM), which are applied to the front and back ends of the multi-scale Transformer respectively. Spatial information and channel information of feature maps are modeled separately. In this article, the validity and efficiency of the method are verified by a large number of experiments on the LEVIR-CD dataset and the WHU-CD dataset.
- Research Article
- 10.3390/aerospace12060542
- Jun 15, 2025
- Aerospace
A novel interactive maintenance method for space station in-orbit devices using scene semantic segmentation technology is proposed. First, a wearable and handheld system is designed to capture images of the astronaut's forward-view scene in the space station and display them on a handheld terminal in real time. Second, the proposed system quantitatively evaluates the environmental lighting condition in the scene by calculating image quality evaluation parameters; if the lighting condition is not proper, a prompt message reminds the astronaut to adjust the environment illumination. Third, our system adopts an improved DeepLabV3+ network for semantic segmentation of these forward-view scene images. In the improved network, the original backbone is replaced with a lightweight convolutional neural network, MobileNetV2, which has a smaller model scale and lower computational complexity. The convolutional block attention module (CBAM) is introduced to improve the network's feature perception ability, and the atrous spatial pyramid pooling (ASPP) module is retained to encode multi-scale information accurately. Extensive simulation experiments indicate that the accuracy, precision, and mean intersection over union of the proposed algorithm exceed 95.0%, 96.0%, and 89.0%, respectively. Ground application experiments have also shown that the proposed technique can effectively shorten the working time of the system user.
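The lighting-condition check described above could take many forms; the abstract does not specify the image quality parameters, so the sketch below uses mean brightness and a contrast floor as stand-ins. Both the metrics and the thresholds are assumptions for illustration only.

```python
import numpy as np

def lighting_ok(img, lo=0.2, hi=0.8, min_contrast=0.05):
    """Quantitative lighting check on a grayscale image in [0, 1].

    The mean brightness must lie inside [lo, hi] and the standard
    deviation must exceed a contrast floor; otherwise the system would
    prompt the operator to adjust the illumination. Metric choice and
    thresholds are assumed, not taken from the paper."""
    mean, std = float(img.mean()), float(img.std())
    return lo <= mean <= hi and std >= min_contrast

dark = np.full((32, 32), 0.05)                    # underexposed: fails the check
good = np.clip(np.random.default_rng(2).normal(0.5, 0.15, (32, 32)), 0, 1)
```

Gating segmentation on such a check is cheap relative to the network itself, which fits the system's goal of prompting the astronaut before wasting inference on unusable frames.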
- Research Article
- 10.3390/jimaging8100257
- Sep 21, 2022
- Journal of Imaging
Compared with traditional manual inspection, inspection robots not only meet the all-weather, real-time, and accurate inspection needs of substations, but also reduce the workload of operation and maintenance personnel and decrease the probability of safety accidents. To meet the urgent demand for more intelligent substation inspection robots, this paper proposes an environment understanding algorithm based on an improved DeepLab V3+ neural network. The improved network replaces the original dilation-rate combination in the ASPP (atrous spatial pyramid pooling) module with a new combination that segments object edges more accurately, and adds a CBAM (convolutional block attention module) at each of the two up-sampling stages. To allow deployment on an embedded platform with limited computing resources, the improved network is compressed. Multiple comparative experiments were conducted on the standard PASCAL VOC 2012 dataset and a substation dataset. The results show that, compared with DeepLab V3+, the improved network achieves a mean intersection-over-union (mIoU) of 57.65% over eight categories on the substation dataset, an improvement of 6.39%, while the model size shrinks by 147.1 M to 13.9 M.