Vision Mamba attention feature fusion UNet: an innovative state space model for accurate polyp segmentation


Similar Papers
  • Research Article
  • Cite Count Icon 65
  • 10.1007/s00330-020-07608-9
Automated segmentation of kidney and renal mass and automated detection of renal mass in CT urography using 3D U-Net-based deep convolutional neural network.
  • Jan 13, 2021
  • European Radiology
  • Zhiyong Lin + 6 more

To develop a 3D U-Net-based deep learning model for automated segmentation of kidney and renal mass, and detection of renal mass in the corticomedullary phase of computed tomography urography (CTU). Data on 882 kidneys obtained from CTU data of 441 patients with renal mass were used to train and evaluate the deep learning model. The CTU data of 35 patients with small renal tumors (diameter ≤ 1.5 cm) were used for additional testing. The ground truth data for the kidney, renal tumor, and cyst were manually annotated on corticomedullary phase images of CTU. The proposed segmentation model for kidney and renal mass was constructed based on a 3D U-Net. The segmentation accuracy was evaluated through the Dice similarity coefficient (DSC). The volume of the maximum 3D volume of interest of renal tumor and cyst in the predicted segmentation by the model was used as an identification indicator, while the detection performance of the model was evaluated by the area under the receiver operating characteristic curve. The proposed model showed a high accuracy in segmentation of kidney and renal tumor, with average DSC of 0.973 and 0.844, respectively. It performed moderately in the renal cyst segmentation, with an average DSC of 0.536 in the test set. Also, this model showed good performance in detecting renal tumor and cyst. The proposed automated segmentation and detection model based on 3D U-Net shows promising results for the segmentation of kidney and renal tumor, and the detection of renal tumor and cyst.
• The segmentation model based on 3D U-Net showed high accuracy in segmentation of kidney and renal neoplasm, and good detection performance of renal neoplasm and cyst in the corticomedullary phase of CTU.
• The segmentation model based on 3D U-Net is a fully automated aided diagnostic tool that could be used to reduce the workload of radiologists and improve the accuracy of diagnosis.
• The segmentation model based on 3D U-Net would be helpful to provide quantitative information for diagnosis, treatment, surgical planning, etc.
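
Several abstracts on this page report the Dice similarity coefficient (DSC) and intersection over union (IoU). As a reading aid, the following is a minimal sketch of how these two overlap scores are commonly computed for a pair of binary masks. The function name `dice_and_iou` and the nested-list mask format are assumptions made for illustration; they are not taken from any of the listed papers.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only.
    Dice = 2|A ∩ B| / (|A| + |B|), IoU = |A ∩ B| / |A ∪ B|.
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    size_pred = sum(flat_pred)
    size_truth = sum(flat_truth)

    # Convention assumed here: two empty masks count as a perfect match.
    if size_pred + size_truth == 0:
        return 1.0, 1.0

    union = size_pred + size_truth - intersection
    dice = 2 * intersection / (size_pred + size_truth)
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks: 3 predicted pixels, 3 true pixels, 2 shared.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
dice, iou = dice_and_iou(pred_mask, true_mask)
```

For the toy masks above, the overlap is 2 pixels, so Dice is 4/6 and IoU is 2/4. The two scores are related by Dice = 2·IoU / (1 + IoU), which is why Dice is never lower than IoU for the same pair of masks.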

  • Research Article
  • Cite Count Icon 2
  • 10.1007/s12021-024-09704-3
Performance of Convolutional Neural Network Models in Meningioma Segmentation in Magnetic Resonance Imaging: A Systematic Review and Meta-Analysis
  • Dec 28, 2024
  • Neuroinformatics
  • Ting-Wei Wang + 9 more

Background: Meningioma, the most common primary brain tumor, presents significant challenges in MRI-based diagnosis and treatment planning due to its diverse manifestations. Convolutional Neural Networks (CNNs) have shown promise in improving the accuracy and efficiency of meningioma segmentation from MRI scans. This systematic review and meta-analysis assess the effectiveness of CNN models in segmenting meningioma using MRI.
Methods: Following the PRISMA guidelines, we searched PubMed, Embase, and Web of Science from their inception to December 20, 2023, to identify studies that used CNN models for meningioma segmentation in MRI. Methodological quality of the included studies was assessed using the CLAIM and QUADAS-2 tools. The primary variable was segmentation accuracy, which was evaluated using the Sørensen–Dice coefficient. Meta-analysis, subgroup analysis, and meta-regression were performed to investigate the effects of MRI sequence, CNN architecture, and training dataset size on model performance.
Results: Nine studies, comprising 4,828 patients, were included in the analysis. The pooled Dice score across all studies was 89% (95% CI: 87–90%). Internal validation studies yielded a pooled Dice score of 88% (95% CI: 85–91%), while external validation studies reported a pooled Dice score of 89% (95% CI: 88–90%). Models trained on multiple MRI sequences consistently outperformed those trained on single sequences. Meta-regression indicated that training dataset size did not significantly influence segmentation accuracy.
Conclusion: CNN models are highly effective for meningioma segmentation in MRI, particularly when using diverse datasets from multiple MRI sequences. This finding highlights the importance of data quality and imaging sequence selection in the development of CNN models. Standardization of MRI data acquisition and preprocessing may improve the performance of CNN models, thereby facilitating their clinical adoption for the optimal diagnosis and treatment of meningioma.

  • Research Article
  • Cite Count Icon 1
  • 10.1186/s40537-025-01199-2
A fast and accurate segmentation model of lychee tree canopy from UAV remote sensing image based on big data and deep learning
  • Jun 3, 2025
  • Journal of Big Data
  • Jianhua Wang + 1 more

Litchi chinensis Sonn, an evergreen tree within the Sapindaceae family and the Litchi Sonn genus, plays a significant role in agricultural management. The canopy information of lychee trees offers a comprehensive and detailed perspective for agricultural experts and orchard managers to monitor and assess the health and growth status of lychee trees in orchards. However, existing segmentation models for lychee tree canopies derived from UAV remote sensing images suffer from low segmentation accuracy and long construction times, making it challenging to meet the demands for rapid feedback and precise decision-making in lychee orchard management. To address this issue, this paper proposes a fast and accurate segmentation model for lychee tree canopies from UAV remote sensing images based on big data and deep learning technologies. First, the Res34CA_UNet segmentation model architecture was designed to improve segmentation accuracy: residual network technology was adopted to improve the extraction of input data features, while coordinate attention mechanisms were used to enhance the model's ability to perceive structural diversity. Second, parallel versions of the improved Res34CA_UNet segmentation model were developed on Hadoop and on Spark to accelerate model construction for lychee tree canopy segmentation. Experimental results demonstrate that the proposed segmentation model, which integrates improved residual networks and attention mechanisms, achieves a 3.88% increase in segmentation accuracy. Furthermore, the proposed Hadoop-based and Spark-based parallel Res34CA_UNet segmentation models reduce construction time by 86.9% and 88.2%, respectively. These improvements significantly enhance the overall segmentation efficiency for lychee tree canopies from UAV remote sensing images, providing a robust solution for fast and accurate orchard management.

  • Research Article
  • Cite Count Icon 4
  • 10.3390/plants13223186
Accurately Segmenting/Mapping Tobacco Seedlings Using UAV RGB Images Collected from Different Geomorphic Zones and Different Semantic Segmentation Models.
  • Nov 13, 2024
  • Plants (Basel, Switzerland)
  • Qianxia Li + 6 more

The tobacco seedling stage is a crucial period for tobacco cultivation. Accurately extracting tobacco seedlings from satellite images can effectively assist farmers in replanting, precise fertilization, and subsequent yield estimation. However, in complex Karst mountainous areas, it is extremely challenging to accurately segment tobacco plants due to a variety of factors, such as the topography, the planting environment, and difficulties in obtaining high-resolution image data. Therefore, this study explores an accurate segmentation model for detecting tobacco seedlings from UAV RGB images across various geomorphic partitions, including dam and hilly areas. It explores a family of tobacco plant seedling segmentation networks, namely, U-Net, U-Net++, Linknet, PSPNet, MAnet, FPN, PAN, and DeepLabV3+, using the Hill Seedling Tobacco Dataset (HSTD), the Dam Area Seedling Tobacco Dataset (DASTD), and the Hilly Dam Area Seedling Tobacco Dataset (H-DASTD) for model training. To validate the performance of the semantic segmentation models for crop segmentation in the complex cropping environments of Karst mountainous areas, this study compares and analyzes the predicted results with the manually labeled true values. 
The results show that: (1) the accuracy of the models in segmenting tobacco seedling plants in the dam area is much higher than that in the hilly area, with the mean values of mIoU, PA, Precision, Recall, and the Kappa Coefficient reaching 87%, 97%, 91%, 85%, and 0.81 in the dam area and 81%, 97%, 72%, 73%, and 0.73 in the hilly area, respectively; (2) The segmentation accuracies of the models differ significantly across different geomorphological zones; the U-Net segmentation results are optimal for the dam area, with higher values of mIoU (93.83%), PA (98.83%), Precision (93.27%), Recall (96.24%), and the Kappa Coefficient (0.9440) than those of the other models; in the hilly area, the U-Net++ segmentation performance is better than that of the other models, with mIoU and PA of 84.17% and 98.56%, respectively; (3) The diversity of tobacco seedling samples affects the model segmentation accuracy, as shown by the Kappa Coefficient, with H-DASTD (0.901) > DASTD (0.885) > HSTD (0.726); (4) With regard to the factors affecting missed segregation, although the factors affecting the dam area and the hilly area are different, the main factors are small tobacco plants (STPs) and weeds for both areas. This study shows that the accurate segmentation of tobacco plant seedlings in dam and hilly areas based on UAV RGB images and semantic segmentation models can be achieved, thereby providing new ideas and technical support for accurate crop segmentation in Karst mountainous areas.
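
The metrics named in this abstract (mIoU, PA, Precision, Recall, and the Kappa Coefficient) can all be derived from the pixel-level confusion matrix of a binary segmentation. The sketch below shows one common way to compute them, assuming a two-class (seedling vs. background) setting; the function name `binary_segmentation_metrics` and the toy counts are illustrative assumptions, not values or code from the study.

```python
def binary_segmentation_metrics(tp, fp, fn, tn):
    """Compute PA, precision, recall, mIoU and Cohen's kappa from pixel counts.

    Hypothetical helper for illustration. tp/fp/fn/tn are the numbers of
    true-positive, false-positive, false-negative and true-negative pixels.
    """
    def ratio(num, den):
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    total = tp + fp + fn + tn
    pixel_accuracy = ratio(tp + tn, total)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)

    # mIoU here is the mean of foreground IoU and background IoU.
    iou_foreground = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    mean_iou = (iou_foreground + iou_background) / 2

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), total * total)
    kappa = ratio(pixel_accuracy - expected, 1 - expected)

    return {
        "PA": pixel_accuracy,
        "Precision": precision,
        "Recall": recall,
        "mIoU": mean_iou,
        "Kappa": kappa,
    }


# Toy counts chosen only to make the arithmetic easy to follow.
metrics = binary_segmentation_metrics(tp=40, fp=10, fn=10, tn=40)
```

With the toy counts, pixel accuracy is 0.8 while kappa is 0.6, because half of the agreement would be expected by chance. This gap illustrates why a model can report a very high PA alongside a noticeably lower Kappa when one class dominates the image.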

  • Research Article
  • Cite Count Icon 9
  • 10.1016/j.cmpb.2022.107049
DE-MRI myocardial fibrosis segmentation and classification model based on multi-scale self-supervision and transformer
  • Jul 29, 2022
  • Computer Methods and Programs in Biomedicine
  • Yuhan Ding + 3 more

  • Research Article
  • Cite Count Icon 1
  • 10.29063/ajrh2025/v29i4.10
Detection of precancerous lesions in cervical images of perimenopausal women using U-net deep learning.
  • Apr 30, 2025
  • African journal of reproductive health
  • Na Zhao + 5 more

Due to physiological changes during the perimenopausal period, the morphology of cervical cells undergoes certain alterations. Accurate cell image segmentation and lesion identification are of great significance for the early detection of precancerous lesions. Traditional detection methods may have certain limitations, thereby creating an urgent need for the development of more effective models. This study aimed to develop a highly efficient and accurate cervical cell image segmentation and recognition model, based on the U-shaped Network (U-Net) and Residual Network (ResNet), to enhance the detection of precancerous lesions in perimenopausal women. The model integrates U-Net with Segmentation Network (SegNet) and incorporates the Squeeze-and-Excitation (SE) attention mechanism to create the 2Se/U-Net segmentation model. Additionally, ResNet is optimized with the local discriminant loss function (LD-loss) and deep residual learning (DRL) blocks to develop the LD/ResNet lesion recognition model. The performance of the models is evaluated using data from 103 cytology images of perimenopausal women, focusing on segmentation metrics like mean pixel accuracy (MPA) and mean intersection over union (mIoU), as well as lesion detection metrics such as accuracy (Acc), precision (Pre), recall (Re), and F1-score (F1). Results show that the 2Se/U-Net model achieves an MPA of 92.63% and mIoU of 96.93%, outperforming U-Net by 12.48% and 9.47%, respectively. The LD/ResNet model demonstrates over 97.09% accuracy in recognizing cervical cells and achieves high detection performance for precancerous lesions, with Acc, Pre, and Re at 98.95%, 99.36%, and 98.89%, respectively. The model shows great potential for enhancing cervical cancer screening in clinical settings.

  • Research Article
  • Cite Count Icon 15
  • 10.1016/j.cmpb.2021.106325
In-depth learning of automatic segmentation of shoulder joint magnetic resonance images based on convolutional neural networks
  • Jul 31, 2021
  • Computer Methods and Programs in Biomedicine
  • Xinhong Mu + 8 more

  • Research Article
  • 10.58414/scientifictemper.2025.16.5.11
Hybrid pigeon optimization-based feature selection and modified multi-class semantic segmentation for skin cancer detection (HPO-MMSS)
  • May 31, 2025
  • The Scientific Temper
  • J Fathima Fouzia + 2 more

Skin cancer is among the most serious medical conditions worldwide, and curative results are better when it is detected early. Conventional skin cancer detection methods, such as Genetic Algorithm, Particle Swarm Optimization, and U-Net-based segmentation models, face challenges in choosing the right features and accurately detecting skin lesions. This paper proposes a method that combines a Hybrid Pigeon Optimization Algorithm (HPOA) for feature selection with a Modified Multi-Class Semantic Segmentation (MMSS) model for lesion segmentation. The precise feature selection of the proposed HPOA improves the performance of the segmentation and classification models. Based on an enhanced U-Net architecture, the MMSS model uses skip connections and boundary detection techniques to improve segmentation accuracy. This approach integrates information from multiple feature spaces to produce a more informative structure for segmenting the skin lesions. The proposed methodology is tested on the ISIC 2020 dataset with sample counts of 2000, 4000, 6000, 8000, and 10000. The experimental results, with segmentation accuracy ranging from 91% to 96%, precision ranging from 89% to 94%, and recall ranging from 88% to 94%, show that the proposed methodology gives a strong foundation for detecting skin cancer automatically.

  • Research Article
  • 10.54254/2755-2721/2025.ld27402
Improving the Robustness of Surgical Instrument Segmentation Models via GAN-Based Data Augmentation
  • Oct 2, 2025
  • Applied and Computational Engineering
  • Jiaheng Li

In robotics-assisted surgery, accurate segmentation of surgical instruments from background pixels is crucial for the success of various downstream tasks. Currently, popular segmentation models such as the U-Net exhibit excellent performance under major surgical conditions. However, when the segmentation models are benchmarked under unforeseen surgical complications such as overbleeding, concerns about their robustness remain. In fact, due to the lack of training datasets covering such complications, these models often exhibit degraded performance and thus fail to produce accurate segmentations at the pixel level. In this paper, we leverage a Generative Adversarial Network (GAN) to generate realistic images that simulate overbleeding and thereby enlarge the training datasets, a process known as data augmentation. With more diverse training data from different scenarios, segmentation models can be trained better and the risk of errors reduced. A dataset consisting of 17 mock endoscopic video sequences with binary masks indicating ground truths was used for training and testing. With this dataset, a U-Net architecture is used to examine performance through metrics such as the Normalized Surface Distance (NSD) and Dice Similarity Coefficient (DSC). GAN-based data augmentation improves segmentation performance by 9.07% for NSD and 6.63% for DSC. These results indicate that our GAN-based augmentation is effective, providing a novel direction for improving model robustness during the training procedure and paving the way for future exploration and clinical translation of cutting-edge deep learning models into real robotic surgery.

  • Research Article
  • Cite Count Icon 16
  • 10.1016/j.compmedimag.2022.102072
Polyp segmentation network with hybrid channel-spatial attention and pyramid global context guided feature fusion.
  • Jun 1, 2022
  • Computerized Medical Imaging and Graphics
  • Xiaodong Huang + 6 more

  • Conference Article
  • Cite Count Icon 6
  • 10.1109/icbaie52039.2021.9390034
A New Method with SEU-Net model for Automatic Segmentation of Retinal Layers In Optical Coherence Tomography Images
  • Mar 26, 2021
  • Wang Xuehua + 3 more

Optical coherence tomography (OCT) images can reveal the ocular pathology of related diseases by analyzing the layer structure of the retina. Segmenting the retina is a standard procedure for ophthalmologists in diagnosing various diseases. Due to the large number of OCT images generated per patient, manual image analysis can be time-consuming and error-prone, thus compromising the efficiency of the diagnosis. Therefore, we propose an accurate retinal segmentation method combining the SEU-Net (Squeeze and Excitation U-Net) model with a graph search method. This method inserts SE blocks in the coding part of U-Net. The SE block improves the sensitivity of the network to the boundary information features of the retinal layer by recalibrating features. This allows the less distinctive layers to be segmented accurately. Comparison of the experimental results with manual annotations by experts shows that the SEU-Net model combined with the graph search method proposed in this paper can accurately segment the 9 retinal layer boundaries.

  • Research Article
  • Cite Count Icon 9
  • 10.1109/tase.2024.3350088
Accurate and Automatic Dental Crown Components Segmentation With Multi-Scale Attention Based U-Net and Hybrid Level Set Models
  • Jan 1, 2025
  • IEEE Transactions on Automation Science and Engineering
  • Dongyue Li + 5 more

This paper presents a two-step method to automatically and accurately segment the dental crown components from CT images. Firstly, a multi-scale attention based U-Net model is proposed for pulp segmentation, which is embedded with global and local attention modules. The constructed attention modules can automatically aggregate pixel-wise contextual information and focus on catching the real dental pulp region. Secondly, two efficient level set models are proposed: one is the shape constraint-based level set model for enamel and dentin segmentation, the other is the region mutual exclusion-based level set model for neighboring teeth segmentation. The proposed shape constraint term can better handle topology changes of teeth and the region mutual exclusion term can more effectively avoid intersecting segmentation. Besides, a starting slice initialization method is introduced to achieve automatic segmentation, and an accurate contour propagation strategy is developed for slice-by-slice segmentation. We set up a series of comparative experiments for evaluation. Experimental results verify that the proposed method obtains promising performance for each crown component segmentation, and outperforms state-of-the-art tooth segmentation methods in terms of accuracy. This suggests that the proposed method can be used to accurately segment the crown components for precise tooth preparation treatment.
Note to Practitioners: The motivation of this work is to reduce the burden on dentists during tooth preparation treatment, which requires accurate segmentation of crown components (i.e., enamel, dentin, and pulp) from dental CT images. Existing methods only focused on the segmentation of teeth or alveolar bone. Therefore, we present a novel automatic segmentation model for the dental crown components with high accuracy. A key strength of this study is the combination of a data-driven method (deep learning) and model-driven methods (level-set), which can provide good accuracy under limited training samples. This ability is highly desirable for practitioners by saving labor-intensive, costly labeling efforts. Furthermore, our proposed method will provide tools to help reduce subjectivity and human errors, as well as streamline and expedite the clinical workflow. This will significantly facilitate tooth preparation automation.

  • Research Article
  • Cite Count Icon 1
  • 10.3389/fmed.2025.1661680
VM-CAGSeg: a vessel structure-aware state space model for coronary artery segmentation in angiography images.
  • Oct 10, 2025
  • Frontiers in medicine
  • Yuanqing He + 4 more

Coronary artery segmentation in X-ray angiography is clinically critical for percutaneous coronary intervention (PCI), as it offers essential morphological guidance for stent deployment, stenosis assessment, and hemodynamic optimization. Nevertheless, inherent angiographic limitations, including complex vasculature, low contrast, and fuzzy boundaries, persist as significant challenges. Current methodologies exhibit notable shortcomings, including fragmented output continuity, noise susceptibility, and computational inefficiency. This study proposes VM-CAGSeg, a novel U-shaped architecture integrating vessel structure-aware state space modeling, to address these limitations. The framework introduces three key innovations: (1) A Vessel Structure-Aware State Space (VSASS) block that synergizes geometric priors from a Multiscale Vessel Structure-Aware (MVSA) module with long-range contextual modeling via Kolmogorov-Arnold State Space (KASS) blocks. The MVSA module enhances tubular feature representation through Hessian eigenvalue-derived vesselness measures. (2) A Cross-Stage Feature Interaction Fusion (CSFIF) module that replaces conventional skip connections with cross-stage feature fusion strategies to enhance the variability of learned features, preserving long-range dependencies and fine-grained details. (3) A unified architecture that integrates the Vessel Structure-Aware State Space (VSASS) block and the Cross-Stage Feature Interaction Fusion (CSFIF) module to achieve comprehensive vessel segmentation by synergizing multiscale geometric awareness, long-range dependency modeling, and cross-stage feature refinement. Experiments demonstrate that VM-CAGSeg achieves state-of-the-art performance, surpassing CNN-based (e.g., UNet++), transformer-based (e.g., MISSFormer), and state space model (SSM)-based (e.g., H_vmunet) methods, with a Dice similarity coefficient (DSC) of 88.15%, mIoU of 79.19%, and a 95% Hausdorff distance (HD95) of 13.68 mm. 
The framework significantly improved boundary delineation, reducing HD95 by 49.8% compared to UNet++ (27.15 mm) and by 16.6% compared to TransUNet (15.85 mm). While its sensitivity (90.05%) was marginally lower than that of TransUNet (90.33%), the model's balanced performance in segmentation accuracy and edge precision confirmed its robustness. These findings validate the effectiveness of integrating multiscale vessel-aware modeling, long-range dependency learning, and cross-stage feature fusion, making VM-CAGSeg a reliable solution for clinical vascular segmentation tasks that require fine-grained detail preservation. The proposed method is available as an open-source project at https://github.com/GIT-HYQ/VM-CAGSeg.

  • Research Article
  • Cite Count Icon 2
  • 10.3390/agriculture13091662
Target Soybean Leaf Segmentation Model Based on Leaf Localization and Guided Segmentation
  • Aug 23, 2023
  • Agriculture
  • Dong Wang + 5 more

The phenotypic characteristics of soybean leaves are of great significance for studying the growth status, physiological traits, and response to the environment of soybeans. The segmentation model for soybean leaves plays a crucial role in morphological analysis. However, current baseline segmentation models are unable to accurately segment leaves in soybean leaf images due to issues like leaf overlap. In this paper, we propose a target leaf segmentation model based on leaf localization and guided segmentation. The segmentation model adopts a two-stage segmentation framework. The first stage involves leaf detection and target leaf localization. Based on the idea that a target leaf is close to the center of the image and has a relatively large area, we propose a target leaf localization algorithm. We also design an experimental scheme to provide optimal localization parameters to ensure precise target leaf localization. The second stage utilizes the target leaf localization information obtained from the first stage to guide the segmentation of the target leaf. To reduce the dependency of the segmentation results on the localization information, we propose a solution called guidance offset strategy to improve segmentation accuracy. We design multiple guided model experiments and select the one with the highest segmentation accuracy. Experimental results demonstrate that the proposed model exhibits strong segmentation capabilities, with the highest average precision (AP) and average recall (AR) reaching 0.976 and 0.981, respectively. We also compare our segmentation results with current baseline segmentation models, and multiple quantitative indicators and qualitative analysis indicate that our segmentation results are better.

  • Research Article
  • Cite Count Icon 11
  • 10.1108/aci-11-2020-0123
Precise and parallel segmentation model (PPSM) via MCET using hybrid distributions
  • Dec 15, 2020
  • Applied Computing and Informatics
  • Soha Rawas + 1 more

Purpose: Image segmentation is one of the most essential tasks in image processing applications. It is a valuable tool in many applications such as health-care systems, pattern recognition, traffic control, surveillance systems, etc. However, accurate segmentation is a critical task, since finding a correct model that fits different types of image processing applications is a persistent problem. This paper develops a novel segmentation model that aims to be a unified model for any kind of image processing application. The proposed precise and parallel segmentation model (PPSM) combines three benchmark distribution thresholding techniques to estimate an optimum threshold value that leads to optimum extraction of the segmented region: Gaussian, lognormal and gamma distributions. Moreover, a parallel boosting algorithm is proposed to improve the performance of the developed segmentation algorithm and minimize its computational cost. To evaluate the effectiveness of the proposed PPSM, different benchmark data sets for image segmentation are used such as Planet Hunters 2 (PH2), the International Skin Imaging Collaboration (ISIC), Microsoft Research in Cambridge (MSRC), the Berkeley Segmentation Benchmark Data set (BSDS) and Common Objects in COntext (COCO). The obtained results indicate the efficacy of the proposed model in achieving high accuracy with significant processing time reduction compared to other segmentation models, across different types and fields of benchmarking data sets.
Design/methodology/approach: The proposed PPSM combines three benchmark distribution thresholding techniques to estimate an optimum threshold value that leads to optimum extraction of the segmented region: Gaussian, lognormal and gamma distributions.
Findings: On the basis of the achieved results, it can be observed that the proposed PPSM–minimum cross-entropy thresholding (PPSM–MCET)-based segmentation model is a robust, accurate and highly consistent method with high-performance ability.
Originality/value: A novel hybrid segmentation model is constructed exploiting a combination of Gaussian, gamma and lognormal distributions using MCET. Moreover, to provide accurate and high-performance thresholding with minimum computational cost, the proposed PPSM uses a parallel processing method to minimize the computational effort in MCET computing. The proposed model might be used as a valuable tool in many applications such as health-care systems, pattern recognition, traffic control, surveillance systems, etc.
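
To make the idea of minimum cross-entropy thresholding (MCET) more concrete, the sketch below implements the plain histogram-based form of the criterion with an exhaustive search over candidate thresholds. It is an illustrative assumption, not the PPSM described in the abstract: it does not fit Gaussian, lognormal or gamma distributions and it is not parallelized. The function name `min_cross_entropy_threshold` and the toy histogram are hypothetical.

```python
import math


def min_cross_entropy_threshold(hist):
    """Exhaustive-search MCET on a gray-level histogram (illustrative sketch).

    hist[i] is the pixel count for gray level i + 1; levels start at 1 so
    that the logarithm of a class mean is always defined.
    Returns the threshold t such that levels < t form class 1 and
    levels >= t form class 2, or None if no valid split exists.
    """
    n_levels = len(hist)
    best_t, best_cost = None, math.inf

    for t in range(2, n_levels + 1):
        low, high = hist[:t - 1], hist[t - 1:]
        count_low, count_high = sum(low), sum(high)
        if count_low == 0 or count_high == 0:
            continue  # both classes must contain pixels

        # First moments (sum of gray level times count) of each class.
        moment_low = sum(g * h for g, h in zip(range(1, t), low))
        moment_high = sum(g * h for g, h in zip(range(t, n_levels + 1), high))
        mean_low = moment_low / count_low
        mean_high = moment_high / count_high

        # Cross-entropy criterion with threshold-independent terms dropped.
        cost = -(moment_low * math.log(mean_low)
                 + moment_high * math.log(mean_high))
        if cost < best_cost:
            best_t, best_cost = t, cost

    return best_t


# Toy bimodal histogram: all pixels at gray level 50 or gray level 200.
toy_hist = [0] * 256
toy_hist[49] = 100    # gray level 50
toy_hist[199] = 100   # gray level 200
threshold = min_cross_entropy_threshold(toy_hist)
```

On the toy histogram, any threshold strictly above level 50 and at most level 200 separates the two modes equally well, and the search returns the first such value. The search recomputes the class sums for every candidate, which is simple but quadratic in the number of gray levels; this kind of repeated computation is the cost that a parallel or incremental scheme would aim to reduce.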
