Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation.
- Dissertation
- 10.37099/mtu.dc.etdr/1941
- Jan 1, 2025
Medical Image Segmentation is a critical task in the field of medical imaging, playing a crucial role in diagnostics, treatment planning, and disease monitoring. The emergence of Deep Learning (DL) has ushered in a new era in Artificial Intelligence (AI), propelling remarkable advancements in key domains like language translation, object recognition, and recommendation systems. This evolution has been accompanied by continuous enhancements in computational efficiency and improvements in predictive accuracy. The introduction of sophisticated algorithms, such as convolutional neural networks (CNNs) and transformers, exemplifies these advancements. DL algorithms have demonstrated exceptional efficacy in medical image segmentation tasks, showcasing the potential for AI-driven early diagnostics. However, the deployment of AI systems in clinical environments is often hindered by the substantial computational demands and complexity of cutting-edge DL models. In this research proposal, we explore various methodologies to enrich the visual feature representation for medical images. We focus on integrating global context-oriented techniques, such as attention mechanisms, into the development of parameter-efficient deep learning models. Our goal is to create a generalized, end-to-end medical image segmentation framework that can accurately and efficiently segment medical images across different modalities and conditions. By leveraging advanced deep learning techniques and optimizing model architectures, we aim to enhance the performance and generalization capabilities of medical image segmentation models, ultimately contributing to improved clinical outcomes.
- Research Article
2
- 10.35629/5252-0612125135
- Dec 1, 2024
- International Journal of Advances in Engineering and Management
The rapid advancements in medical imaging technologies have significantly enhanced diagnostic accuracy and clinical decision-making in modern healthcare. Image segmentation and deep learning have emerged as transformative tools among these advancements. This article explores the pivotal role of image segmentation and deep learning in medical imaging, detailing their methodologies, applications, challenges, and future directions. Deep learning, particularly Convolutional Neural Networks (CNNs), has revolutionized medical imaging by automating the analysis of complex datasets and improving diagnostic precision. Image segmentation, a fundamental component of medical imaging, allows for delineating specific structures such as organs, tissues, and pathological regions. Together, these technologies have been applied in diverse fields, including oncology, cardiology, neurology, and ophthalmology, enabling applications such as tumor detection, organ segmentation, disease progression monitoring, and treatment planning. However, despite its transformative potential, the integration of deep learning into medical imaging faces several challenges. These include data scarcity, privacy concerns, interpretability issues, and regulatory hurdles. The article discusses various strategies to address these challenges, such as data augmentation, transfer learning, and the development of explainable AI models to ensure transparency and trustworthiness. Evaluation metrics, such as accuracy, sensitivity, specificity, and Dice Similarity Coefficient (DSC), are essential for assessing model performance. Rigorous clinical validation and regulatory approval are crucial to integrating deep learning systems into clinical workflows effectively. Looking ahead, the future of deep learning in medical imaging holds immense promise. 
Innovations like multimodal imaging, personalized medicine, and AI-driven automation are set to further revolutionize the field, enhancing the efficiency and accuracy of diagnostics. Collaborative efforts between clinicians, researchers, and AI developers will play a vital role in overcoming current limitations and driving progress. This article concludes by emphasizing the transformative potential of deep learning and image segmentation in medical imaging, highlighting their ability to improve diagnostic accuracy, streamline clinical workflows, and ultimately, enhance patient care. By addressing current challenges and continuing to innovate, these technologies are poised to redefine the landscape of medical diagnostics and treatment in the years to come.
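The evaluation metrics named above (accuracy, sensitivity, specificity, and DSC) can all be computed from the four confusion-matrix counts of a binary mask. The following is a minimal pure-Python sketch, assuming masks are nested lists of 0/1 values of equal shape; the function names are illustrative, and treating two empty masks as a perfect Dice match is an assumed convention (libraries differ on this edge case).

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for two binary masks of the same shape."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1  # predicted foreground, true background
            elif t:
                fn += 1  # missed foreground
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(pred, truth)
    dice_denom = 2 * tp + fp + fn  # equals |A| + |B|
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Sensitivity (recall) and specificity fall back to 0.0 when undefined.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # DSC = 2 * |A intersect B| / (|A| + |B|); two empty masks count as a perfect match.
        "dice": 2 * tp / dice_denom if dice_denom else 1.0,
    }


pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 0, 0],
         [0, 1, 0]]
print(segmentation_metrics(pred, truth))
```

For these toy masks TP = 2, FP = 1, FN = 1 and TN = 5, so the Dice score and the sensitivity are both 2/3 while the specificity is 5/6.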
- Research Article
7
- 10.1109/access.2019.2950960
- Jan 1, 2019
- IEEE Access
Deep learning has achieved great success in the field of computer vision, and its precision in image classification and detection has surpassed human performance. This paper therefore combines deep learning with medical image segmentation, focusing on how to improve the accuracy and speed of segmentation algorithms for medical exercise rehabilitation images. To address the shortcomings of traditional medical image recognition methods, it proposes a medical exercise rehabilitation image segmentation algorithm based on hierarchical features of convolutional neural networks, called HFCNN. The algorithm first resamples the convolution outputs of multiple layers of the convolutional neural network to a unified scale and combines them into a hierarchical feature. This hierarchical feature combines the structural information about objects captured in the shallow layers of the network with the semantic information captured in the deep layers, giving it strong expressive power. Second, a superpixel segmentation algorithm divides the image into multiple superpixels. A classifier is trained on the hierarchical features of the superpixels, and the classification result is then mapped back to the pixels. Finally, a fully connected conditional random field with unary and pairwise potentials is constructed. Its energy function is used to smooth the pixel-level classification result, improving the regional consistency and continuity of the pixel labels. Compared with many classical convolutional neural network algorithms, this algorithm accelerates network convergence, shortens training time, and significantly improves the segmentation accuracy for medical exercise rehabilitation images, showing good practical value.
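The first two HFCNN steps, resampling multi-layer outputs to one scale and then classifying superpixels and mapping their labels back to pixels, can be sketched in pure Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: nearest-neighbour resampling, mean pooling inside each superpixel, and a thresholding rule in place of the trained classifier are all choices made here; the superpixel labels are assumed to come from an external algorithm, and the CRF smoothing step is omitted.

```python
from collections import defaultdict
from typing import Dict, List

FeatureMap = List[List[List[float]]]  # H x W x C


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling of an H x W x C map by an integer factor."""
    h, w = len(fmap), len(fmap[0])
    return [[list(fmap[i // factor][j // factor]) for j in range(w * factor)]
            for i in range(h * factor)]


def hierarchical_features(shallow: FeatureMap, deep: FeatureMap, factor: int) -> FeatureMap:
    """Concatenate shallow features with deep features resampled to the same scale."""
    deep_up = upsample_nearest(deep, factor)
    return [[s + d for s, d in zip(srow, drow)] for srow, drow in zip(shallow, deep_up)]


def superpixel_means(features: FeatureMap, sp_labels: List[List[int]]) -> Dict[int, List[float]]:
    """Average the per-pixel feature vectors inside each superpixel."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for frow, lrow in zip(features, sp_labels):
        for f, sp in zip(frow, lrow):
            acc = sums.setdefault(sp, [0.0] * len(f))
            for c, v in enumerate(f):
                acc[c] += v
            counts[sp] += 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}


def map_back(sp_labels: List[List[int]], sp_classes: Dict[int, int]) -> List[List[int]]:
    """Give every pixel the class predicted for its superpixel."""
    return [[sp_classes[sp] for sp in row] for row in sp_labels]


shallow = [[[0.1], [0.2], [0.9], [0.8]] for _ in range(4)]  # 4 x 4 x 1
deep = [[[0.0], [1.0]] for _ in range(2)]                   # 2 x 2 x 1
sp_labels = [[0, 0, 1, 1] for _ in range(4)]                 # two superpixels

feats = hierarchical_features(shallow, deep, factor=2)
means = superpixel_means(feats, sp_labels)
# Hypothetical stand-in for the trained classifier: threshold the deep channel.
classes = {sp: int(vec[1] > 0.5) for sp, vec in means.items()}
mask = map_back(sp_labels, classes)
```

Classifying one pooled vector per superpixel instead of one per pixel is what reduces the number of classifier decisions; here the 16 pixels collapse to two decisions.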
- Research Article
2
- 10.53941/sai.2025.100002
- Mar 11, 2025
- Sensors and AI
Medical image segmentation is a fundamental task in the field of medical imaging, enabling the accurate identification and delineation of structures such as organs, tissues, and lesions within medical images. These segmented regions are essential for diagnostic purposes, treatment planning, and disease monitoring. Over the years, medical image segmentation has evolved significantly, driven by advances in imaging technologies and computational techniques. Traditional methods, such as thresholding, region-growing, and active contours, have been supplemented and, in some cases, replaced by more sophisticated machine learning (ML) and deep learning (DL) approaches. Convolutional neural networks (CNNs) and their variants, including U-Net and Transformer-based models, have shown remarkable success in automating and improving segmentation tasks. This survey paper provides a comprehensive review of the various segmentation techniques, categorizing them into classical and deep learning-based methods. It discusses the strengths, limitations, and challenges of each approach, including issues related to data quality, class imbalance, and the generalizability of models. Furthermore, the paper highlights recent advancements in the field, emerging trends, and future directions for further enhancing segmentation accuracy, robustness, and efficiency in clinical applications. This work aims to serve as a valuable resource for researchers and clinicians looking to understand the current state of medical image segmentation and its potential future developments.
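Two of the classical methods mentioned, thresholding and region growing, fit in a few lines. The following pure-Python sketch assumes a 2D grayscale image given as nested lists, a single seed point, 4-connectivity, and a fixed intensity tolerance relative to the seed; practical implementations often use adaptive criteria such as running region statistics instead.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[float]]


def threshold(image: Image, t: float) -> List[List[int]]:
    """Global thresholding: foreground wherever intensity >= t."""
    return [[int(v >= t) for v in row] for row in image]


def region_grow(image: Image, seed: Tuple[int, int], tol: float) -> List[List[int]]:
    """4-connected region growing from `seed`, accepting neighbours whose
    intensity lies within `tol` of the seed intensity."""
    h, w = len(image), len(image[0])
    sr, sc = seed
    ref = image[sr][sc]
    mask = [[0] * w for _ in range(h)]
    mask[sr][sc] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr][nc]
                    and abs(image[nr][nc] - ref) <= tol):
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask


image = [[10, 12, 80, 82],
         [11, 13, 85, 84],
         [10, 90, 88, 12],
         [9, 10, 11, 85]]
thresholded = threshold(image, 50)
grown = region_grow(image, seed=(0, 2), tol=15)
```

Thresholding also labels the isolated bright pixel in the bottom-right corner, whereas region growing keeps only the bright region connected to the seed.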
- Research Article
121
- 10.7717/peerj-cs.607
- Jun 29, 2021
- PeerJ. Computer science
Medical imaging refers to visualization techniques that provide valuable information about the internal structures of the human body for clinical applications, diagnosis, treatment, and scientific research. Segmentation is one of the primary methods for analyzing and processing medical images; it helps doctors diagnose accurately by providing detailed information on the relevant part of the body. However, segmenting medical images faces several challenges: it requires trained medical experts and is time-consuming and error-prone. An automatic medical image segmentation system is therefore needed. Deep learning algorithms have recently shown outstanding performance on segmentation tasks, especially semantic segmentation networks that provide pixel-level image understanding. Since the introduction of the first fully convolutional network (FCN) for semantic image segmentation, several segmentation networks have been proposed on its basis. One of the state-of-the-art convolutional networks in the medical imaging field is U-Net. This paper presents a novel end-to-end semantic segmentation model for medical images, named Ens4B-UNet, that ensembles four U-Net architectures with pre-trained backbone networks. Ens4B-UNet builds on U-Net's success with several significant improvements: it adopts powerful and robust convolutional neural networks (CNNs) as backbones for the U-Net encoders and uses nearest-neighbor up-sampling in the decoders. Ens4B-UNet is a weighted average ensemble of four encoder-decoder segmentation models. The backbone networks of all ensembled models are pre-trained on the ImageNet dataset to exploit the benefit of transfer learning. To improve our models, we apply several training and prediction techniques, including stochastic weight averaging (SWA), data augmentation, test-time augmentation (TTA), and different types of optimal thresholds.
We evaluate and test our models on the 2019 Pneumothorax Challenge dataset, which contains 12,047 training images with 12,954 masks and 3,205 test images. Our proposed segmentation network achieves a mean Dice similarity coefficient (DSC) of 0.8608 on the test set, placing it among the top one percent of systems in the Kaggle competition.
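The weighted-average ensemble with test-time augmentation and a decision threshold can be sketched independently of the network architecture. This is a minimal sketch under assumptions: each ensemble member is treated as a black-box function returning a per-pixel probability map, horizontal flipping is the only augmentation, and the weights and the 0.5 threshold are placeholders. In Ens4B-UNet the members are four U-Nets with ImageNet-pretrained backbones, and the weights and thresholds would presumably be tuned on validation data; SWA, which averages weights during training, is not shown. The two toy models are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]
Model = Callable[[Grid], Grid]  # image -> per-pixel foreground probability


def hflip(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def predict_with_tta(model: Model, image: Grid) -> Grid:
    """Test-time augmentation: average the prediction on the image with the
    (flipped back) prediction on its horizontal mirror."""
    p = model(image)
    p_flip = hflip(model(hflip(image)))
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(p, p_flip)]


def weighted_ensemble(models: Sequence[Model], weights: Sequence[float], image: Grid) -> Grid:
    """Weighted average of the members' TTA probability maps."""
    total = sum(weights)
    preds = [predict_with_tta(m, image) for m in models]
    h, w = len(image), len(image[0])
    return [[sum(wt * p[i][j] for wt, p in zip(weights, preds)) / total
             for j in range(w)] for i in range(h)]


def binarize(prob: Grid, t: float) -> List[List[int]]:
    return [[int(v >= t) for v in row] for row in prob]


# Two hypothetical toy "models" standing in for trained U-Nets.
def model_a(img: Grid) -> Grid:
    return [row[:] for row in img]


def model_b(img: Grid) -> Grid:
    return [[0.5 * v for v in row] for row in img]


image = [[0.2, 0.6],
         [0.9, 0.1]]
prob = weighted_ensemble([model_a, model_b], weights=[3.0, 1.0], image=image)
mask = binarize(prob, t=0.5)
```

Averaging probabilities before thresholding, rather than voting on binary masks, lets a confident member outweigh uncertain ones, and leaves the threshold as a single tunable knob.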
- Research Article
22
- 10.3389/fmed.2023.1273441
- Sep 28, 2023
- Frontiers in Medicine
Medical images are information carriers that visually reflect and record the anatomical structure of the human body, and they play an important role in clinical diagnosis, teaching, and research. Modern medicine has become increasingly inseparable from the intelligent processing of medical images. In recent years, there have been more and more attempts to apply deep learning theory to medical image segmentation tasks, and it is imperative to explore simple and efficient deep learning algorithms for medical image segmentation. In this paper, we investigate the segmentation of lung nodule images. To address the problems of existing medical image segmentation algorithms, we study medical image fusion algorithms based on a hybrid channel-space attention mechanism, as well as medical image segmentation algorithms with a hybrid architecture of Convolutional Neural Networks (CNN) and Vision Transformers. Because medical image segmentation algorithms have difficulty capturing long-range feature dependencies, this paper proposes SW-UNet, a medical image segmentation model based on a hybrid CNN and Vision Transformer (ViT) framework. The self-attention mechanism and sliding-window design of the Vision Transformer are used to capture global feature associations and overcome the receptive-field limitation that the inductive bias of convolution imposes. At the same time, a widened self-attention vector is used to reduce the number of modules and compress the model size, suiting the small amount of available medical data, on which larger models overfit easily. Experiments on the LUNA16 lung nodule image dataset validate the algorithm and show that the proposed network achieves efficient medical image segmentation at a lightweight scale. In addition, to validate the transferability of the model, we performed additional validation on other tumor datasets with desirable results.
Our research addresses the crucial need for improved medical image segmentation algorithms. By introducing the SW-UNet model, which combines CNN and ViT, we capture long-range feature dependencies and overcome the receptive-field limitations of traditional convolutional operations. This approach not only improves the efficiency of medical image segmentation but also maintains model scalability and adaptability to small medical datasets. The positive results on various tumor datasets highlight the transferability and broad applicability of our proposed model in medical image analysis.
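The idea of restricting self-attention to local windows, on which the SW-UNet description relies, can be illustrated on a 1D token sequence. This is a simplified sketch, not SW-UNet itself: it uses non-overlapping windows and omits the learned query/key/value projections, multiple heads, and any window shifting or sliding between layers. In a segmentation model the tokens would be flattened image patches; here they are plain Python lists.

```python
import math
from typing import List

Tokens = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Tokens) -> Tokens:
    """Scaled dot-product self-attention with Q = K = V = tokens
    (learned projections omitted for brevity)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def window_attention(tokens: Tokens, window: int) -> Tokens:
    """Attend only within consecutive, non-overlapping windows of `window` tokens,
    so for a fixed window size the cost grows linearly with sequence length."""
    out: Tokens = []
    for start in range(0, len(tokens), window):
        out.extend(self_attention(tokens[start:start + window]))
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
out = window_attention(tokens, window=2)
```

Each output in the first window is a convex combination of the first two tokens only; information crosses window borders only when windows are shifted or overlapped between layers, which is the role of the sliding-window design.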
- Book Chapter
122
- 10.1007/978-3-030-32245-8_69
- Jan 1, 2019
Convolutional neural networks (CNNs), in particular the Unet, are a powerful method for medical image segmentation. To date, Unet has demonstrated state-of-the-art performance in many complex medical image segmentation tasks, especially when the training and testing data share the same distribution (i.e., come from the same source domain). In clinical practice, however, medical images are acquired from different vendors and centers. The performance of a U-Net trained on a particular source domain can drop unexpectedly when it is transferred to a different target domain (e.g., a different vendor or acquisition parameters). Collecting a large number of annotations from each new domain to retrain the U-Net is expensive, tedious, and practically impossible. In this work, we propose a generic framework to address this problem, consisting of (1) an unpaired generative adversarial network (GAN) for vendor adaptation and (2) a Unet for object segmentation. In the proposed Unet-GAN architecture, the GAN learns from the Unet at the segmentation-specific feature level. We use cardiac cine MRI as the example, with three major vendors (Philips, Siemens, and GE) as three domains, although the methodology can be extended to medical image segmentation in general. The proposed method significantly improved segmentation results across vendors. The proposed Unet-GAN provides an annotation-free solution to the cross-vendor medical image segmentation problem, potentially extending a trained deep learning model to multi-center and multi-vendor use in real clinical scenarios.
- Research Article
1
- 10.1002/mp.17044
- Apr 3, 2024
- Medical Physics
With the continuous development of deep learning algorithms for medical images, medical image processing models based on convolutional neural networks have made great progress. Because medical images of rectal tumors have specific morphological features and complex edges that differ from those of natural images, good segmentation results often require richer use of semantic features. In most models, the efficiency of feature extraction and utilization has been improved to some extent through greater hardware computing power and deeper networks. However, extracting high-level semantic features in deep networks still causes loss of detail and makes feature extraction difficult. In this work, a novel medical image segmentation model is proposed for Magnetic Resonance Imaging (MRI) segmentation of rectal tumors. The model builds its backbone architecture on skip-connected feature fusion and addresses detail feature loss and low segmentation accuracy with three novel modules: Multi-scale Feature Retention (MFR), Multi-branch Cross-channel Attention (MCA), and Coordinate Attention (CA). Compared with existing methods, our proposed model segments the tumor region more effectively, achieving 97.4% Dice and 94.9% mIoU and exhibiting excellent segmentation performance and computational speed. Our proposed model improves the accuracy of both lesion region and tumor edge segmentation. In particular, determining the lesion region can help doctors identify the tumor location in clinical diagnosis, and accurate segmentation of the tumor edge can help doctors judge the necessity and feasibility of surgery.
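Dice and IoU (the per-class quantity averaged in mIoU) are closely related. For a single pair of binary masks with intersection I and combined size S = |A| + |B|, Dice = 2I/S and IoU = I/(S - I), which gives IoU = Dice / (2 - Dice). The sketch below implements that identity and a simple mean IoU over the classes present in either label map; the function names are illustrative. Because reported mIoU values may be averaged over classes or images, published Dice and mIoU figures are not in general linked by this identity.

```python
from typing import List


def iou_from_dice(dice: float) -> float:
    """Convert a Dice score to IoU for a single pair of binary masks."""
    return dice / (2.0 - dice)


def mean_iou(pred: List[List[int]], truth: List[List[int]], num_classes: int) -> float:
    """Mean IoU over the classes that occur in either label map."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for prow, trow in zip(pred, truth):
            for p, t in zip(prow, trow):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


pred = [[0, 1],
        [1, 1]]
truth = [[0, 1],
         [0, 1]]
print(mean_iou(pred, truth, num_classes=2))  # (1/2 + 2/3) / 2 = 7/12
```

Since IoU never exceeds Dice for the same masks, an IoU-based number is always the more conservative of the two.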
- Research Article
15
- 10.3390/app10196838
- Sep 29, 2020
- Applied Sciences
Malignant lesions pose a major threat to human health and have a high mortality rate. Locating the contours of organs is a preparatory step that helps doctors diagnose correctly. There is therefore an urgent clinical need for segmentation models designed specifically for medical imaging. However, most current medical image segmentation models are migrated directly from natural image segmentation models and thus ignore characteristics specific to medical images, such as false positives and blurred boundaries in 3D volume data. Organ segmentation in medical images therefore remains challenging and demanding. To address this, we redesign a 3D convolutional neural network (CNN) based on 3D U-Net and adopt the rendering method from computer graphics for 3D medical image segmentation; we name the result Render 3D U-Net. This network adopts a subdivision-based point-sampling method in place of the original upsampling method to render high-quality boundaries. In addition, Render 3D U-Net integrates the point-sampling method into the 3D ANU-Net architecture under deep supervision. To reduce false positives in clinical diagnosis and achieve more accurate segmentation, Render 3D U-Net also includes a dedicated module for screening false positives. Finally, three public challenge datasets (MICCAI 2017 LiTS, MICCAI 2019 KiTS, and ISBI 2019 segTHOR) were used to evaluate performance on the target organs. Compared with other models, Render 3D U-Net improved performance on both the overall organ and its boundary in the CT image segmentation tasks, including the liver, kidney, and heart.
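Subdivision-based point sampling follows the rendering idea of refining only the most uncertain locations after each upsampling step. The following is a minimal 2D sketch in pure Python, assuming a coarse foreground probability map, 2x nearest-neighbour upsampling, uncertainty measured as closeness to 0.5, and a hypothetical `hypothetical_refine` callback standing in for the network's point-wise prediction head. Render 3D U-Net itself works on 3D volumes, and its point head and sampling budget are not reproduced here.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]


def upsample2x(prob: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (returns a new, independent grid)."""
    h, w = len(prob), len(prob[0])
    return [[prob[i // 2][j // 2] for j in range(2 * w)] for i in range(2 * h)]


def most_uncertain_points(prob: Grid, k: int) -> List[Tuple[int, int]]:
    """The k coordinates whose foreground probability is closest to 0.5
    (ties keep row-major order because Python's sort is stable)."""
    coords = [(i, j) for i in range(len(prob)) for j in range(len(prob[0]))]
    coords.sort(key=lambda ij: abs(prob[ij[0]][ij[1]] - 0.5))
    return coords[:k]


def subdivide_step(prob: Grid, k: int, refine: Callable[[int, int], float]) -> Grid:
    """One subdivision step: upsample, then re-predict only the k most uncertain points."""
    up = upsample2x(prob)
    for i, j in most_uncertain_points(up, k):
        up[i][j] = refine(i, j)
    return up


def hypothetical_refine(i: int, j: int) -> float:
    # Placeholder for a fine-resolution point head: the true boundary is
    # assumed to lie between columns 2 and 3 of the upsampled grid.
    return 1.0 if j == 3 else 0.0


coarse = [[0.05, 0.5],
          [0.95, 0.9]]
refined = subdivide_step(coarse, k=4, refine=hypothetical_refine)
```

Only the four ambiguous pixels produced by upsampling the 0.5 cell are re-predicted; confident regions keep their upsampled values, so the expensive point head runs on a small fraction of the grid.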
- Conference Article
24
- 10.1109/iccrd.2010.155
- Jan 1, 2010
Image segmentation is an important process for extracting information from complex medical images, and it has wide application in the medical field. The main objective of image segmentation is to partition an image into mutually exclusive and exhaustive regions such that each region of interest is spatially contiguous and the pixels within the region are homogeneous with respect to a predefined criterion. Widely used homogeneity criteria include intensity, texture, color, range, surface normal, and surface curvature. Many researchers in medical imaging and soft computing have conducted significant surveys of image segmentation. This paper aims to develop an improved segmentation method using Fuzzy-Neuro logic to detect tissues such as white matter, gray matter, cerebrospinal fluid, and tumor in a given magnetic resonance image data set. Magnetic resonance images generally contain a significant amount of noise caused by operator performance, equipment, and the environment, which can lead to serious inaccuracies, so segmenting such images is a challenging problem in image analysis. Several diagnostic procedures rely on proper segmentation of the digitized image. Segmentation of medical images is needed for applications such as estimating object boundaries, classifying tissue abnormalities, shape analysis, and contour detection. The Fuzzy-Neuro logic segmentation algorithm provides satisfactory results compared with K-means, Fuzzy C-Means, Neural Network, and Fuzzy logic approaches.
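Fuzzy C-Means, one of the baselines the Fuzzy-Neuro method is compared against, assigns each pixel a soft membership in every cluster rather than a single label. The following is a minimal pure-Python sketch for scalar MR intensities, assuming fuzzifier m = 2, an evenly spaced deterministic initialisation, and a fixed iteration count instead of a convergence test. It illustrates the baseline only, not the paper's Fuzzy-Neuro algorithm; the toy intensities are made up.

```python
from typing import List, Tuple


def memberships(values: List[float], centers: List[float], m: float,
                eps: float = 1e-9) -> List[List[float]]:
    """FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))."""
    c = len(centers)
    u = []
    for x in values:
        d = [abs(x - v) + eps for v in centers]  # eps avoids division by zero
        u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                  for k in range(c)])
    return u


def fuzzy_c_means(values: List[float], c: int, m: float = 2.0,
                  iters: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-Means clustering of scalar intensities."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centres spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(iters):
        u = memberships(values, centers, m)
        # Centre update: mean of the intensities weighted by u_ik ** m.
        centers = [sum((ui[k] ** m) * x for ui, x in zip(u, values))
                   / sum(ui[k] ** m for ui in u)
                   for k in range(c)]
    return centers, memberships(values, centers, m)


# Toy intensities forming three well-separated groups.
intensities = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0, 90.0, 91.0, 89.0]
centers, u = fuzzy_c_means(intensities, c=3)
labels = [max(range(3), key=lambda k: row[k]) for row in u]  # hard labels by max membership
```

The soft memberships are what make FCM attractive for noisy MR data: a pixel between two tissue intensities keeps partial membership in both instead of being forced into one class early.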
- Research Article
- 10.54254/2755-2721/2025.ld28949
- Nov 5, 2025
- Applied and Computational Engineering
Convolutional Neural Networks (CNNs) have demonstrated immense application potential in medical image processing due to their powerful feature learning capabilities, playing an increasingly vital role in disease diagnosis and treatment planning. However, when processing complex medical image data, CNNs still face challenges such as high computational resource consumption, limited model generalization ability, and high demand for labeled data. In recent years, to overcome these limitations, numerous researchers have focused on innovative structural improvements for CNNs, with enhancing model efficiency emerging as a central research priority. This review will focus on analyzing the contributions and significant effects of these improvement strategies in enhancing medical image classification and segmentation accuracy, accelerating model inference speed, and strengthening model robustness. Based on an in-depth synthesis of existing research, we draw the core conclusion: improved CNN architectures are an effective approach to enhancing the accuracy and efficiency of medical image segmentation, and specific improvement strategies demonstrate significant advantages across different tasks and various medical image modalities.
- Conference Article
- 10.1109/sceecs68810.2026.11430008
- Jan 31, 2026
Deep learning has dramatically advanced medical image segmentation, enabling precise delineation of anatomical structures and pathological regions that are critical for diagnosis and treatment planning [1]. Among the numerous architectures, convolutional neural network (CNN) based models and hybrid CNN-transformer networks have proven especially effective. In this review paper, we concentrate on five prominent segmentation models: the original U-Net, its successor UNet++, the hybrid TransUNet, and two backbone-based CNN models, FCNResNet50 and DeepLabV3ResNet50. We first outline the evolution of CNN-based and transformer-based segmentation methods, highlighting the strengths of local feature extraction in CNNs versus global context modeling in transformers [2]. We then discuss each model's design, performance, and limitations in detail. For the comparative analysis, we replicate the models and run them in parallel to compare their segmentation performance. The experiment is conducted in parallel on an HPC cluster so that no model benefits from optimization or stabilization by the OS or firmware during prolonged runs of the same program. Clinical applications of these models are discussed, including tumor and organ segmentation, with emphasis on their practical benefits. Finally, we address current challenges, such as limited annotated data and variability of imaging modalities, and suggest future research directions including transfer learning, zero-shot segmentation, and multi-modal integration [3]. This comprehensive review synthesizes insights from recent literature [4] [5] and clarifies how each model contributes to medical image analysis.
- Research Article
23
- 10.1155/2020/1645479
- May 15, 2020
- Complexity
Medical image segmentation is a key technology for image guidance, so the quality of image segmentation plays an important role in image-guided surgery. Traditional machine learning methods have achieved some success in medical image segmentation, but they suffer from low classification accuracy and poor robustness. Deep learning theory has good generalizability and feature extraction ability, which offers a new approach to medical image segmentation problems. However, applying deep learning to medical image segmentation raises two problems: the network structure cannot be constructed according to the characteristics of medical images, and the generalizability of the deep learning model is weak. To address these issues, this paper first adapts a neural network to medical image features by adding cross-layer connections to a traditional convolutional neural network, establishing an optimized convolutional neural network model that can segment medical images using features at two scales simultaneously. Second, to solve the generalizability problem of the deep learning model, an adaptive distribution function is designed according to the position of the hidden layer and used to set the activation probability of the neurons in each layer. This enhances the generalizability of the dropout model, yielding an adaptive dropout model that better addresses the weak generalizability of deep learning models. Based on these ideas, this paper proposes a medical image segmentation algorithm based on an optimized convolutional neural network with adaptive dropout depth calculation. An ultrasonic tomographic image and a lumbar CT image were segmented separately with the proposed method.
The experimental results show that the proposed method not only segments better than traditional machine learning and other deep learning methods but also adapts well to a variety of medical images. This work provides a new perspective for research on medical image segmentation.
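The abstract sets each layer's dropout probability from a distribution function of the hidden layer's position but does not give that function. The sketch below is therefore a hypothetical stand-in: `layer_dropout_rate` uses a linear schedule in which deeper layers drop more units, with `p_min` and `p_max` chosen arbitrarily, and `dropout` applies standard inverted dropout so that the expected activation is unchanged during training.

```python
import random
from typing import List


def layer_dropout_rate(layer_index: int, num_layers: int,
                       p_min: float = 0.1, p_max: float = 0.5) -> float:
    """Hypothetical depth-dependent schedule: the drop rate grows linearly
    from p_min (first hidden layer) to p_max (last hidden layer)."""
    if num_layers == 1:
        return p_min
    return p_min + (p_max - p_min) * layer_index / (num_layers - 1)


def dropout(activations: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1 / (1 - p), keeping the expected activation unchanged."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
rates = [layer_dropout_rate(i, 5) for i in range(5)]  # approximately 0.1, 0.2, 0.3, 0.4, 0.5
hidden = [1.0] * 10_000
dropped = dropout(hidden, rates[2], rng)
```

Any other monotone or non-monotone function of layer position can be swapped into `layer_dropout_rate` without changing `dropout`, which is the point of separating the schedule from the masking.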
- Conference Article
8
- 10.1117/12.2612288
- Apr 4, 2022
Despite the widespread use of Convolutional Neural Networks (CNNs) for segmentation in medical imaging, there is not yet a validated method to determine which regions of input images inform the models' decisions. To advance the field, we have 1) modified three general, prevalent classification-attribution methods for use with segmentation models, 2) developed a novel attribution method explicitly for segmentation models, and 3) formulated validation metrics for these attributions so that results can be compared quantitatively. To adapt existing classification-attribution methods, we employed a weighted sum across the attribution maps from each post-bottleneck layer. For our novel attribution method (Kernel-Weighted Contribution), the activations from each kernel across all post-bottleneck layers were summed, weighted by each kernel's dependent and independent contributions to the segmentation. We used the generated attribution maps to mask the input images and generate new predicted segmentations. The methods were then scored on their sensitivity to region importance (Prediction Preserved) and their ability to attribute only relevant regions (Image Preserved). All three adapted classification-attribution methods showed a significant increase in both Prediction Preserved and Image Preserved scores. Compared with the modified classification-attribution methods, Kernel-Weighted Contribution showed a median decrease of 2-12% in Image Preserved score and an increase of 12-21% in Prediction Preserved score. These new methods provide insight into how segmentation models use regions of input images, so clinically relevant features can be extracted from both foreground and relevant background regions. Additionally, the validation metrics facilitate a quantitative and objective comparison of segmentation-attribution methods.
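The evaluation idea described above, masking the input with an attribution map, re-running the model, and comparing predictions, can be sketched with a toy model. This is a hypothetical proxy, not the paper's exact metric: the top `keep_fraction` of pixels by attribution are kept, the rest are set to zero, and the Dice score between the original and the re-predicted mask stands in for a prediction-preservation score. The thresholding `toy_model` is an assumption standing in for a trained segmentation network, and the attribution maps are made up.

```python
from typing import Callable, List

Grid = List[List[float]]
Mask = List[List[int]]


def mask_by_attribution(image: Grid, attribution: Grid, keep_fraction: float) -> Grid:
    """Keep pixels whose attribution is among the top `keep_fraction`
    (ties at the cutoff are all kept); zero out the rest."""
    flat = sorted((a for row in attribution for a in row), reverse=True)
    n_keep = max(1, round(keep_fraction * len(flat)))
    cutoff = flat[n_keep - 1]
    return [[px if a >= cutoff else 0.0 for px, a in zip(irow, arow)]
            for irow, arow in zip(image, attribution)]


def dice(a: Mask, b: Mask) -> float:
    inter = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 2 * inter / total if total else 1.0


def prediction_preserved(model: Callable[[Grid], Mask], image: Grid,
                         attribution: Grid, keep_fraction: float) -> float:
    """Hypothetical proxy: Dice between predictions on the full and masked image."""
    original = model(image)
    masked = model(mask_by_attribution(image, attribution, keep_fraction))
    return dice(original, masked)


def toy_model(img: Grid) -> Mask:
    # Hypothetical stand-in for a segmentation network.
    return [[int(v > 0.5) for v in row] for row in img]


image = [[0.9, 0.8, 0.1],
         [0.7, 0.2, 0.1]]
good_attr = [[0.9, 0.8, 0.0],
             [0.7, 0.1, 0.0]]
bad_attr = [[0.0, 0.1, 0.9],
            [0.2, 0.8, 0.7]]
```

An attribution that highlights the pixels the model relies on preserves the prediction (a score of 1.0 for `good_attr`), while one that highlights irrelevant pixels does not (0.0 for `bad_attr`).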
- Research Article
36
- 10.1109/access.2020.2982197
- Jan 1, 2020
- IEEE Access
Image segmentation is an important technique for partitioning an image into non-overlapping regions, each with its own features. It has developed rapidly in the field of medical imaging, but there is currently a gap between classification accuracy and segmentation accuracy in medical image segmentation. In this paper, a deep convolutional neural network is combined with a cascading structure, and a unified learning framework is established using a conditional random field. This paper first adds a cascading structure to the deep convolutional neural network (DCNN) framework to model more effectively the direct dependencies between spatially neighboring labels. Second, a conditional random field (CRF) is used for post-segmentation processing, which effectively resolves the conflict in traditional convolutional networks between segmentation accuracy on the one hand and network depth and the number of pooling operations on the other.
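CRF post-processing of a network's probability map can be illustrated with a much smaller model than those used in practice. The sketch below assumes a binary label set, unary energies equal to the negative log of the network's probabilities, and a Potts pairwise term over 4-connected neighbours, optimized with a few synchronous mean-field updates. Fully connected CRFs with Gaussian edge potentials apply the same mean-field scheme to all pixel pairs and are not shown; the `weight` and `iters` values are arbitrary.

```python
import math
from typing import List

Grid = List[List[float]]


def mean_field_potts(prob: Grid, weight: float = 1.0, iters: int = 5) -> List[List[int]]:
    """Smooth a binary foreground-probability map with a 4-connected Potts CRF
    using mean-field inference; returns the most probable label per pixel."""
    h, w = len(prob), len(prob[0])
    eps = 1e-6
    # Unary energies U_i(lab) = -log P_i(lab) taken from the network output.
    unary = [[(-math.log(max(1.0 - p, eps)), -math.log(max(p, eps))) for p in row]
             for row in prob]
    q = [[(1.0 - p, p) for p in row] for row in prob]  # initial beliefs Q_i(lab)
    for _ in range(iters):
        new_q = []
        for i in range(h):
            new_row = []
            for j in range(w):
                neigh = [q[i + di][j + dj]
                         for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= i + di < h and 0 <= j + dj < w]
                # Potts term: expected number of neighbours disagreeing with label lab.
                energy = [unary[i][j][lab] + weight * sum(1.0 - nq[lab] for nq in neigh)
                          for lab in (0, 1)]
                m = min(energy)  # shift for numerical stability
                e0, e1 = (math.exp(-(x - m)) for x in energy)
                new_row.append((e0 / (e0 + e1), e1 / (e0 + e1)))
            new_q.append(new_row)
        q = new_q  # synchronous update of all pixels
    return [[int(q1 > q0) for q0, q1 in row] for row in q]


noisy = [[0.9, 0.9, 0.9],
         [0.9, 0.4, 0.9],
         [0.9, 0.9, 0.9]]
edge = [[0.1, 0.1, 0.9, 0.9] for _ in range(3)]
```

With the default weight, the isolated low-confidence pixel at the centre of `noisy` is relabelled as foreground, whereas the straight boundary in `edge`, which the unary terms support on both sides, is kept. This is the regional-consistency effect that CRF post-processing is meant to provide.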