Active Boundary Loss for Semantic Segmentation
This paper proposes a novel active boundary loss for semantic segmentation. It progressively encourages the alignment between predicted boundaries and ground-truth boundaries during end-to-end training, which is not explicitly enforced by the commonly used cross-entropy loss. Based on the predicted boundaries detected from the segmentation results using current network parameters, we formulate the boundary alignment problem as a differentiable direction vector prediction problem to guide the movement of predicted boundaries in each iteration. Our loss is model-agnostic and can be plugged into the training of segmentation networks to improve the boundary details. Experimental results show that training with the active boundary loss can effectively improve the boundary F-score and mean Intersection-over-Union on challenging image and video object segmentation datasets.
- Research Article
- 10.1186/s12880-026-02245-y
- Feb 25, 2026
- BMC medical imaging
A novel method for tackling the problem of imbalanced data in medical image segmentation is proposed in this work. In Balanced Cross Entropy (BCE) loss, which is a type of weighted CE loss, the weight assigned to each class is the inverse of the class frequency. These balancing weights are expected to equalize the effect of each class on the overall loss and prevent the model from being biased towards the majority class. However, as previous studies have shown, this method degrades the performance by a large margin. Therefore, Balanced CE is not a popular loss in medical segmentation tasks, and usually a region-based loss, like the Dice loss, is used to address the class imbalance problem. Existing weighted CE loss approaches, like Balanced CE, are usually adapted from classification tasks. In this study we propose a weighting method which is designed for segmentation and takes advantage of spatial and geometrical properties of the labels. In the proposed method, the weighting of cross entropy loss for each class is based on a dilated area of each class mask, and balancing weights are assigned to each class together with its surrounding pixels. The goal of this study is to show that the performance of Balanced CE loss can be greatly improved by modifying its weighting strategy. Experiments on different datasets show that the proposed Dilated Balanced CE (DBCE) loss outperforms the Balanced CE loss by a large margin and produces superior results compared to CE loss, and its performance is similar to or better than that of the combination of Dice and CE loss. Also, the performance of different combination losses can be improved by replacing the CE loss with the proposed DBCE loss. This study shows that a weighted cross entropy loss with the right weighting strategy can be as effective as a region-based loss in handling the class imbalance problem.
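As a rough illustration of the weighting idea in the abstract above, the following Python sketch computes inverse-frequency class weights and then spreads each class weight over a dilated version of that class's mask. The function names, the square dilation neighbourhood, the radius, and the rule of keeping the larger weight where dilated masks overlap are assumptions made for illustration only; the abstract does not specify these details, and this is not the authors' implementation.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its pixel frequency (Balanced CE style)."""
    flat = [c for row in labels for c in row]
    counts = Counter(flat)
    total = len(flat)
    return {c: total / n for c, n in counts.items()}


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) square neighbourhood (assumed shape)."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = True
    return out


def dilated_weight_map(labels, radius=1):
    """Per-pixel weights: each class weight covers the class mask plus its surroundings.

    Where dilated masks overlap, the larger weight is kept (an assumption).
    """
    weights = inverse_frequency_weights(labels)
    h, w = len(labels), len(labels[0])
    wmap = [[0.0] * w for _ in range(h)]
    for c, wc in weights.items():
        mask = [[labels[y][x] == c for x in range(w)] for y in range(h)]
        dil = dilate(mask, radius)
        for y in range(h):
            for x in range(w):
                if dil[y][x]:
                    wmap[y][x] = max(wmap[y][x], wc)
    return wmap


# Toy label map: one foreground pixel (class 1) in a 4x4 background (class 0).
labels = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
class_weights = inverse_frequency_weights(labels)
weight_map = dilated_weight_map(labels, radius=1)
```

In this toy case the single foreground pixel gets a weight of 16, and with `radius=1` that weight also covers its eight neighbours, while the remaining background pixels keep the much smaller background weight. The resulting map would multiply the per-pixel cross-entropy terms.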
- Conference Article
- 10.1109/icip.2005.1529672
- Jan 1, 2005
Automatic segmentation of objects in images is an ongoing research problem with applications in many fields. If a scene is imaged serially over time, an advantage can be gained by using segmentation results from previous and subsequent images when segmenting the current image. This paper discusses a probabilistic framework for making use of temporal information in the segmentation process. A subset of dynamic Bayesian networks, the hidden Markov model is described as a means to improve segmentation over statistical classification techniques that use static pixel intensity information alone. An application of this technique to the segmentation of tumors in magnetic resonance images (MRIs) is described. The segmentation accuracy was increased compared to a popular 3D spatial only segmentation method.
- Research Article
13
- 10.1007/s11263-019-01184-2
- May 27, 2019
- International Journal of Computer Vision
We present a novel form of interactive object segmentation called Click Carving which enables accurate segmentation of objects in images and videos with only a few point clicks. Whereas conventional interactive pipelines take the user’s initialization as a starting point, we show the value in the system taking the lead even in initialization. In particular, for a given image or a video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using appearance and motion cues. Then, the user looks at the top ranked proposals, and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2–3 times), and each time the system revises the top ranked proposal set, until the user is satisfied with a resulting segmentation mask. In the case of images, this mask is considered as the final object segmentation. However, in the case of videos, the object region proposals rely on motion as well, and the resulting segmentation mask in the first frame is further propagated across the video to obtain a complete spatio-temporal object tube. On six challenging image and video datasets, we provide extensive comparisons with both existing work and simpler alternative methods. In all, the proposed Click Carving approach strikes an excellent balance of accuracy and human effort. It outperforms all similarly fast methods, and is competitive with or better than those requiring 2–12 times the effort.
- Research Article
1
- 10.31676/0235-2591-2024-2-53-62
- May 7, 2024
- Horticulture and viticulture
This article reports the results of research studies conducted in 2023–2024 on transfer learning of Segmentation Convolutional Neural Networks (Seg-CNN) models for classification, recognition, and segmentation of branches with apple fruits and stems in images. State-of-the-art convolutional neural network architectures, i.e., YOLOv8(n,s,m,l,x)-seg, were used for a detailed segmentation of biological objects in images of varying complexity and scale at the pixel level. An image dataset collected in the field using a GoPro HERO 11 camera was marked up for transfer model training. Data augmentation was performed, producing a total of 2500 images. Image markup was performed using the polygon annotation tool. As a result, polygonal contours around objects were created, outlines of branches, apple tree fruits, and stems were outlined, and segments of objects in the images were indicated. The objects were assigned the following classes: Apple branch, Apple fruit, and Apple stem. Binary classification metrics, such as Precision and Recall, as well as Mean Average Precision (mAP), were used to evaluate the performance of the trained models in recognizing branches with apple fruits and stems in images. The YOLOv8x-seg (mAP50 0.758) and YOLOv8l-seg (mAP50 0.74) models showed high performance in terms of all metrics in recognizing branches, apple fruit, and fruit stems in images, outperforming the YOLOv8n-seg (mAP50 0.7) model due to their more complex architecture. The YOLOv8n-seg model has a faster frame processing speed (11.39 frames/s), rendering it a preferred choice for computing systems with limited resources. The results obtained confirm the prospects of using machine learning algorithms and convolutional neural networks for segmentation and pixel-by-pixel classification of branches with apple fruits and stems on RGB images for monitoring the condition of plants and determining their geometric characteristics.
- Conference Article
5
- 10.1109/spices52834.2022.9774193
- Mar 10, 2022
Semantic segmentation using deep learning techniques is now state-of-the-art in medical image segmentation, especially for brain Magnetic Resonance Images (MRI). SegNet, a fully convolutional neural network architecture, is widely used for image segmentation. However, for brain MRI segmentation it is used less frequently than the other popular competing approach, U-Net. A few researchers have proposed a fusion of SegNet and U-Net architectures to combine their desirable properties for performance gains. In this paper, a different direction of research is undertaken. A simpler yet more accurate shallow SegNet architecture is proposed to yield promising segmentation performance on brain MRI. The proposed architecture uses a bilinear interpolation upsampling mechanism instead of max unpooling. Further, a modified cross-entropy loss is employed that is weighted differently for different classes. The class imbalance problem is effectively overcome using this weighted cross-entropy loss. Performance comparison of the proposed architecture with existing works indicates that the average Dice score is raised to 0.83, an improvement of 0.11 over the baseline SegNet. It is demonstrated that the proposed shallow SegNet is a simpler yet more accurate model compared to both the existing SegNet and U-Net and could serve as a baseline for fine-grained image segmentation tasks.
- Conference Article
36
- 10.1109/icra46639.2022.9811702
- May 23, 2022
Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation. Specifically, uneven class distributions in a training dataset often result in unsatisfactory performance on under-represented classes. Many works have proposed to weight the standard cross entropy loss function with pre-computed weights based on class statistics, such as the number of samples and class margins. There are two major drawbacks to these methods: 1) constantly up-weighting minority classes can introduce excessive false positives in semantic segmentation; 2) a minority class is not necessarily a hard class. The consequence is low precision due to excessive false positives. In this regard, we propose a hard-class mining loss by reshaping the vanilla cross entropy loss such that it weights the loss for each class dynamically based on instantaneous recall performance. We show that the novel recall loss changes gradually between the standard cross entropy loss and the inverse frequency weighted loss. Recall loss also leads to improved mean accuracy while offering competitive mean Intersection over Union (IoU) performance. On the Synthia dataset, recall loss achieves 9% relative improvement on mean accuracy with competitive mean IoU using DeepLab-ResNet18 compared to the cross entropy loss.
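The recall-based weighting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-class weight is simply one minus the class recall measured on the current batch; the abstract only says the weights depend on instantaneous recall, so that exact form, the function names, and the toy data are assumptions rather than the authors' code.

```python
import math


def class_recalls(y_true, y_pred, num_classes):
    """Recall per class on the current batch; classes with no samples get recall 1."""
    tp = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            tp[t] += 1
    return [tp[c] / total[c] if total[c] else 1.0 for c in range(num_classes)]


def recall_weighted_ce(probs, y_true, num_classes, eps=1e-12):
    """Cross entropy where each class is weighted by (1 - recall), an assumed form."""
    # Hard predictions are the argmax of the predicted class probabilities.
    y_pred = [max(range(num_classes), key=lambda c: p[c]) for p in probs]
    recalls = class_recalls(y_true, y_pred, num_classes)
    weights = [1.0 - r for r in recalls]
    total = 0.0
    for p, t in zip(probs, y_true):
        total += -weights[t] * math.log(p[t] + eps)
    return total / len(y_true), weights


# Toy batch of four pixels with two classes.
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
y_true = [0, 0, 1, 1]
loss, weights = recall_weighted_ce(probs, y_true, num_classes=2)
```

Here class 0 is fully recalled, so its weight drops to zero, while class 1 has recall 0.5 and keeps a weight of 0.5. A class that is easy in the current batch therefore contributes little, regardless of how rare it is.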
- Research Article
15
- 10.1109/tcsvt.2021.3109084
- Jan 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Deep learning methods have achieved significant progress in the presence of correctly annotated datasets in instance segmentation. However, object classes in large-scale datasets are sometimes ambiguous, which easily causes confusion. Besides, limited experience and knowledge of annotators can lead to mislabeled object semantic classes. To solve this issue, a novel method is proposed in this paper, which considers different roles of noisy class labels in different sub-tasks. Our method is based on two basic observations: firstly, the foreground-background annotation of a sample is correct even though its class label is noisy. Secondly, symmetric loss benefits the model robustness to noisy labels but harms the learning of hard samples, while cross entropy loss is the opposite. Based on the two basic observations, in the foreground-background sub-task, cross entropy loss is used to fully exploit correct gradient guidance. In the foreground-instance sub-task, symmetric loss is used to prevent incorrect gradient guidance provided by noisy class labels. Furthermore, we apply contrastive self-supervised loss to update features of all foreground, to compensate for insufficient guidance provided by partially correct labels especially in the highly noisy setting. Extensive experiments conducted with three popular datasets (i.e., Pascal VOC, Cityscapes and COCO) have demonstrated the effectiveness of our method in a wide range of noisy class label scenarios.
- Book Chapter
1
- 10.1007/978-3-031-15934-3_26
- Jan 1, 2022
Most supervised semantic segmentation methods to date choose cross-entropy loss (CE) as the default choice. Standard CE treats all pixels in the image indiscriminately, which lacks consideration of context differences between pixels, leading to the model being overwhelmed by numerous homogeneous pixels in large-scale objects. It ignores an essential spatial prior that can be deduced from the Ground Truth, namely the segmentation edges, which can be used to distinguish the excessive homogeneous pixels. Therefore, we propose a novel loss function termed Spatial-enhanced Loss (SL), in which the image is spatially separated into the edge region and the body region with the assistance of the edge derived from the Ground Truth. Experiments show that SL outperforms Focal Loss, standard cross-entropy loss, class-balanced cross-entropy loss and Dice Loss. We achieve substantial improvements on multiple models without using any tricks, up to 1.60% mIoU.
Keywords: Spatial-enhanced loss; Edge-body separation; Weighted cross-entropy; Semantic segmentation
- Conference Article
10
- 10.1109/ccece.1999.808047
- May 9, 1999
Many objects in images of natural scenes are so complex that describing them by traditional techniques is inadequate. This paper presents a family of techniques suitable for texture analysis and segmentation of objects in aerial images. Texture has been one of the most important but difficult properties for image coding and compression. It is important because it describes the entire area of a region and provides the essential structure information in regions of an image. Our goal here is to decompose an image into texturally homogeneous regions. An efficient technique for computing the fractal dimension of images is used. Three different techniques (the Hurst transform, the Sobel operator and the variance) are applied to two images and the results are compared. It is shown that the variance dimension converts the original image to one whose texture information permits simple thresholding for texture analysis and segmentation.
- Research Article
277
- 10.1016/j.media.2018.12.003
- Dec 15, 2018
- Medical Image Analysis
Micro-Net: A unified model for segmentation of various objects in microscopy images.
- Book Chapter
92
- 10.1007/978-3-030-32239-7_42
- Jan 1, 2019
Image segmentation plays an important role in pathology image analysis as the accurate separation of nuclei or glands is crucial for cancer diagnosis and other clinical analyses. The networks and cross entropy loss in current deep learning-based segmentation methods originate from image classification tasks and have drawbacks for segmentation. In this paper, we propose a full resolution convolutional neural network (FullNet) that maintains full resolution feature maps to improve the localization accuracy. We also propose a variance constrained cross entropy (varCE) loss that encourages the network to learn the spatial relationship between pixels in the same instance. Experiments on a nuclei segmentation dataset and the 2015 MICCAI Gland Segmentation Challenge dataset show that the proposed FullNet with the varCE loss achieves state-of-the-art performance. The code is publicly available (https://github.com/huiqu18/FullNet-varCE).
- Research Article
51
- 10.1016/j.neucom.2018.10.103
- Apr 26, 2019
- Neurocomputing
Bin loss for hard exudates segmentation in fundus images
- Research Article
- 10.18287/2412-6179-co-1656
- Dec 1, 2025
- Computer Optics
The paper considers the problem of instance segmentation of objects in images using modern deep learning models and synthetic data. The main attention is paid to the study of the effectiveness of synthetic data created on the basis of 3D models for pre-training segmentation models. Such architectures as U-Net, DeepLabV3+, Mask R-CNN and YOLOv8 are considered. To improve the quality of synthetic data, various parameters of automatic data generation were used, including random positioning of objects, adding backgrounds, changing lighting, changing object texture, adding blur and adding obstacles. The experiments showed that each of these steps significantly contributes to the accuracy of the models, and their combination provides the best results (mAP 92.1%). The results confirm that the combined use of synthetic and real data allows bridging the gap between the synthetic and real environment. The best performance was achieved by the YOLOv8 model, which demonstrated high accuracy and processing speed. The obtained findings highlight the importance of carefully tuning the parameters of synthetic data generation to improve segmentation in real-world applications.
- Research Article
3
- 10.1016/j.acra.2025.05.036
- Jun 1, 2025
- Academic radiology
Breast arterial calcification (BAC) is increasingly recognized as a significant indicator of cardiovascular risk, necessitating improvements in detection and quantification methods through mammographic screening. This study aimed to develop and validate a deep-learning model capable of detecting, segmenting, and quantifying BAC in mammograms, with the goal of improving mammographic screening for cardiovascular risk assessment. We conducted a retrospective study using mammograms from 369 patients. The study utilized a modified U-Net architecture that incorporates Hausdorff loss, Dice loss, and Binary Cross-Entropy (BCE) loss for segmentation and subsequent quantification. The model's performance was assessed using the Dice score and BCE loss for segmentation accuracy, and a linear fit and Bland-Altman analysis for quantification accuracy. Our model achieved high segmentation accuracy with Dice scores of 0.90 for the training set and 0.89 for the validation set. Quantification reliability was validated through Bland-Altman analysis, showing a mean difference of -0.98 mg of calcium in the training set. The model also demonstrated high classification accuracy with F1 scores of 0.97 and 0.93 for validation and training sets, respectively, in BAC detection. The deep-learning framework substantially improves BAC detection, segmentation, and quantification in mammograms, advancing the accuracy and efficiency of cardiovascular risk screening. This study supports the potential for integrated dual-purpose screening in women's healthcare.
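For readers unfamiliar with the Dice score and the Dice plus BCE combination mentioned above, the following Python sketch shows one common formulation on flattened masks. The equal weighting of the two terms, the smoothing constant, and the function names are assumptions for illustration; the Hausdorff term is omitted, and the abstract does not state how the study combined its losses.

```python
import math


def dice_score(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) on flattened masks or probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_bce_loss(pred, target, dice_weight=0.5):
    """Weighted sum of Dice loss (1 - Dice) and BCE; the 0.5 weight is an assumption."""
    dice_term = 1.0 - dice_score(pred, target)
    return dice_weight * dice_term + (1.0 - dice_weight) * bce_loss(pred, target)


# Hard masks: two predicted foreground pixels, one of which is correct.
overlap = dice_score([1, 1, 0, 0], [1, 0, 0, 0])

# Soft predictions for a two-pixel mask.
combined = dice_bce_loss([0.9, 0.1], [1, 0])
```

In the first example the intersection is one pixel and the two masks contain three foreground pixels in total, so the Dice score is about 2/3.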
- Research Article
6
- 10.1038/s41598-023-35276-4
- May 19, 2023
- Scientific Reports
To increase the accuracy of medical image analysis using supervised learning-based AI technology, a large amount of accurately labeled training data is required. However, the supervised learning approach may not be applicable to real-world medical imaging due to the lack of labeled data, the privacy of patients, and the cost of specialized knowledge. To handle these issues, we utilized Kronecker-factored decomposition, which enhances both computational efficiency and stability of the learning process. We combined this approach with a model-agnostic meta-learning framework for the parameter optimization. Based on this method, we present a bidirectional meta-Kronecker factored optimizer (BM-KFO) framework to quickly optimize semantic segmentation tasks using just a few magnetic resonance imaging (MRI) images as input. This model-agnostic approach can be implemented without altering network components and is capable of learning the learning process and meta-initial points while training on previously unseen data. We also incorporated a combination of average Hausdorff distance loss (AHD-loss) and cross-entropy loss into our objective function to specifically target the morphology of organs or lesions in medical images. Through evaluation of the proposed method on the abdominal MRI dataset, we obtained an average performance of 78.07% in setting 1 and 79.85% in setting 2. Our experiments demonstrate that BM-KFO with AHD-loss is suitable for general medical image segmentation applications and achieves superior performance compared to the baseline method in few-shot learning tasks. To allow the proposed method to be replicated, we have shared our code on GitHub: https://github.com/YeongjoonKim/BMKFO.git.
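The average Hausdorff distance named above can be illustrated on two small point sets. The Python sketch below uses one common definition, the mean of the two directed average nearest-neighbour distances. Other variants exist, and a trainable AHD-loss needs a differentiable formulation that this metric-only sketch does not attempt; the function names and sample points are hypothetical.

```python
import math


def directed_average_distance(a, b):
    """Mean distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def average_hausdorff_distance(a, b):
    """Symmetric average Hausdorff distance (one common definition, assumed here)."""
    return 0.5 * (directed_average_distance(a, b) + directed_average_distance(b, a))


# Two toy boundary point sets, e.g. predicted and ground-truth contour pixels.
predicted = [(0, 0), (0, 1)]
reference = [(0, 0), (0, 3)]
ahd = average_hausdorff_distance(predicted, reference)
```

Unlike the maximum-based Hausdorff distance, this average form is less dominated by a single outlying boundary point, which is one reason it is often preferred when comparing segmentation contours.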