Evaluating the Impact of Annotation Expertise on AI-Based Ultrasound Segmentation: A Case Study on Left Atrial Appendage.

Abstract

Medical image segmentation using artificial intelligence (AI) is a prominent area of research with diverse applications across various fields. In recent years, a multitude of datasets representing different body structures have been developed and made publicly available. However, the volume of data, particularly ground truth data, which often relies on manual annotation, remains limited. Supervised learning remains the state-of-the-art approach for deep learning methods; however, its performance is often reported to depend on the expertise of the operator who generates the ground truth. This dependency becomes more critical with challenging medical imaging modalities, such as ultrasound, which is often characterized by low image quality and various artifacts. This study investigates the influence of user expertise on the accuracy of ground truth annotations and its impact on the final performance of the segmentation method. Specifically, we focus on the task of segmenting the left atrial appendage (LAA) in ultrasound images. Two datasets were initially created: one annotated by an Expert and the other by a Naive (novice) observer. Additionally, synthetic variations of these manually annotated datasets were generated by introducing both systematic and non-systematic errors to examine their effects on segmentation outcomes. Using the nnU-Net framework as the computational basis, the network was trained on each dataset, and the results were evaluated against the Expert's test labels. Training with Expert and Naive contours achieved Dice values on the test set of 0.81 ± 0.09 and 0.77 ± 0.12, respectively, with no statistically significant differences between them. Similarly, training with the synthetic variations showed no statistically significant differences for non-systematic errors, whereas systematic errors resulted in statistically significant differences relative to the manual contours.
These findings demonstrate that the AI network remains highly effective across most tested scenarios, even when synthetic errors are introduced, showcasing its ability to handle non-systematic errors, which synthetically mimic inter-observer variability. However, the network encounters greater challenges with systematic errors, failing to accurately delineate the LAA boundaries.
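The Dice values above quantify the overlap between a predicted mask and the Expert's reference mask. As a minimal illustrative sketch (NumPy assumed; the toy masks below are invented for illustration, not taken from the study), the metric can be computed as:

```python
import numpy as np

def dice(a, b, eps=1e-8):
    # Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|).
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + eps)

# Toy 4x4 binary masks standing in for a predicted and a reference
# LAA segmentation (purely illustrative values).
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
ref = np.array([[0, 1, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])
score = dice(pred, ref)  # 2*3 / (4+3) ≈ 0.857
```

A Dice of 1.0 indicates identical masks; values around 0.8, as reported in the abstract, indicate substantial but imperfect overlap.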

Similar Papers
  • Research Article
  • Cited by 7
  • 10.1155/2019/2856289
Numerical Studies on Forecast Error Correction of GRAPES Model with Variational Approach
  • May 7, 2019
  • Advances in Meteorology
  • Dengxin He + 3 more

To implement deterministic short-range numerical weather forecast error correction, this study develops a novel approach using the variational method and historical data. Based on the time-dependency characteristic of nonsystematic forecast error, a variational approach is adopted to establish a mapping between the nonsystematic error series and the prior-period nonsystematic error series, so as to estimate future nonsystematic error and revise the forecast once the systematic forecast error has been corrected. Using hindcast daily data of 500 hPa geopotential height generated by the GRAPES model for January and July from 2002 to 2010, a preliminary analysis is carried out on the characteristics of forecast error in East Asia. Further estimation and forecast correction tests are conducted for nonsystematic error. The results show that the nonsystematic forecast error in the GRAPES model has an obvious state-dependency characteristic. Nonsystematic forecast error changes with season and the state of the weather and accounts for a large proportion of the total forecast error. Nonsystematic forecast error estimated by the variational approach is relatively close to the real forecast error. After nonsystematic correction, the corrected 24 h and 48 h forecasts of the majority of samples have a smaller RMSE. A further study on temperature shows a similar result, even when compared to observational upper-air MICAPS data.

  • Dissertation
  • 10.17077/etd.005508
Globally optimal surface segmentation for medical images using deep learning
  • Oct 8, 2020
  • Leixin Zhou + 5 more

Automated image segmentation plays a very important role in quantitative image analysis. In recent years, deep learning (DL) or CNN-based methods for semantic segmentation have become very popular in computer vision and then in the medical image research community. Most deep learning based semantic segmentation methods are region-based, i.e. each pixel is labeled as target object or background. As there is no explicit constraint among the labels of different pixels, post-processing, such as morphological operations, is usually required to obtain a reasonable prediction. One can also model the segmentation problem as surface-based: each pixel is labeled as surface or non-surface, and the target object is then defined based on prior knowledge, i.e. one has to know which side of the surface corresponds to the target object. The two views are equivalent but modeled in different ways. As one prominent surface-based method, Graph-Search (GS) has achieved great success, especially in the medical field. This method is capable of simultaneously detecting multiple interacting surfaces, in which the optimality is controlled by the cost functions designed for individual surfaces and by several geometric constraints defining the surface smoothness and interrelations. However, the cost function design is usually non-trivial. Inspired by GS, one research focus is to solve the surface segmentation problem with a global optimality guarantee for both 3-D and 2-D images using deep learning, so that the cost functions are learned by deep networks from a training dataset. In the era of big data, automated deep learning based image processing makes it possible to process a large amount of medical image data efficiently. This is especially helpful for semantic segmentation, as manual contouring is tedious and time-consuming. In many applications, deep learning based segmentation methods can even achieve expert-level accuracy.
However, in practice, deep learning methods may fail due to many factors, such as domain shift, adversarial noise, and low image quality. Therefore, another research focus is testing the robustness of deep learning based segmentation methods and even predicting segmentation quality without ground truth. To accomplish the first objective of this thesis work, novel graph-based frameworks are proposed that integrate seamlessly with deep neural networks. First, a graph model based on convex quadratic programming is developed, whose solution is guaranteed to be globally optimal. The developed model is applied and validated on intra-retinal layer segmentation of Spectral Domain Optical Coherence Tomography (SD-OCT) images of the eye and on vessel walls in Intravascular Ultrasound (IVUS) images. Second, to extend this model-based deep learning surface segmentation method to 3-D, a shape-aware patch generation approach for genus-0 surfaces is proposed to enable a standard 3-D CNN to work properly. The developed method is applied to prostate segmentation in MR images and spleen segmentation in CT images in 3-D. Third, the robustness of the proposed model-based surface segmentation method is tested under adversarial attacks, and a robust segmentation quality assessment approach based on conditional reconstruction networks is also developed and verified on cardiac left-ventricular myocardium (LVM) segmentation.

  • Research Article
  • Cited by 3
  • 10.1175/1520-0493(1983)111<1219:ioseia>2.0.co;2
Identification of Systematic Errors in a Numerical Weather Forecast
  • Jun 1, 1983
  • Monthly Weather Review
  • Patrick A Harr + 2 more

Many numerical model verification schemes are handicapped by their inability to separate non-systematic errors from systematic errors. In this study, for a specific synoptic event, a statistical method is described to determine the minimum number of cases which can be averaged to represent numerical forecast errors which are truly systematic and not smoothed fields of rapidly varying non-systematic errors. Error patterns derived from forecasts and observations stored at the Fleet Numerical Oceanography Center are used to compare a systematic error pattern, defined by the total number of available cases, with subset error patterns to determine the minimum number of cases needed to filter out the unwanted non-systematic error components. The analysis indicates that a minimum of 8 cases must be averaged to adequately identify systematic errors in a 24 h forecast of a Shanghai Low. A minimum of 5 cases is needed for a 72 h forecast of the same event. Error patterns are identified by contours of the Student's t statistic calculated at each grid point. This contour pattern objectively determines the significance of the forecast errors and is shown to be a very useful method of portraying systematic forecast errors.
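The per-grid-point Student's t computation described in this abstract can be sketched as follows (a hypothetical NumPy example on simulated error fields, not the paper's data): averaging over cases and dividing by the standard error yields a t value at each grid point, and large |t| flags a systematic component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stack of forecast-error fields: 8 cases on a 5x5 grid,
# with an artificial +3 systematic bias injected at grid point (2, 2).
errors = rng.normal(0.0, 1.0, size=(8, 5, 5))
errors[:, 2, 2] += 3.0

n = errors.shape[0]
mean = errors.mean(axis=0)                      # case-averaged error field
sem = errors.std(axis=0, ddof=1) / np.sqrt(n)   # standard error of the mean
t = mean / sem                                  # one-sample Student's t per grid point
# Contouring |t| would highlight (2, 2) as a systematic-error region,
# while purely non-systematic points average toward small t values.
```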

  • Research Article
  • Cited by 79
  • 10.1016/s0094-730x(99)00028-5
Relationship of length and grammatical complexity to the systematic and nonsystematic speech errors and stuttering of children who stutter
  • Feb 25, 2000
  • Journal of Fluency Disorders
  • Kenneth S Melnick + 1 more


  • Research Article
  • Cited by 18
  • 10.1109/tnnls.2023.3238381
GREnet: Gradually REcurrent Network With Curriculum Learning for 2-D Medical Image Segmentation.
  • Jul 1, 2024
  • IEEE transactions on neural networks and learning systems
  • Jinting Wang + 5 more

Medical image segmentation is a vital stage in medical image analysis. Numerous deep-learning methods are booming to improve the performance of 2-D medical image segmentation, owing to the fast growth of the convolutional neural network. Generally, the manually defined ground truth is utilized directly to supervise models in the training phase. However, direct supervision by the ground truth often results in ambiguity and distractors, as complex challenges appear simultaneously. To alleviate this issue, we propose a gradually recurrent network with curriculum learning, which is supervised by gradual information of the ground truth. The whole model is composed of two independent networks. One is the segmentation network, denoted GREnet, which formulates 2-D medical image segmentation as a temporal task supervised by pixel-level gradual curricula in the training phase. The other is a curriculum-mining network. To a certain degree, the curriculum-mining network provides curricula of increasing difficulty from the ground truth of the training set by progressively uncovering hard-to-segment pixels in a data-driven manner. Given that segmentation is a pixel-level dense-prediction challenge, to the best of our knowledge, this is the first work to formulate 2-D medical image segmentation as a temporal task with pixel-level curriculum learning. In GREnet, the naive UNet is adopted as the backbone, while ConvLSTM is used to establish the temporal link between gradual curricula. In the curriculum-mining network, a UNet++ supplemented by a transformer is designed to deliver curricula through the outputs of the modified UNet++ at different layers.
Experimental results have demonstrated the effectiveness of GREnet on seven datasets, i.e., three lesion segmentation datasets in dermoscopic images, an optic disc and cup segmentation dataset and a blood vessel segmentation dataset in retinal images, a breast lesion segmentation dataset in ultrasound images, and a lung segmentation dataset in computed tomography (CT).

  • Conference Article
  • Cited by 67
  • 10.1117/12.228968
UMBmark: a benchmark test for measuring odometry errors in mobile robots
  • Dec 27, 1995
  • Johann Borenstein + 1 more

This paper introduces a method for measuring odometry errors in mobile robots and for expressing these errors quantitatively. When measuring odometry errors, one must distinguish between (1) systematic errors, which are caused by kinematic imperfections of the mobile robot (for example, unequal wheel-diameters), and (2) non-systematic errors, which may be caused by wheel slippage or irregularities of the floor. Systematic errors are a property of the robot itself, and they stay almost constant over prolonged periods of time, while non-systematic errors are a function of the properties of the floor. Our method, called the University of Michigan benchmark test (UMBmark), is especially designed to uncover certain systematic errors that are likely to compensate for each other (and thus, remain undetected) in less rigorous tests. This paper explains the rationale for the UMBmark procedure and explains the procedure in detail. Experimental results from different mobile robots are also presented and discussed. Furthermore, the paper proposes a method for measuring non-systematic errors, called extended UMBmark. Although the measurement of non-systematic errors is less useful because it depends strongly on the floor characteristics, one can use the extended UMBmark test for comparison of different robots under similar conditions.
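A systematic odometry error of the kind UMBmark targets can be illustrated with a short simulation (hypothetical parameters, not the authors' code): a 1% mismatch between the actual wheel radii of a differential-drive robot makes equal commanded wheel speeds trace a steadily curving path.

```python
import math

# Minimal differential-drive odometry sketch (hypothetical parameters):
# a 1% mismatch in actual wheel radii (a systematic error in UMBmark's
# sense) makes equal commanded wheel speeds produce a curved path.
b = 0.5                          # wheelbase [m]
r_left, r_right = 0.100, 0.101   # actual wheel radii [m]
omega = 10.0                     # commanded wheel angular speed [rad/s]
dt, steps = 0.01, 1000           # simulate 10 s of "straight" driving

x = y = theta = 0.0
for _ in range(steps):
    v_l, v_r = omega * r_left, omega * r_right
    v = 0.5 * (v_l + v_r)
    theta += (v_r - v_l) / b * dt   # heading drifts at a constant rate
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
# After 10 s the heading has drifted by 0.2 rad and the robot has
# deviated laterally, even though odometry based on the nominal
# (equal) radii would report a straight line.
```

Because the drift rate is constant, the error grows without bound over distance, which is why calibration rather than filtering is the appropriate remedy.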

  • Research Article
  • Cited by 31
  • 10.1016/j.eswa.2023.119939
DSEU-net: A novel deep supervision SEU-net for medical ultrasound image segmentation
  • Mar 22, 2023
  • Expert Systems with Applications
  • Gongping Chen + 6 more


  • Research Article
  • Cited by 20
  • 10.1007/s00138-010-0261-4
Contour segmentation in 2D ultrasound medical images with particle filtering
  • Apr 15, 2010
  • Machine Vision and Applications
  • Donka Angelova + 1 more

Object segmentation in medical images is an actively investigated research area. Segmentation techniques are a valuable tool in medical diagnostics for cancer tumours and cysts, and for planning surgical operations and other medical treatment. In this paper, a Monte Carlo algorithm for extracting lesion contours in ultrasound medical images is proposed. An efficient multiple-model particle filter for progressive contour growing (tracking) from a starting point is developed, accounting for convex, non-circular forms of the delineated contour areas. The driving idea of the proposed particle filter is to incorporate the difference in image intensity inside and outside the contour into the filter's likelihood function. The filter employs image intensity gradients as measurements and requires information about four manually selected points: a seed point, a starting point arbitrarily selected on the contour, and two additional points bounding the measurement formation area around the contour. The filter performance is studied by segmenting contours from a number of real and simulated ultrasound medical images. Accurate contour segmentation is achieved with the proposed approach in ultrasound images with a high level of speckle noise.

  • Conference Article
  • Cited by 1
  • 10.1117/12.457442
Optimized parametric calibration of autonomous vehicles
  • Feb 20, 2002
  • Philip R Kedrowski + 2 more

Odometry, also referred to as dead reckoning, is one of the least expensive and most widely used methods for mobile robot localization. However, mobile robots implementing dead reckoning are plagued with inaccuracy caused by systematic and non-systematic errors. In many cases, the most dominant source of inaccuracy is systematic errors. Systematic errors are caused by differences between the nominal and the actual dimensions of vehicle parameters (such as wheel radius and wheelbase measurements). Because systematic errors are inherent to the vehicle, the dead reckoning inaccuracy grows unbounded. Fortunately, it is possible to largely eliminate systematic errors by calibrating the parameters such that the differences between the nominal dimensions and the actual dimensions are minimized. This work presents a method for calibration of mobile robot parameters using an optimization engine. A cost function is developed based on the UMBmark (University of Michigan Benchmark) test pattern. This method is presented as a simple time efficient calibration tool for use during startup procedures of a differentially driven mobile robot. Comparisons are made between this method and an analytical calibration method developed at the University of Michigan. Results show that this tool consistently gives greater than 50% improvement in overall dead reckoning accuracy on an outdoor mobile robot, with respect to itself prior to calibration.

  • Conference Article
  • Cited by 8
  • 10.1109/ecmr.2015.7324208
Line-of-sight-based ToF camera's range image filtering for precise 3D scene reconstruction
  • Sep 1, 2015
  • Vijaya K Ghorpade + 2 more

A new method to filter jump edges in the range images produced by a Time-of-Flight (ToF) camera is described, implemented, and evaluated. The SwissRanger camera, like any other ToF camera, has systematic and non-systematic errors. Systematic errors are predictable and can often be removed by calibration, whereas non-systematic errors are unpredictable and are removed by applying filters. Jump edges are non-systematic errors which appear as a smooth, often irregular sigmoid-shaped or curved transition between two surfaces. A new method which removes the jump edges is proposed and compared with the most cited and widely used existing method. The comparison is based on the quality of the filtered image, the computation time for filtering, and the impact on registration of successive scans and reconstruction of the whole scene.

  • Abstract
  • 10.1016/j.ijrobp.2023.06.2173
Clinical Acceptability of Automatically Generated Elective Lymph Node Volumes for Head and Neck Cancer Patients
  • Sep 29, 2023
  • International Journal of Radiation Oncology*Biology*Physics
  • S Maroongroge + 13 more


  • Research Article
  • 10.1038/s41598-025-04086-1
Attention residual network for medical ultrasound image segmentation
  • Jul 1, 2025
  • Scientific Reports
  • Honghua Liu + 7 more

Ultrasound imaging can distinctly display the morphology and structure of internal organs within the human body, enabling the examination of organs like the breast, liver, and thyroid. It can identify the locations of tumors, nodules, and other lesions, thereby serving as an efficacious tool for treatment detection and rehabilitation evaluation. Typically, the attending physician is required to manually demarcate the boundaries of lesion locations, such as tumors, in ultrasound images. Nevertheless, several issues exist. The high noise level in ultrasound images, the degradation of image quality due to the impact of surrounding tissues, and the influence of the operator’s experience and proficiency on the determination of lesion locations can all contribute to a reduction in the accuracy of delineating the boundaries of lesion sites. In the wake of the advancement of deep learning, its application in medical image segmentation is becoming increasingly prevalent. For instance, while the U-Net model has demonstrated a favorable performance in medical image segmentation, the convolution layers of the traditional U-Net model are relatively simplistic, leading to suboptimal extraction of global information. Moreover, due to the significant noise present in ultrasound images, the model is prone to interference. In this research, we propose an Attention Residual Network model (ARU-Net). By incorporating residual connections within the encoder section, the learning capacity of the model is enhanced. Additionally, a spatial hybrid convolution module is integrated to augment the model’s ability to extract global information and deepen the vertical architecture of the network. During the feature fusion stage of the skip connections, a channel attention mechanism and a multi-convolutional self-attention mechanism are respectively introduced to suppress noisy points within the fused feature maps, enabling the model to acquire more information regarding the target region. 
Finally, the predictive efficacy of the model was evaluated using publicly accessible breast ultrasound and thyroid ultrasound data. The ARU-Net achieved mean Intersection over Union (mIoU) values of 82.59% and 84.88%, accuracy values of 97.53% and 96.09%, and F1-score values of 90.06% and 89.7% for breast and thyroid ultrasound, respectively.

  • Research Article
  • Cited by 10
  • 10.1007/s12541-013-0305-6
Design of test track for accurate calibration of two wheel differential mobile robots
  • Jan 1, 2014
  • International Journal of Precision Engineering and Manufacturing
  • Changbae Jung + 4 more

Odometry using incremental wheel encoder sensors provides the relative position of a mobile robot. The major drawback of odometry is the accumulation of kinematic modeling errors when travel distance increases. The major systematic error sources are unequal wheel diameters and erroneous wheelbase. The UMBmark test is a practical and useful calibration scheme for systematic odometry errors of two wheel differential mobile robots. We previously proposed an accurate calibration scheme that extends the conventional UMBmark. A calibration experiment was carried out using the robot’s heading errors, and kinematic parameters were derived by considering the coupled effect of the systematic errors on a test track. In this paper, we propose design guidelines of test tracks for odometry calibration. As non-systematic errors constitute a grave problem in practical applications, the test track shape and size should be determined by considering the distributions of systematic and non-systematic errors. Numerical simulations and experiments clearly demonstrate that the proposed scheme results in more accurate calibration results.

  • Research Article
  • 10.1024/2673-8627/a000069
The Bayesian One-Sample t-Test Supersedes Correlation Analysis as a Test of Validity
  • Dec 17, 2024
  • European Journal of Psychology Open
  • Phivos Phylactou + 2 more

Abstract: Introduction: The validity of measurement, which refers to how accurately tools measure what they are intended to measure, is essential in science. Researchers rely on statistical approaches to test the validity of their measures. One such approach is correlation analysis. Even though correlation analysis can capture high nonsystematic error between measures, it can often lead to misleading conclusions when observations are measured with systematic errors. Methods: We used Monte Carlo simulations with 10,000 iterations to generate the data in each simulation. Results: We demonstrate how correlation analysis is commonly used to test for validity and how this method can fail with systematic error. We further propose an alternative to correlation analysis, the Bayesian one-sample t-test, for cases where using a simple statistical test can be justified. We provide additional simulations as well as an application to real data, showcasing the implementation of the Bayesian one-sample t-test and how to use it to address the limitations of correlation analysis. Discussion: We suggest using the Bayesian one-sample t-test to identify both systematic and nonsystematic error and, moreover, to provide evidence for the null hypothesis of no difference between two measures. Conclusion: As a test of validity, the Bayesian one-sample t-test supersedes correlation analysis.
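The failure mode this abstract describes, near-perfect correlation despite a constant offset, is easy to reproduce in a quick simulation (hypothetical values; a frequentist one-sample t on the paired differences is used here as a stand-in for the Bayesian test the authors propose):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical setup: a new instrument tracks a reference measure up to
# small noise but carries a constant +5 systematic offset.
reference = rng.normal(50.0, 10.0, size=200)
measured = reference + 5.0 + rng.normal(0.0, 1.0, size=200)

# Correlation is blind to the offset: r stays near 1.
r = np.corrcoef(reference, measured)[0, 1]

# A one-sample t on the paired differences exposes the bias.
d = measured - reference
t = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

Here r suggests excellent agreement while the large t value flags the systematic error, which is the core argument for preferring a difference-based test when validating a measure.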

  • Research Article
  • Cited by 9
  • 10.1113/jp287011
Multiphysics simulations reveal haemodynamic impacts of patient-derived fibrosis-related changes in left atrial tissue mechanics.
  • Nov 8, 2024
  • The Journal of physiology
  • Alejandro Gonzalo + 14 more

Stroke is a leading cause of death and disability worldwide. Atrial myopathy, including fibrosis, is associated with an increased risk of ischaemic stroke, but the mechanisms underlying this association are poorly understood. Fibrosis modifies myocardial structure, impairing electrical propagation and tissue biomechanics, and creating stagnant flow regions where clots could form. Fibrosis can be mapped non-invasively using late gadolinium enhancement magnetic resonance imaging (LGE-MRI). However, fibrosis maps are not currently incorporated into stroke risk calculations or computational electro-mechano-fluidic models. We present multiphysics simulations of left atrial (LA) myocardial motion and haemodynamics using patient-specific anatomies and fibrotic maps from LGE-MRI. We modify tissue stiffness and active tension generation in fibrotic regions and investigate how these changes affect LA flow for different fibrotic burdens. We find that fibrotic regions and, to a lesser extent, non-fibrotic regions experience reduced myocardial strain, resulting in decreased LA emptying fraction consistent with clinical observations. Both fibrotic tissue stiffening and hypocontractility independently reduce LA function, but, together, these two alterations cause more pronounced effects than either one alone. Fibrosis significantly alters flow patterns throughout the atrial chamber, and particularly, the filling and emptying jets of the left atrial appendage (LAA). The effects of fibrosis in LA flow are largely captured by the concomitant changes in LA emptying fraction except inside the LAA, where a multifactorial behaviour is observed. This work illustrates how high-fidelity, multiphysics models can be used to study thrombogenesis mechanisms in patient-specific anatomies, shedding light onto the links between atrial fibrosis and ischaemic stroke. 
KEY POINTS: Left atrial (LA) fibrosis is associated with arrhythmogenesis and increased risk of ischaemic stroke; its extent and pattern can be quantified on a patient-specific basis using late gadolinium enhancement magnetic resonance imaging. Current stroke risk prediction tools have limited personalization, and their accuracy could be improved by incorporating patient-specific information such as fibrotic maps and haemodynamic patterns. We present the first electro-mechano-fluidic multiphysics computational simulations of LA flow, including fibrosis and anatomies from medical imaging. Mechanical changes in fibrotic tissue impair global LA motion, decreasing LA and left atrial appendage (LAA) emptying fractions, especially in subjects with higher fibrosis burdens. Fibrotic-mediated LA motion impairment alters LA and LAA flow near the endocardium and the whole cavity, ultimately leading to more stagnant blood regions in the LAA.
