Abstract

Background: The quantitative measures used to assess the performance of automated methods often do not reflect the clinical acceptability of contouring. A quality-based assessment of automated cardiac magnetic resonance (CMR) segmentation that is more relevant to clinical practice is therefore needed.

Objective: We propose a new method for assessing the quality of machine learning (ML) outputs. We evaluate the clinical utility of the proposed method by using it to systematically analyse the quality of an automated contouring algorithm.

Methods: A dataset of short-axis (SAX) cine CMR images from a clinically heterogeneous population (n = 217) was manually contoured by a team of experienced investigators. On the same images, we derived automated contours using an ML algorithm. A contour quality scoring application randomly presented manual and automated contours to four blinded clinicians, who were asked to assign a quality score from a predefined rubric. Firstly, we analyzed the distribution of quality scores between the two contouring methods across all clinicians. Secondly, we analyzed the interobserver reliability between the raters. Finally, we examined whether scores varied with the type of contour, SAX slice level, and underlying disease.

Results: The overall distribution of scores between the two methods was significantly different, with automated contours scoring better than manual contours (OR (95% CI) = 1.17 (1.07–1.28), p = 0.001; n = 9401). There was substantial scoring agreement between raters for each contouring method independently, although agreement was significantly better for automated segmentation (automated: AC2 = 0.940, 95% CI, 0.937–0.943 vs manual: AC2 = 0.934, 95% CI, 0.931–0.937; p = 0.006). Next, quality scores were analysed by contour type, slice level, and underlying disease. Our approach helped identify patterns of lower segmentation quality, observed for left ventricle epicardial and basal contours with both methods. Similarly, significant differences in quality between the two methods were found in dilated cardiomyopathy and hypertension.

Conclusions: Our results confirm the ability of our systematic scoring analysis to determine the clinical acceptability of automated contours. This approach, focused on the contours' clinical utility, could ultimately improve clinicians' confidence in artificial intelligence and its acceptability in the clinical workflow.
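To help read the odds ratio reported in the Results, the following is a minimal sketch of how an odds ratio and a Wald 95% confidence interval can be computed from a 2×2 table. It is an illustration only: the abstract does not state how the OR was estimated (with a multi-level quality rubric, an ordinal model may well have been used), the dichotomisation into "acceptable" and "not acceptable" is an assumption, and the function name and counts below are hypothetical, not study data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Hypothetical layout (an assumption, not taken from the paper):
        a: automated contours scored acceptable
        b: automated contours scored not acceptable
        c: manual contours scored acceptable
        d: manual contours scored not acceptable
    z is the standard normal quantile (about 1.96 for a 95% interval).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, used only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_wald_ci(90, 10, 80, 20)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An OR above 1 with a confidence interval that excludes 1, as in the reported 1.17 (1.07–1.28), indicates higher odds of a better score for automated contours under this kind of comparison.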

Highlights

  • Cardiac magnetic resonance (CMR) is the gold standard noninvasive imaging modality for accurate quantification of cardiac chamber volume, myocardial mass and function [1]

  • Abbreviations: AC2, second-order agreement coefficient; CI, confidence interval; CMR, cardiovascular magnetic resonance; CNN, convolutional neural network; CVDs, cardiovascular diseases; DCM, dilated cardiomyopathy; DCS, Dice similarity coefficient; ED, end-diastole; ES, end-systole; GUI, graphical user interface; HD, Hausdorff distance; HTN, hypertension; HCM, hypertrophic cardiomyopathy; IHD, ischaemic heart disease; LV, left ventricle; LVNC, left ventricular non-compaction; ML, machine learning; NHCS, National Heart Center Singapore; OR, odds ratio; QC, quality control; RCA, reverse classification accuracy; RV, right ventricle; SAX, short-axis; SOP, standard operating procedures; UKB, UK Biobank

  • We evaluated the clinical acceptability of automated contouring by analyzing the degree of agreement between two segmentation methods based on the quality scores


Summary

Introduction

Cardiac magnetic resonance (CMR) is the gold standard noninvasive imaging modality for accurate quantification of cardiac chamber volume, myocardial mass and function [1]. Automated segmentation based on machine learning (ML) algorithms can reduce inter- and intra-observer variability and speed up the contouring process [7]. These ML-based methods can expedite the extraction of clinically relevant information from larger image datasets. However, the quantitative measures used to assess the performance of automated methods often do not reflect the clinical acceptability of contouring. A quality-based assessment of automated CMR segmentation that is more relevant to clinical practice is needed.

