Clinical Usability-Oriented Automatic Contour Quality Evaluation for Deep Learning Auto-Segmentation

Y Zhang, A Amjad, J Ding, C Sarosiek, M Zarenia, R Conlin, N.P Dang, W.A Hall, B.A Erickson, E.S Paulson, A Li

doi:10.1016/j.ijrobp.2023.06.559

Abstract

Auto-segmentation methods, including deep learning auto-segmentation (DLAS), are increasingly being adopted in radiotherapy. A common way to evaluate the quality of auto-segmented contours is to apply thresholds to quantitative metrics (e.g., Dice similarity coefficient (DSC) and mean distance to agreement (MDA)) that are often averaged over all contour slices. This approach fails to detect contour errors on individual slices and therefore reflects neither current clinical practice (slice-by-slice evaluation) nor clinical usability (e.g., expected contour editing time). In addition, results from multiple metrics are generally not easy to interpret. This work aims to develop a novel contour quality classification (CQC) model that evaluates auto-segmented contours based on their clinical applicability.

The CQC method was designed to classify a contour on a slice as acceptable, minor edit, or major edit, based on the expected editing effort/time. Organ-specific supervised ensemble tree classification models were trained to relate the slice-based quality category to a combination of seven commonly used, calculable quantitative metrics (i.e., DSC, MDA, 95% Hausdorff distance, surface DSC, added path length (APL), slice area, and relative APL). The method was demonstrated by training CQC models on DLAS contours of five abdominal organs (pancreas, duodenum, stomach, small bowel, and large bowel) from 50 MRI sets and evaluating them on 20 MRI and 9 CT testing sets. The test datasets were labelled by six individual observers, and consensus labels were generated by majority vote. Model performance was evaluated using accuracy (acc) and risk rate (RR, the percentage of unacceptable slices mislabeled as acceptable), and compared with inter-observer variation and a baseline threshold-based method.
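The evaluation described above (majority-vote consensus, accuracy, and risk rate) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names, the toy labels, and the tie-break rule are assumptions. The abstract also does not state the denominator of RR; the sketch assumes it is the number of unacceptable slices (minor edit or major edit), and dividing by all evaluated slices is an equally plausible reading.

```python
from collections import Counter

ACCEPTABLE, MINOR_EDIT, MAJOR_EDIT = "acceptable", "minor edit", "major edit"
SEVERITY = {ACCEPTABLE: 0, MINOR_EDIT: 1, MAJOR_EDIT: 2}


def majority_vote(observer_labels):
    """Return the most frequent label among the observers for one slice.

    Ties go to the label needing more editing. This tie-break is an
    assumption; the abstract does not specify one.
    """
    counts = Counter(observer_labels)
    return max(counts, key=lambda label: (counts[label], SEVERITY[label]))


def accuracy(predicted, consensus):
    """Fraction of slices whose predicted category equals the consensus."""
    matches = sum(p == c for p, c in zip(predicted, consensus))
    return matches / len(consensus)


def risk_rate(predicted, consensus):
    """Fraction of unacceptable slices that were predicted as acceptable.

    Assumed denominator: slices whose consensus label is not 'acceptable'.
    """
    unacceptable = [p for p, c in zip(predicted, consensus) if c != ACCEPTABLE]
    if not unacceptable:
        return 0.0
    return sum(p == ACCEPTABLE for p in unacceptable) / len(unacceptable)


# Toy data: four slices, each labelled by six observers (hypothetical values).
observer_labels_per_slice = [
    [ACCEPTABLE] * 5 + [MINOR_EDIT],
    [MINOR_EDIT] * 4 + [ACCEPTABLE, MAJOR_EDIT],
    [MAJOR_EDIT] * 5 + [MINOR_EDIT],
    [ACCEPTABLE] * 3 + [MINOR_EDIT] * 3,  # 3-3 tie, resolved to 'minor edit'
]
consensus = [majority_vote(labels) for labels in observer_labels_per_slice]

# Hypothetical classifier output for the same four slices.
predicted = [ACCEPTABLE, MINOR_EDIT, MAJOR_EDIT, ACCEPTABLE]

acc = accuracy(predicted, consensus)
rr = risk_rate(predicted, consensus)
print(f"accuracy = {acc:.2f}, risk rate = {rr:.2f}")
```

Reporting RR separately from accuracy singles out the one error type with direct clinical consequence: a contour that needs editing being passed as acceptable.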
Compared with the majority vote labels, the CQC models achieved a mean accuracy of 95.8% ([94.5%-99.1%]) and 94.3% ([90.6%-96.9%]), and a mean RR of 0.8% ([0.3%-1.3%]) and 0.7% ([0%-1.1%]), for the MRI and CT testing sets, respectively. The CQC performance was comparable to the inter-observer variation and significantly higher than that of the threshold-based method with single or multiple metrics. Execution on a typical abdominal dataset (e.g., 70 slices) took less than 3 seconds.

Table 1. CQC model performance for different organs

CONCLUSION: The proposed CQC model can classify the quality of a contour slice with high accuracy. This slice-based, single-output evaluation method better reflects current clinical practice and may be used to evaluate and compare the performance of DLAS on any image modality, facilitating its clinical implementation and quality assurance.
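To illustrate why a slice-based evaluation can reveal errors that a slice-averaged metric hides, the following sketch computes DSC per slice on synthetic contours. It is only an illustration under stated assumptions: the masks are toy pixel sets, the helper names are hypothetical, and the 0.8 threshold is an arbitrary example value rather than one taken from this work. It shows the threshold-based baseline, not the CQC classifier itself.

```python
def dice(auto_pixels, reference_pixels):
    """DSC between two pixel sets on one slice: 2|A & B| / (|A| + |B|)."""
    total = len(auto_pixels) + len(reference_pixels)
    if total == 0:
        return 1.0  # both empty: treated as perfect agreement (a convention)
    return 2 * len(auto_pixels & reference_pixels) / total


def square(row0, col0, size):
    """Set of (row, col) pixels covering a size x size square (toy contour)."""
    return {(r, c)
            for r in range(row0, row0 + size)
            for c in range(col0, col0 + size)}


# Ten slices: the auto contour matches the reference on nine of them and is
# shifted by 8 pixels on the last one (synthetic data).
reference = [square(0, 0, 10) for _ in range(10)]
auto = [square(0, 0, 10) for _ in range(9)] + [square(0, 8, 10)]

slice_dsc = [dice(a, r) for a, r in zip(auto, reference)]
mean_dsc = sum(slice_dsc) / len(slice_dsc)

THRESHOLD = 0.8  # hypothetical pass/fail threshold, for illustration only
flagged = [i for i, d in enumerate(slice_dsc) if d < THRESHOLD]

print(f"mean DSC over slices: {mean_dsc:.2f}")  # about 0.92, passes 0.8
print(f"slices below threshold: {flagged}")     # the shifted slice only
```

The averaged value passes the example threshold even though one slice would clearly need a major edit, which is the gap a per-slice category is meant to close.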

More From: International Journal of Radiation Oncology*Biology*Physics

