Abstract

PURPOSE/OBJECTIVE: Successful delivery of radiotherapy (RT) relies on correct and consistent definition of organs at risk (OAR). Artificial intelligence (AI) auto-segmentation based on deep learning (DL) is emerging as a means to reduce variability in OAR delineation. Knowledge about how auto-segmentation tools perform on standardized datasets is limited. Our aim was to evaluate the quality of rectal volumes defined by an existing DL auto-contouring algorithm on a standardized dataset for rectal volumes in prostate cancer RT, as proposed by the Swedish STRONG guidelines for the male pelvis [1][2][3].

[1] Olsson et al., PhIRO 2019;11:88
[2] Gay et al., IJROBP 2012;83:e353
[3] Salembier et al., RO 2018;127:49

MATERIALS/METHODS: DICOM-RT datasets for 19 patients, extracted from a treatment planning system at one institution in Sweden, were used. The patients were treated for prostate cancer in 2018-2019 using 15 MV photon beams to total doses of 50-70 Gy at 2-3 Gy/fraction. Rectal volumes were defined on the planning CT images clinically (clinical), according to the proposed standard (reference), and auto-contoured with the existing version of the MVision software (MVision). The degree of variation between volumes was investigated. Statistics included dose-volume histogram (DVH) metrics, the Jaccard index, and the Dice similarity coefficient, with comparisons between groups based on hypothesis testing; a two-sided P-value ≤ 0.05 indicated a statistically significant difference.

RESULTS: The clinical and MVision DVH points were generally closer to each other than to those of the reference DVH. Clinical and MVision volumes were also somewhat larger than the reference volume (83 cc and 84 cc versus 80 cc; P > 0.05). Overall, mean doses were 0.8/0.9 Gy lower for clinical/MVision compared with the reference (21.9 ± 5.1/21.8 ± 5.1 Gy versus 22.7 ± 5.9 Gy; P > 0.05), whilst maximum doses were similar for all (67.8-67.9 ± 4.3 Gy). Both overlap metrics were similar for volume comparisons between MVision and reference and between MVision and clinical, but somewhat higher for clinical and reference (Jaccard index/Dice coefficient: 0.74/0.85 versus 0.73/0.83 versus 0.90/0.95, respectively). Differences between any two datasets were primarily found in the cranio-caudal direction.

CONCLUSION: This is the first systematic comparison of an auto-contouring algorithm's performance on a standardized OAR dataset in RT. Our results need to be confirmed on a larger patient cohort, but indicate that the MVision algorithm performed well, since the identified differences were generally small with respect to both the clinical and the reference volumes. However, to increase the use of AI algorithms for clinical studies and research, retraining them on standardized reference datasets has the potential to improve performance further.
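
For readers unfamiliar with the two overlap metrics reported above, the Jaccard index and the Dice similarity coefficient are computed from the intersection and union of two contoured volumes. A minimal sketch in Python is shown below; the binary voxel masks, array names, and use of NumPy are illustrative assumptions and not part of the study's actual analysis pipeline.

```python
import numpy as np

def overlap_metrics(mask_a: np.ndarray, mask_b: np.ndarray) -> tuple[float, float]:
    """Return (Jaccard index, Dice coefficient) for two binary masks,
    e.g. voxelized rectal contours resampled onto the same CT grid."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    total = a.sum() + b.sum()
    jaccard = intersection / union if union else 1.0      # |A ∩ B| / |A ∪ B|
    dice = 2 * intersection / total if total else 1.0     # 2|A ∩ B| / (|A| + |B|)
    return jaccard, dice

# Toy 3-D masks standing in for a reference and an auto-contoured volume.
reference = np.zeros((4, 4, 4), dtype=bool)
reference[1:3, 1:3, 1:3] = True
auto_contour = np.zeros_like(reference)
auto_contour[1:3, 1:3, 0:3] = True
print(overlap_metrics(reference, auto_contour))  # -> (0.666..., 0.8)
```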
