Abstract

<h3>Purpose/Objective(s)</h3> Magnetic resonance imaging-guided radiotherapy (MRgRT) enables daily treatment adaptation, a game changer for various cancers. During adaptation, contouring of organs at risk (OARs) is time-consuming and poorly reproducible across physicians. This hampers the accuracy of high-precision MRgRT and limits its adoption. Artificial intelligence (AI) can accelerate and homogenize OAR delineation. This study aims to (i) assess the reproducibility of clinicians' OAR delineation, (ii) compare the precision of contours from clinical experts (CEs) with that of AI-based contours (ACs), and (iii) evaluate the clinical benefit of AI tools for treatment standardization.
<h3>Materials/Methods</h3> For low-field, MR-based daily adaptation of abdominal treatments, transfer learning was applied to a CE-marked/FDA-cleared deep learning solution. Models were retrained on 270 retrospectively selected, annotated fractions treated on an MR-LINAC at two European cancer care centers of excellence. Validation used two cohorts: (i) 15 patients annotated in a double-blind manner and (ii) 30 annotations randomly mixed 50/50 between CEs and AI. Contours of 8 OARs (right and left kidneys, stomach, liver, duodenum, inferior vena cava, bowel, and abdominal aorta) were scored by 3 CEs as A (acceptable), B (acceptable after minor corrections), or C (not acceptable).
<h3>Results</h3> Interobserver variability was quantified with the Dice similarity coefficient (DSC). The average over the 8 OARs was 84.38%, with the highest and lowest scores observed for the stomach (95%) and bowel (68%), respectively. The average DSC between CE and AI annotations was 85.88%, with the left/right kidneys (94%) and the duodenum/vena cava (76%) showing the highest and lowest values, respectively. CE and AI annotations were scored A in 89.36% and 71.89% of cases and were considered acceptable (A or B) in 100% and 92.49% of cases, respectively.
The AI solution appears to struggle with organs for which CEs disagree substantially on the top and bottom slices.
<h3>Conclusion</h3> The results show that AI-driven contours are clinically usable in most cases. Disagreement between experts reflects the subjectivity of scoring, so objective metrics should be used as a complement.
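To make the two kinds of figures reported above concrete, here is a minimal Python sketch. It computes the DSC, defined as 2|A∩B| / (|A| + |B|), between two binary contour masks, along with the "A" and "acceptable (A or B)" rates from the three-level scoring. It assumes NumPy and voxel masks of identical shape. The function names are hypothetical, and the abstract does not state how the study averaged DSC (per OAR, per patient, 2D or 3D), so this is not the authors' pipeline.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)

def score_rates(scores):
    """Return (fraction scored A, fraction acceptable, i.e. scored A or B)."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    rate_a = sum(s == "A" for s in scores) / n
    rate_ab = sum(s in ("A", "B") for s in scores) / n
    return rate_a, rate_ab

# Toy example: two 2x2 squares on a 4x4 grid, offset by one column.
a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True
b = np.zeros((4, 4), dtype=bool)
b[0:2, 1:3] = True
print(dice_coefficient(a, b))             # 2 * 2 / (4 + 4) = 0.5
print(score_rates(["A", "A", "B", "C"]))  # (0.5, 0.75)
```

A DSC reported as a percentage (e.g. 84.38%) corresponds to this value multiplied by 100.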
