Abstract

<h3>Purpose/Objective(s)</h3> Contouring organs at risk (OARs) is a laborious process that often delays radiation treatment plan design. Several FDA-approved auto-segmentation software packages (AS) have become available. Our goal was to validate one such commercial AS for OAR contouring in thoracic cancer.
<h3>Materials/Methods</h3> We installed an externally trained AS on our AI computer. Validation was judged against our current gold standard contours (GSCs), created by two experienced planners and one radiation oncologist (RO). We used 30 lung or esophageal cancer planning datasets to generate GSCs and AI contours (AICs). Objective analysis included the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (95% HD). For the subjective analysis, two ROs scored every OAR contour, GSC and AIC alike, on a scale of 1 to 3; the contours were randomly blended and anonymized with consistent nomenclature (1: no modification required; 2: minor modification required but adequate for clinical use; 3: major modification required and not suitable for clinical use).
<h3>Results</h3> The 30 patients had a median age of 75 years (range 54-90) and included 22 males and 8 females; the datasets comprised 28 average pixel density datasets from 4D-CT for lung cancer and 2 fast helical scans for esophageal cancer. Most of the retrospective peer-reviewed OAR contours omitted some less important structures, typically on CT slices far from the target, so we had to re-contour most of the OARs to generate the GSCs. The median contouring times for GSC and AIC were 60 and 2.5 minutes, respectively, for up to 12 OARs, some of which (e.g., stomach and liver) were only partially included in the datasets. Because organs far from the planning target volume were contoured inconsistently, we selected only six main OARs for the initial validation and analysis.
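To make the two objective metrics concrete, here is a minimal Python sketch. It assumes that the GSC and AIC for one OAR have been rasterized as boolean NumPy arrays on the same CT grid with a known voxel spacing; the abstract does not say which software computed its metrics. DSC is 2|A ∩ B| / (|A| + |B|). Definitions of the 95% HD vary. This sketch takes the 95th percentile of surface-to-surface distances in each direction and reports the larger of the two. That is one common convention, not necessarily the one used in this study. The helper names `dice`, `surface`, and `hd95` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), for two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: mask voxels that disappear after one binary erosion."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def hd95(a: np.ndarray, b: np.ndarray, spacing=None) -> float:
    """95% Hausdorff distance between the surfaces of two binary masks.

    `spacing` gives the voxel size per axis (e.g. in mm); None means 1 per axis.
    Assumption: the result is the larger of the two directed 95th percentiles.
    """
    sa, sb = surface(a), surface(b)
    if not sa.any() or not sb.any():
        raise ValueError("both masks must be non-empty")
    # distance_transform_edt gives each nonzero voxel its distance to the nearest
    # zero voxel, so inverting a surface yields a distance-to-that-surface map.
    dist_to_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    dist_to_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    d_ab = dist_to_b[sa]  # each A-surface voxel to the nearest B-surface voxel
    d_ba = dist_to_a[sb]  # each B-surface voxel to the nearest A-surface voxel
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))


# Toy example: two 10-voxel cubes offset by 2 voxels along the first axis.
a = np.zeros((20, 20, 20), dtype=bool)
b = np.zeros_like(a)
a[5:15, 5:15, 5:15] = True
b[7:17, 5:15, 5:15] = True
print(dice(a, b))  # 0.8
print(hd95(a, b))  # 2.0 (voxel units, since spacing is None)
```

For real OARs, pass the CT voxel size, for example `spacing=(3.0, 0.98, 0.98)` in mm. The result is then in mm, like the 95% HD values reported below. Converting RT structure contours into masks is outside the scope of this sketch.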
For AICs compared with GSCs, the mean DSC and 95% HD were: esophagus 0.61 and 16 mm, heart 0.85 and 13.1 mm, left lung 0.97 and 5.9 mm, right lung 0.96 and 5.7 mm, spinal cord 0.82 and 10.7 mm, and trachea and proximal bronchial tree (TPB) 0.67 and 19.1 mm. For GSCs, both ROs rated 100% of contours as adequate for planning purposes (score 1 or 2) for four OARs; the exceptions were the esophagus (96.7% vs 100% for the two ROs) and the right lung (100% vs 96.7%). Adequacy rates were generally lower for AICs: esophagus 90% vs 60%, heart 83.3% vs 86.7%, left lung 100% vs 96.7%, right lung 100% vs 96.7%, spinal cord 100% vs 100%, and TPB 96.7% vs 86.7%. Inter-observer variability was significantly larger when the ROs evaluated the esophagus and heart AICs (p = 0.046 and 0.05, respectively; Student's t-test).
<h3>Conclusion</h3> The accuracy of an externally trained deep learning-based AS might not be acceptable without in-house training based on local protocols. Retrospective peer-reviewed OAR contours might not be good enough for training and evaluating AS. Our future work involves training the AS on our GSCs and re-evaluating its performance.
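The per-RO percentages in the Results read most naturally as the share of the 30 patients for which one RO scored a given OAR contour 1 or 2. Under that reading, each 3.3-point step corresponds to one patient: 96.7% = 29/30, 86.7% = 26/30, 83.3% = 25/30, and 60% = 18/30. The abstract does not spell this definition out, so treat it as an assumption. The short Python sketch below illustrates the arithmetic. It uses made-up scores rather than study data, and the helper `adequacy_rate` is hypothetical.

```python
from typing import Sequence

# Scale from the abstract: 1 = no modification, 2 = minor modification but
# clinically usable, 3 = major modification, not clinically usable.
ADEQUATE_SCORES = (1, 2)


def adequacy_rate(scores: Sequence[int]) -> float:
    """Percentage of contours one RO scored 1 or 2 (adequate for planning)."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("every score must be 1, 2, or 3")
    adequate = sum(1 for s in scores if s in ADEQUATE_SCORES)
    return 100.0 * adequate / len(scores)


# Hypothetical scores for one OAR across 30 patients (not study data).
ro1 = [1] * 20 + [2] * 9 + [3] * 1   # 29 of 30 adequate
ro2 = [1] * 10 + [2] * 8 + [3] * 12  # 18 of 30 adequate

print(f"RO1: {adequacy_rate(ro1):.1f}%")  # RO1: 96.7%
print(f"RO2: {adequacy_rate(ro2):.1f}%")  # RO2: 60.0%
```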
