Abstract

Purpose/Objective(s)

Auto-segmentation of organs at risk (OARs) using artificial intelligence (AI) has the potential to improve both the quality and the efficiency of radiation oncology contouring. OAR segmentation models can be used to standardize contours across centers and within cooperative group trials, and to serve as quality benchmarks. However, commercially available products vary in which OARs they delineate and in the quality of the delineations. We evaluated 5 commercially available products to assess whether the models could outperform clinically used OAR contours relative to a reference standard.

Materials/Methods

We archived the clinically used contours of 35 head and neck (H&N) patients; in addition, disease site experts contoured 42 anatomically correct structures on each scan using international cooperative group and departmental OAR consensus guidelines. The disease site expert contours were considered the reference standard, while the clinically treated contours served as the benchmark. We applied the 5 commercially available AI-based auto-segmentation tools to the same 35 CT datasets, generating a total of 4395 structures. Structures were compared using the volumetric and overlap Dice Similarity Coefficient (DSC). Overlap DSC compares only the CT slices on which contours from both structure sets are present, reducing the impact of differing contouring practices for tubular OARs on DSC values (see the first sketch below).

Results

Models 1 through 5 each generated between 16 and 27 structures. The least commonly delineated structures were the external auditory canals, mastoids, and nasal cavity. Among structures contoured by all models, the greatest variation in volumetric DSC was found in the spinal cord and oral cavity, whereas the least variation was found in the brain. A detailed evaluation of each structure across these platforms, including modes of failure, has been analyzed and is available for presentation. Briefly, there were 33 complete contour misses out of 4395 inferences (0.75%). For the 9 OARs delineated by all 5 models, the overall median volumetric DSC values were 0.84, 0.86, 0.80, 0.79, and 0.83, respectively. The median volumetric DSC across all structures in each model was 0.75, 0.74, 0.71, 0.69, and 0.61, respectively, with 2 of 5 models performing at or better than the clinical contours when compared to the reference standard.

Conclusion

Across 5 commercially available AI segmentation models, 2 of 5 performed at or above the clinical standard, with significant heterogeneity in the number and definition of OARs. No model met expert qualitative review, which was based on international cooperative group consensus guidelines. Available segmentation tools have significant room for improvement, and more formal consensus standards for OARs are needed. Voxel-level saliency maps and second-check quality systems can be developed by employing more than 1 model (see the second sketch below).
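The difference between the two DSC variants can be made concrete in code. The following is a minimal sketch, assuming each structure is a 3D binary NumPy mask with axis 0 as the CT slice axis; the function names are illustrative and are not drawn from the study's analysis pipeline.

```python
# Minimal sketch of volumetric vs. overlap Dice Similarity Coefficient.
# Assumption: masks are 3D boolean/0-1 NumPy arrays shaped
# (slices, rows, cols), with axis 0 as the CT slice axis.
import numpy as np

def volumetric_dsc(a: np.ndarray, b: np.ndarray) -> float:
    """Standard DSC over the full volume: 2|A ∩ B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else float("nan")

def overlap_dsc(a: np.ndarray, b: np.ndarray) -> float:
    """DSC restricted to CT slices where both structure sets have voxels,
    so a tubular OAR (e.g., spinal cord) contoured over a shorter
    superior-inferior extent is not penalized for the missing slices."""
    a, b = a.astype(bool), b.astype(bool)
    shared = np.logical_and(a.any(axis=(1, 2)), b.any(axis=(1, 2)))
    return volumetric_dsc(a[shared], b[shared])
```

On identical craniocaudal extents the two metrics agree; they diverge only when one structure set truncates the OAR relative to the other, which is the contouring-practice effect the overlap DSC is meant to discount.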
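The conclusion's suggestion of voxel-level saliency and second-check quality systems built from more than one model admits a simple reading: a per-voxel agreement map across model outputs, with low-consensus voxels flagged for human review. The sketch below is our assumption of one such scheme, not the authors' method; the function names and thresholds are hypothetical.

```python
# Hypothetical sketch: per-voxel agreement across several models' masks,
# usable as a saliency overlay or a second-check flag. Not the authors'
# method; the thresholds below are arbitrary illustrations.
import numpy as np

def agreement_map(masks: list[np.ndarray]) -> np.ndarray:
    """Fraction of models labeling each voxel, in [0, 1]."""
    return np.stack([m.astype(bool) for m in masks]).mean(axis=0)

def flag_for_review(masks: list[np.ndarray], lo: float = 0.2, hi: float = 0.8) -> np.ndarray:
    """Voxels that are neither clear background nor clear structure
    across models -- candidates for manual second-check review."""
    agree = agreement_map(masks)
    return (agree > lo) & (agree < hi)
```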
