Abstract

Total marrow irradiation (TMI) using intensity-modulated radiation therapy requires time-intensive contouring of many organs-at-risk (OARs) throughout the entire body. This study evaluated the quality of OAR contours auto-generated by an artificial intelligence (AI) contouring algorithm. Mathematical similarity metrics were compared against the assessments of human reviewers.

The first 10 consecutive TMI patients in a phase II relapsed/refractory acute leukemia trial were selected for evaluation. Dose prescriptions were 20 Gy to bone/lymph nodes/spleen and 12 Gy to liver/brain, delivered twice daily over 5 days. Each patient required contouring of 30 structures, which took approximately 8 hours of dosimetrist time. A convolutional neural network model was developed to auto-segment OARs. Twenty-six common OARs were modeled, including eyes, lenses, optic nerves and chiasm, parotids, mandible, oral cavity, thyroid, larynx, esophagus, lungs, heart, spinal cord, kidneys, liver, spleen, stomach, bladder, rectum and bowel. A clinical validation score (CVS) from 1 to 10 was assigned to every AI-based OAR jointly by an experienced dosimetrist and physicist. A score of 10 represented OARs that needed no edits, while 1 represented totally incorrect contours; scores in between were based on the reviewers' assessment of the effort required to edit the contours. A spatial similarity metric (Dice coefficient, DC) and a surface distance metric (95% Hausdorff distance, HD) were calculated using clinical contours as the ground truth and were correlated with CVS.

Our AI model offered a clinically usable auto-segmentation solution for delineating structures from CT images and reduced dosimetrist contouring time for TMI patients from 8 hours to 3 hours on average. Sixteen of the 26 OARs required only minor editing, with CVS above 7. The optic chiasm and esophagus scored below 3, the threshold below which an auto-contour was considered to have no value. Eyes, optic nerves, oral cavity, stomach, bladder and rectum were the remaining OARs, scored between 4 and 6, corresponding to usable with significant revisions. DC and HD did not reliably assess the quality of auto-contours. Between DC and CVS, only 4 OARs had a Pearson correlation coefficient greater than 0.75 (average 0.35, range -0.29 to 0.93). Between HD and CVS, the Spearman correlation coefficient ranged from -0.92 to 0.36 (average -0.17).

AI-based auto-generated contours showed promise for replacing human contouring for a variety of OARs in TMI and other treatment planning workflows. This AI model has already been adopted in our routine clinical practice and could facilitate adoption of TMI at other centers because of the significantly reduced contouring time. Conventional statistics, including DC and HD, did not effectively gauge the quality of auto-contours when compared with human assessment of quality. New metrics are therefore needed to assess OAR contour quality.
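For readers unfamiliar with the two conventional metrics, the following is a minimal sketch of how a Dice coefficient, a 95% Hausdorff distance, and a Pearson correlation against reviewer scores could be computed. It is illustrative only and is not the study's implementation: contours are assumed to be given as sets of voxel index tuples, HD is taken as the larger of the two directed 95th-percentile nearest-point distances using a nearest-rank percentile (one of several conventions in use), and all function names are hypothetical.

```python
import math


def dice_coefficient(auto, clinical):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    auto, clinical = set(auto), set(clinical)
    if not auto and not clinical:
        return 1.0  # assumption: two empty contours count as perfect agreement
    return 2.0 * len(auto & clinical) / (len(auto) + len(clinical))


def _directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def _percentile(values, pct):
    """Nearest-rank percentile; other interpolation conventions exist."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hausdorff_95(auto, clinical):
    """Larger of the two directed 95th-percentile nearest-point distances."""
    auto, clinical = list(auto), list(clinical)
    if not auto or not clinical:
        raise ValueError("both contours must contain at least one point")
    return max(
        _percentile(_directed_distances(auto, clinical), 95),
        _percentile(_directed_distances(clinical, auto), 95),
    )


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy 2D example (made-up voxels, not study data): two 2x2 squares
# shifted by one voxel along the second axis.
clinical_voxels = {(0, 0), (0, 1), (1, 0), (1, 1)}
auto_voxels = {(0, 1), (1, 1), (0, 2), (1, 2)}
toy_dice = dice_coefficient(auto_voxels, clinical_voxels)
toy_hd95 = hausdorff_95(auto_voxels, clinical_voxels)
```

In a workflow like the one described, one would compute DC and HD per patient for a given OAR and pass those values, together with the corresponding CVS values, to `pearson` (or a rank-based equivalent) to obtain a per-OAR correlation. The brute-force distance search is quadratic in the number of points and would likely need a spatial index for full-resolution CT contours.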
