Our purpose was to identify variations in the clinical use of automatically generated contours that could be attributed to software error, off-label use, or automation bias. For 500 head and neck patients contoured by an in-house automated contouring system, the Dice similarity coefficient and added path length were calculated between the automatically generated contours and the final contours after editing for clinical use. Statistical process control charts were generated with control limits set at 3 standard deviations, and contours whose metrics exceeded these limits were investigated to determine the cause. Moving mean control plots were then generated to identify dosimetrists who were editing less over time, which could indicate automation bias. Major contouring edits were flagged for 1.0% of brain, 3.1% of brain stem, 3.5% of left cochlea, 2.9% of right cochlea, 4.8% of esophagus, 4.1% of left eye, 4.0% of right eye, 2.2% of left lens, 4.9% of right lens, 2.5% of mandible, 11% of left optic nerve, 6.1% of right optic nerve, 3.8% of left parotid, 5.9% of right parotid, and 3.0% of spinal cord contours. Identified causes of editing included unexpected patient positioning, deviation from standard clinical practice, and disagreement between dosimetrist preference and automated contouring style. A statistically significant (P < .05) difference was identified among the contour editing practices of dosimetrists, with 1 dosimetrist editing more across all organs at risk. Eighteen percent (27/150) of the moving mean control plots created for 5 dosimetrists indicated that the amount of contour editing was decreasing over time, possibly corresponding to automation bias. The developed system detected statistically significant edits caused by software error, unexpected clinical use, and automation bias. The increased ability to detect systematic errors that occur when editing automatically generated contours will improve the safety of the automated treatment planning workflow.
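To make the metric-and-control-limit workflow concrete, the following is a minimal sketch, assuming contours are rasterized as Boolean NumPy arrays of equal shape. The function names, simulated Dice scores, and flagging rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dice_coefficient(auto_mask: np.ndarray, edited_mask: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); 1.0 means the clinical edit left the contour unchanged."""
    intersection = np.logical_and(auto_mask, edited_mask).sum()
    total = auto_mask.sum() + edited_mask.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

def control_limits(metric_values: np.ndarray) -> tuple[float, float]:
    """Shewhart-style control limits at mean ± 3 standard deviations."""
    mean, sd = metric_values.mean(), metric_values.std(ddof=1)
    return mean - 3.0 * sd, mean + 3.0 * sd

# Dice between a hypothetical automated mask and a lightly edited version of it.
auto = np.zeros((64, 64), dtype=bool); auto[16:48, 16:48] = True
edited = np.zeros((64, 64), dtype=bool); edited[18:48, 16:48] = True
print(f"Dice after editing: {dice_coefficient(auto, edited):.3f}")

# Illustrative use: flag patients whose Dice score falls below the lower limit.
rng = np.random.default_rng(0)  # simulated scores for 500 patients (assumption)
dice_scores = rng.normal(loc=0.92, scale=0.02, size=500).clip(0.0, 1.0)
lower, upper = control_limits(dice_scores)
flagged = np.where(dice_scores < lower)[0]
print(f"Control limits: [{lower:.3f}, {upper:.3f}]; flagged cases: {flagged}")
```

Because a Dice of 1.0 corresponds to an unedited contour, cases falling below the lower 3-standard-deviation limit are the heavily edited outliers that warrant investigation; added path length would be monitored analogously with an upper limit.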
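The automation-bias check can be sketched in the same style: a per-dosimetrist moving mean of an editing metric (such as added path length), tested for a downward drift over successive cases. The window size and the linear-trend test below are assumptions; the abstract does not specify how decreasing editing was detected.

```python
import numpy as np
from scipy import stats

def moving_mean(values: np.ndarray, window: int = 10) -> np.ndarray:
    """Simple moving average over consecutive cases (window size is an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def editing_is_decreasing(edit_metric: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag a downward trend if the slope of the moving mean is significantly negative."""
    mm = moving_mean(edit_metric)
    slope, _, _, p_value, _ = stats.linregress(np.arange(mm.size), mm)
    return slope < 0 and p_value < alpha

# Simulated added-path-length values that shrink over 100 consecutive cases,
# mimicking a dosimetrist who gradually edits less.
rng = np.random.default_rng(1)
apl = rng.normal(loc=np.linspace(40, 20, 100), scale=5)
print("Possible automation bias:", editing_is_decreasing(apl))
```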