Abstract

This study used a modified Turing test to evaluate the quality of contours auto-generated by a deep learning contouring (DLC) algorithm.

Ten consecutive head and neck (H&N) patients treated with tomotherapy were selected for evaluation. Prescribed doses ranged from 60 to 70 Gy delivered in 30 to 35 fractions. Each patient had two sets of normal structures generated: one clinically used set drawn by humans (physician and dosimetrist), and another created by a DLC H&N model trained with contours from a different institution. DLC uses a convolutional neural network algorithm trained on large numbers of datasets to create auto-contouring models. A group of evaluators consisting of two radiation oncologists, the primary physicians for H&N cancer, and four dosimetrists assessed the structures. The questions asked for each structure were: 1) Which dataset do you think was generated by the DLC model (i.e., a Turing test)? 2) Which dataset do you prefer, or are they equivalent? 3) Would you suggest additional editing or is the contour clinically acceptable, and if editing is needed, by approximately what percentage? To minimize bias, no evaluator was asked to score their own patient. Datasets were de-identified and randomized so that evaluators could not use other cues, such as structure set name, organ name, or color, to guess which dataset was created by DLC. Nine normal structures were evaluated: parotids (left and right), mandible, spinal cord, oral cavity, pharyngeal constrictors, brain stem, esophagus, and larynx. All contours were evaluated concurrently in 3D.

The DLC-generated structure sets were identified with 90% accuracy by the blinded reviewers, potentially as a result of a noticeable error in at least one contour. However, as shown in Table 1, DLC was most similar to human-generated contours for the parotids, mandible, pharyngeal constrictors, and spinal cord, where 50% or more of DLC-created contours were considered equal to or better than human-drawn ones. More than 80% of DLC-generated oral cavity and brain stem contours needed at most minor edits. Esophagus and larynx contours needed modification, though more than 60% were considered to need no more than minor edits.

In contrast with previous generations of atlas-based or active shape model approaches, DLC models are capable of producing contours with a high level of clinical acceptance and show promise to be indistinguishable from human-generated ones.

Table 1. Quality of DLC-generated contours. Equivalent/Better: equivalent to or better than human contours; Acceptable: clinically acceptable with no editing needed; Minor: 10% or less editing; Medium: 25% or less editing; Major: more than 25% editing.

Structure                Equivalent/Better  Acceptable  Minor  Medium  Major
Parotid Left             56%                7%          37%    -       -
Parotid Right            58%                13%         29%    -       -
Mandible                 50%                3%          40%    7%      -
Oral Cavity              44%                0%          40%    16%     -
Pharyngeal Constrictor   63%                0%          10%    27%     -
Spinal Cord              63%                23%         13%    -       -
Brainstem                12%                4%          65%    19%     -
Esophagus                27%                4%          31%    12%     27%
Larynx                   27%                4%          35%    12%     23%
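As a rough sketch of the blinded evaluation workflow described above, and not the authors' actual software, the Python example below shows one way the de-identification/randomization and the Table 1 scoring categories could be implemented. The names (Rating, blind_pair, tally) and the file names in the usage example are hypothetical; only the blinding idea and the 10%/25% editing thresholds come from the abstract.

```python
# A minimal sketch, assuming hypothetical names and data structures, of the
# blinded comparison described in the abstract: human and DLC structure sets
# are de-identified, their labels are randomized per patient, and reviewer
# ratings are tallied into the Table 1 categories. Not the authors' software.

import random
from collections import Counter
from dataclasses import dataclass


def category(equivalent_or_better: bool, acceptable: bool, edit_fraction: float) -> str:
    """Map one reviewer response onto the Table 1 categories."""
    if equivalent_or_better:
        return "Equivalent/Better"   # equal to or better than the human contour
    if acceptable:
        return "Acceptable"          # clinically acceptable, no editing needed
    if edit_fraction <= 0.10:
        return "Minor"               # 10% or less editing
    if edit_fraction <= 0.25:
        return "Medium"              # 25% or less editing
    return "Major"                   # more than 25% editing


@dataclass
class Rating:
    structure: str              # e.g. "Parotid Left"
    equivalent_or_better: bool  # DLC contour judged equal to or better than human
    acceptable: bool            # clinically acceptable with no editing
    edit_fraction: float        # reviewer's estimated editing fraction (0.0-1.0)


def blind_pair(human_set: str, dlc_set: str, rng: random.Random):
    """Present the two structure sets under anonymous labels in random order.

    Returns the blinded presentation shown to the reviewer and an answer key
    that stays hidden until scoring, so cues such as structure set name cannot
    reveal which set the DLC model produced.
    """
    pair = [("human", human_set), ("dlc", dlc_set)]
    rng.shuffle(pair)
    presentation = {"A": pair[0][1], "B": pair[1][1]}
    answer_key = {"A": pair[0][0], "B": pair[1][0]}
    return presentation, answer_key


def tally(ratings: list[Rating]) -> dict[str, Counter]:
    """Aggregate per-structure counts of each Table 1 category."""
    counts: dict[str, Counter] = {}
    for r in ratings:
        key = category(r.equivalent_or_better, r.acceptable, r.edit_fraction)
        counts.setdefault(r.structure, Counter())[key] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only so the demo is reproducible
    presentation, answer_key = blind_pair("patient01_human.dcm", "patient01_dlc.dcm", rng)
    print(presentation)  # what the blinded reviewer sees
    print(answer_key)    # revealed only when scoring the Turing-test question

    example = [
        Rating("Parotid Left", True, False, 0.0),
        Rating("Parotid Left", False, False, 0.08),
        Rating("Esophagus", False, False, 0.30),
    ]
    for structure, counts in tally(example).items():
        total = sum(counts.values())
        print(structure, {k: f"{100 * v / total:.0f}%" for k, v in counts.items()})
```

Tallying reviewer ratings this way yields per-structure category percentages in the same form as Table 1.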

