Manual delineation of organs at risk (OARs) is a time-consuming step in the radiation therapy treatment planning workflow. Some clinics use automatic contouring to streamline this process. Deformable atlas algorithms are presently the most prevalent method of automatic contouring, but their results often require significant revision. We aimed to show that, for head and neck cancer patients, a convolutional neural network (CNN) trained with deep learning methods can generate contours that agree more closely with manually drawn expert contours than those produced by atlas methods, reducing the manual correction required. A total of 143 head and neck CT scans with manually drawn expert contours from a single institution’s database were used in this study: 128 cases to train a CNN based on a modified U-Net architecture, and 15 cases for validation. Data augmentation and dropout layers were used to reduce overfitting. The model was trained over 200 epochs for the following OARs: brainstem, cochlea, and the submandibular and parotid glands. We then used the trained model to generate contours for these OARs on the 15 test cases. All generated contours were written to DICOM format. A commercial atlas-based program, which has been used clinically at the same institution, was also used to generate contour sets for the test cases. Both the atlas-based contours and the deep learning-based contours (DLBC) were compared against the manually drawn expert contours, which served as the “ground truth”. Accuracy was evaluated using three metrics: Dice Similarity Coefficient (DSC), which measures area overlap; Mean Surface Distance (MSD); and 95% Hausdorff Distance (HD), defined as the separation distance from ground truth that encompasses 95% of contour points in the automatic contour. For both the parotid and submandibular glands, the DLBC were more closely aligned with the ground truth contours than the atlas contours on all evaluation metrics (p-value < 0.005).
For the parotid gland, the mean DLBC improvement over atlas for DSC, MSD, and HD was 17% ± 1.9%, 44% ± 13%, and 37% ± 15%, respectively. For the submandibular gland, the mean improvement was 58% ± 15%, 54% ± 19%, and 45% ± 15%, respectively. For the brainstem, the DLBC scored comparably to the atlas contours on all three metrics, with no significant improvement. For the cochlea, the DSC scores of the two methods were comparable, while the MSD and HD scores of the DLBC were superior to those of the atlas contours (p-value < 0.015), with mean improvements of 38% ± 13% and 33% ± 10%, respectively. Because the contours generated by the neural network for the parotid gland, submandibular gland, and cochlea were measurably more accurate, they would require less correction by a reviewer than those generated by the atlas method used in the clinic. For the brainstem, the contours generated by the neural network were comparable to the atlas contours. Expanding the neural network to define every OAR could offer a superior method of automatic contour generation for use in the clinic.
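
The three evaluation metrics named above can be illustrated with a minimal sketch. This is not the study's implementation: the function names are hypothetical, contours are represented as plain sets of voxel indices or lists of contour points, distances are in index units (no voxel spacing is applied), MSD is assumed to be the symmetric mean of nearest-point distances, and the 95% HD follows the directed definition given in the abstract (automatic contour to ground truth) with a nearest-rank percentile.

```python
import math


def dice(auto_voxels, truth_voxels):
    """Dice Similarity Coefficient between two collections of voxel indices."""
    a, b = set(auto_voxels), set(truth_voxels)
    if not a and not b:
        return 1.0  # assumption: two empty structures count as perfect agreement
    return 2.0 * len(a & b) / (len(a) + len(b))


def _nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its closest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def mean_surface_distance(auto_pts, truth_pts):
    """Symmetric mean of nearest-point distances (assumed definition of MSD)."""
    d = _nearest_distances(auto_pts, truth_pts) + _nearest_distances(truth_pts, auto_pts)
    return sum(d) / len(d)


def hd95(auto_pts, truth_pts):
    """Distance from ground truth that encompasses 95% of automatic contour points."""
    d = sorted(_nearest_distances(auto_pts, truth_pts))
    # Nearest-rank percentile: smallest distance covering at least 95% of points.
    k = math.ceil(0.95 * len(d)) - 1
    return d[max(k, 0)]


# Toy example (made-up data): a 2x2 "ground truth" patch and an
# automatic contour shifted by one voxel along the second axis.
truth = [(0, 0), (0, 1), (1, 0), (1, 1)]
auto = [(0, 1), (1, 1), (0, 2), (1, 2)]

print(dice(auto, truth))                   # 0.5
print(mean_surface_distance(auto, truth))  # 0.5
print(hd95(auto, truth))                   # 1.0
```

The brute-force nearest-point search is quadratic in the number of points, which is fine for illustration; real contour sets would normally use a spatial index or a distance transform and account for voxel spacing.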