Novel Deep Learning Segmentation Models for Accurate GTV and OAR Segmentation in MR-Guided Adaptive Radiotherapy for Pancreatic Cancer Patients.

W Choi, Y Vinogradskiy, H Nourzadeh, A Mueller, V Desai, A Kubli, M Werner-Wasik, Y Chen, C Ainsley, K Mooney

doi:10.1016/j.ijrobp.2023.06.1660

Abstract

MR-guided adaptive radiotherapy (MRgART) improves target coverage and organ-at-risk (OAR) sparing in pancreatic cancer radiation therapy (RT). Inter-fractional changes in patients undergoing RT require time-intensive re-delineation of the gross tumor volume (GTV) and OARs prior to adaptive optimization. Accurate automatic segmentation has the potential to significantly improve the efficiency of the adaptive workflow. We hypothesized that state-of-the-art deep learning (DL) segmentation models could adequately segment the GTV and OARs in both planning and daily fractional MR scans. The study included 21 patients with pancreatic cancer treated with MRgART (10 Gy x 5 fractions). The planning MR as well as all daily MR images and registrations were collected (6 image sets per patient, for a total of 126 image sets). The planning MR and fraction 1-4 image sets were used as the training set (N = 105), while the test set (N = 21) comprised the fraction 5 images, to simulate the last step of incremental learning from planning to final fraction. Evaluated contours included the GTV, Small Bowel, Large Bowel, Duodenum, Left and Right Kidney, Liver, Spinal Cord, and Stomach. To mimic clinical conditions, contour accuracy was evaluated within the ring structure surrounding the PTV, inside of which daily adaptive re-contouring is applied (2 cm expansion in the cranio-caudal direction, 3 cm expansion otherwise). We evaluated three DL model architectures (SegResNet, SegResNet 2D, and SwinUNETR) for autosegmentation of the GTV and OARs. The segmentation models were trained on the training set using 5-fold cross-validation (CV) and quantitatively analyzed by comparing them against clinically used contours with DICE scores. Qualitative analysis was performed by a radiation oncologist using a scoring scale: 1 = perfect, 2 = minor discrepancy, 3 = moderate discrepancy, and 4 = rejected. Overall, the DL segmentations were in acceptable agreement with clinical contours.
The best-performing model was SwinUNETR, with overall training DICE = 0.88±0.06, test DICE = 0.78±0.11, and a qualitative score of 1.6±0.8. For the GTV, the DICE agreement between the DL model and the clinical segmentation was 0.79±0.08, with a qualitative score of 2.2±0.9. The highest and lowest OAR DICE scores were for the Left Kidney (DICE = 0.93) and Small Bowel (DICE = 0.68), respectively. The best qualitative OAR scores were for the Kidney, Liver, and Spinal Cord (score = 1.0), and the worst qualitative score was for the Duodenum (score = 2.3). CONCLUSION: We report here the most comprehensive work on DL segmentation for pancreatic cancer MRgART, including quantitative and clinically pertinent qualitative evaluations of 126 image sets and 3 DL architectures. Our data show good quantitative agreement between DL and clinical contours, and acceptable clinician evaluations for the majority of GTVs and OARs. The current work has great potential to significantly reduce a major bottleneck in the MRgART workflow for pancreatic cancer patients.
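
To make the quantitative metric concrete, the following is a minimal sketch of a DICE score restricted to an evaluation region, in the spirit of the ring-based evaluation described above. It is not the authors' code: the function name `dice_in_roi`, the use of NumPy boolean masks, and the toy arrays are assumptions for illustration, and the ring mask is taken as given rather than built from the PTV expansion margins.

```python
import numpy as np


def dice_in_roi(pred, ref, roi):
    """DICE overlap of two binary masks, counting only voxels inside `roi`.

    `pred`, `ref`, and `roi` are hypothetical boolean arrays of equal shape:
    the DL contour, the clinical contour, and the evaluation ring.
    """
    pred = np.logical_and(pred, roi)
    ref = np.logical_and(ref, roi)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Structure absent from both masks inside the ROI: DICE is undefined.
        return float("nan")
    return 2.0 * np.logical_and(pred, ref).sum() / denom


# Toy 1D "image" with six voxels (illustrative values only).
pred = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
ref = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
roi = np.array([1, 1, 1, 1, 0, 0], dtype=bool)

score = dice_in_roi(pred, ref, roi)  # 2 * 2 / (3 + 2) = 0.8
```

In this toy case, the same pair of masks scored over the whole array would give 2 * 2 / (3 + 3), so restricting the comparison to the region where re-contouring is actually applied can change the reported value.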
