Adaptive radiotherapy (ART) can compensate for the dosimetric impact of anatomic change during radiotherapy for head and neck cancer (HNC) patients. However, implementing ART for every patient strains clinical workflow and resource allocation, given the variability in patient response and the limited resources available. Predicting anatomic change during radiotherapy for HNC patients is therefore important for optimizing both clinical benefit and the use of treatment resources. Current studies focus on binary ART eligibility classification models that identify patients likely to experience significant anatomic change, but these models cannot represent the complex patterns and variations of anatomic change over time. Vision Transformers (ViTs) are a recent neural network architecture that uses self-attention mechanisms to process image data. Unlike traditional convolutional neural networks (CNNs), ViTs can capture global contextual information more effectively, which makes them well suited to image analysis and image generation tasks involving complex patterns and structures, such as predicting anatomic change in medical imaging. The purpose of this study is to assess the feasibility of using a ViT-based neural network to predict radiotherapy-induced anatomic change in HNC patients. We retrospectively included 121 HNC patients treated with definitive chemoradiotherapy (CRT) or radiation alone. For each patient, we collected the planning computed tomography image (pCT), the planned dose, the cone beam computed tomography images (CBCTs) acquired at the initial treatment (CBCT01) and at fraction 21 (CBCT21), and the primary tumor volume (GTVp) and involved nodal volume (GTVn) delineated on both the pCT and the CBCTs, for model construction and evaluation.
A UNet-style Swin-Transformer-based ViT network was designed to learn spatial correspondence and contextual information from embedded image patches of the pCT, dose, CBCT01, GTVp, and GTVn. The model estimated the deformation vector field between CBCT01 and CBCT21 as the prediction of anatomic change, and CBCT01 deformed by this field was used as the prediction of CBCT21. We also generated binary masks of the GTVp, GTVn, and patient body to evaluate volumetric change. We used data from 101 patients for training and validation and the remaining 20 patients for testing. Image and volumetric similarity metrics, including mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), Dice coefficient, and average surface distance, were used to measure the similarity between the target image and the predicted CBCT. The anatomic change prediction performance of the proposed model was compared with that of a CNN-based prediction model and a traditional ViT-based prediction model. The image predicted by the proposed method was more similar to the real image (CBCT21) than the pCT, CBCT01, or the CBCTs predicted by the comparison models. The average MSE, PSNR, and SSIM between the normalized predicted CBCT and CBCT21 were 0.009, 20.266, and 0.933, respectively, and the average Dice coefficients for the body, GTVp, and GTVn masks were 0.972, 0.792, and 0.821, respectively. The proposed method showed promising performance in predicting radiotherapy-induced anatomic change and has the potential to assist decision-making in HNC ART.
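Three of the similarity metrics named above (MSE, PSNR, and the Dice coefficient) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's evaluation code: it assumes NumPy arrays, image intensities normalized to [0, 1] so that the PSNR peak value is 1, and the function names `mse`, `psnr`, and `dice` are hypothetical. SSIM and average surface distance are omitted for brevity.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed peak intensity (1.0 for images
    normalized to [0, 1]; this normalization is an assumption).
    """
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))


def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree fully
    return float(2.0 * np.logical_and(a, b).sum() / denom)


# Toy data standing in for a predicted and a real CBCT volume.
target_img = np.zeros((4, 8, 8))
pred_img = np.full((4, 8, 8), 0.1)

# Toy masks: two 8-voxel slabs that overlap in 4 voxels.
mask_pred = np.zeros((4, 4), dtype=bool)
mask_real = np.zeros((4, 4), dtype=bool)
mask_pred[0:2, :] = True
mask_real[1:3, :] = True

print("MSE :", mse(pred_img, target_img))
print("PSNR:", psnr(pred_img, target_img))
print("Dice:", dice(mask_pred, mask_real))
```

With a peak value of 1, an MSE near 0.01 corresponds to a PSNR near 20 dB, which is the same order as the averages reported above. Averaging per-patient PSNR values will generally not equal the PSNR of the averaged MSE.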