Abstract

X-ray image quality is critical for accurate intrafraction motion tracking in radiation therapy. This study aims to develop a deep-learning algorithm that improves kV image contrast by decomposing the image into bony and soft-tissue components. In particular, we designed a prior-attention mechanism within the neural network framework for optimal decomposition. We show that a patient-specific prior cross-attention (PCAT) mechanism can boost the performance of kV image decomposition, and we demonstrate its use for motion tracking with online kV imaging during paraspinal stereotactic body radiation therapy (SBRT). Online 2D kV projections were acquired during paraspinal SBRT for patient motion monitoring. Patient-specific prior images were generated by randomly shifting and rotating spine-only digitally reconstructed radiographs (DRRs) created from the setup CBCT, simulating potential motions. The latent features of the prior images were incorporated into the PCAT using multi-head cross attention, so that the network learns to selectively amplify the projection-image features that correlate with features of the prior. The PCAT network consisted of (1) a dual-branch generator that separates the spine and soft-tissue components of the kV projection image and (2) a dual-function discriminator (DFD) that provides a realness score for the predicted images. For supervision, we used a loss combining mean absolute error, discriminator loss, perceptual loss, total variation, and a mean squared error term for the soft tissues. The proposed PCAT approach was benchmarked against previous work using a ResNet generative adversarial network (ResNetGAN) without prior information. The trained PCAT more effectively retained spine structure and texture information while suppressing soft tissue in the kV projection images. The decomposed spine-only x-ray images achieved submillimeter matching accuracy at all beam angles, reducing the maximum error to 0.44 mm (<2 pixels) compared with 0.92 mm (∼4 pixels) for ResNetGAN. The PCAT-decomposed spine images also had higher PSNR and SSIM (p < 0.001). By incorporating patient-specific prior knowledge into the deep learning algorithm, the PCAT selectively learned the important latent features, significantly improving the robustness of kV projection image decomposition and the accuracy of motion tracking in paraspinal SBRT.
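
The abstract describes two technical ingredients: multi-head cross attention that fuses projection-image features with patient-specific prior (spine-only DRR) features, and a combined supervision loss. The PyTorch-style sketch below illustrates one plausible realization, assuming the projection features act as queries and the prior features as keys/values; the class and function names, tensor layout, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a prior cross-attention (PCAT) fusion block and the
# combined generator loss named in the abstract. All names, shapes, and
# weights are assumptions for illustration only.

import torch
import torch.nn as nn


class PriorCrossAttention(nn.Module):
    """Fuse kV-projection features with prior-DRR features (hypothetical layout)."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, proj_feat: torch.Tensor, prior_feat: torch.Tensor) -> torch.Tensor:
        # proj_feat, prior_feat: (B, C, H, W) latent feature maps
        b, c, h, w = proj_feat.shape
        q = proj_feat.flatten(2).transpose(1, 2)            # (B, H*W, C) queries from projection
        kv = prior_feat.flatten(2).transpose(1, 2)          # (B, H*W, C) keys/values from prior
        attended, _ = self.attn(query=q, key=kv, value=kv)  # amplify features correlated with the prior
        fused = self.norm(q + attended)                     # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)


def combined_generator_loss(pred_spine, gt_spine, pred_soft, gt_soft,
                            disc_score, perceptual, tv_weight=1e-4):
    """Illustrative combination of the loss terms listed in the abstract;
    the relative weights and exact formulation are assumptions."""
    l1 = nn.functional.l1_loss(pred_spine, gt_spine)        # mean absolute error (spine branch)
    mse_soft = nn.functional.mse_loss(pred_soft, gt_soft)   # mean squared error (soft-tissue branch)
    adv = -disc_score.mean()                                # discriminator (realness) term
    tv = (pred_spine.diff(dim=-1).abs().mean()
          + pred_spine.diff(dim=-2).abs().mean())           # total variation regularizer
    return l1 + mse_soft + adv + perceptual + tv_weight * tv
```

In this sketch, using the projection features as queries lets the attention map upweight exactly those latent components that agree with the motion-perturbed spine prior, which matches the abstract's description of selectively amplifying prior-correlated features.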
