Development and multi-institutional validation of a convolutional neural network to detect vertebral body mis-alignments in 2D x-ray setup images.

Rachel Petragallo, James M Lamb, Ganesh Narayanasamy, Daniel L Saenz, Daniel A Low, Gilmer Valdes, Olivier Morin, Per Halvorsen, Pascal Bertram, Benjamin P Ziemer, Lauren Weinstein, Ileana Iftimia, Michelle C Wells, Kevinraj N Sukumar

doi:10.1002/mp.16359

Abstract

Misalignment to the incorrect vertebral body remains a rare but serious patient safety risk in image-guided radiotherapy (IGRT). Our group has proposed that an automated image-review algorithm be inserted into the IGRT process as an interlock to detect off-by-one vertebral body errors. This study presents the development and multi-institutional validation of a convolutional neural network (CNN)-based approach for such an algorithm using patient image data from a planar stereoscopic x-ray IGRT system. X-rays and digitally reconstructed radiographs (DRRs) were collected from 429 spine radiotherapy patients (1,592 treatment fractions) treated at six institutions using a stereoscopic x-ray image guidance system. Clinically applied, physician-approved alignments were used for true-negative, "no-error" cases. "Off-by-one vertebral body" errors were simulated by translating DRRs along the spinal column using a semi-automated method. A leave-one-institution-out approach was used to estimate model accuracy on data from unseen institutions, as follows. All of the images from five of the institutions were used to train a CNN model from scratch using a fixed network architecture and fixed hyper-parameters. The size of this training set ranged from 5,700 to 9,372 images, depending on which five institutions contributed data. The training set was randomized and divided 75/25 into the final training and validation sets. X-ray/DRR image pairs and the associated binary labels of "no-error" or "shifted" were used as the model input. Model accuracy was evaluated using images from the sixth institution, which were left out of the training phase entirely. This test set ranged from 180 to 3,852 images, again depending on which institution had been left out of the training phase.
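The leave-one-institution-out partitioning described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `leave_one_institution_out`, the tuple layout of `samples`, and the fixed shuffling seed are assumptions made for illustration only, and the CNN training step itself is omitted.

```python
import random
from collections import defaultdict


def leave_one_institution_out(samples, val_fraction=0.25, seed=0):
    """Yield (held_out, train, val, test) once per institution.

    `samples` is a list of (institution, image_pair_id, label) tuples.
    This layout is a hypothetical stand-in for the real image data.
    """
    # Group the samples by the institution that contributed them.
    by_site = defaultdict(list)
    for sample in samples:
        by_site[sample[0]].append(sample)

    for held_out in sorted(by_site):
        # Every image from the held-out institution goes to the test set.
        test = list(by_site[held_out])
        # Images from all remaining institutions are pooled and shuffled.
        pool = [s for site in sorted(by_site) if site != held_out
                for s in by_site[site]]
        random.Random(seed).shuffle(pool)
        # 75/25 split of the pooled data into training and validation sets.
        n_val = round(len(pool) * val_fraction)
        val, train = pool[:n_val], pool[n_val:]
        yield held_out, train, val, test
```

Each yielded fold corresponds to one of the six models: it would be trained on `train`, monitored on `val`, and scored only on `test`, so no image from the held-out institution influences training.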
The trained model was used to classify the images from the test set as either "no-error" or "shifted", and the model predictions were compared to the ground truth labels to assess the model accuracy. This process was repeated until each institution's images had been used as the testing dataset. When the six models were used to classify unseen image pairs from the institution left out during training, the resulting receiver operating characteristic area under the curve values ranged from 0.976 to 0.998. With the specificity fixed at 99%, the corresponding sensitivities ranged from 61.9% to 99.2% (mean: 77.6%). With the specificity fixed at 95%, sensitivities ranged from 85.5% to 99.8% (mean: 92.9%). This study demonstrated that the CNN-based vertebral body misalignment model is robust when applied to previously unseen test data from an outside institution, indicating that this proposed additional safeguard against misalignment is feasible.
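To make the "sensitivity at fixed specificity" figures concrete, the following Python sketch shows one common way to compute such a value from classifier scores. The abstract does not state how the operating threshold was chosen, so the procedure here is an assumption: the threshold is set from the "no-error" scores so that at least the target fraction of them falls at or below it. The function name `sensitivity_at_specificity` and the score convention (higher means more likely "shifted") are likewise hypothetical.

```python
import math


def sensitivity_at_specificity(neg_scores, pos_scores, target_specificity):
    """Return (threshold, sensitivity, achieved_specificity).

    neg_scores: model scores for "no-error" image pairs.
    pos_scores: model scores for "shifted" image pairs.
    A pair is flagged as "shifted" when its score is strictly above
    the threshold (an assumed convention, not taken from the paper).
    """
    ordered = sorted(neg_scores)
    # Smallest number of negatives that must stay unflagged to meet the target.
    k = max(1, math.ceil(target_specificity * len(ordered)))
    threshold = ordered[k - 1]
    # Specificity: fraction of "no-error" pairs that are not flagged.
    specificity = sum(s <= threshold for s in neg_scores) / len(neg_scores)
    # Sensitivity: fraction of "shifted" pairs that are flagged.
    sensitivity = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return threshold, sensitivity, specificity
```

Applied to one held-out institution's test scores with `target_specificity=0.99` or `0.95`, a function like this yields a single sensitivity value per fold; the ranges and means quoted above summarize six such values.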

Full Text