Robust Prediction of Isodose Distribution with a Fully Convolutional Networks (FCN)-Based Deep Learning Model

P Dong, L Xing

doi:10.1016/j.ijrobp.2018.06.158

Abstract

Deep learning outperforms conventional methods in image recognition, classification, and segmentation. It uses many-layered convolutional neural networks to find hierarchies of representations and abstractions of images through pretraining on a large data set. Dose prediction based on the geometric shape and location of the contours, without relying on time-consuming dose calculations and subsequent optimizations, can facilitate clinical decision making and improve efficiency by reducing the back and forth between the planner and the physician. Instead of predicting dose-volume histograms (DVHs) with machine learning, as is conventionally done, we propose to predict the isodose distribution in the voxel domain with the popular Fully Convolutional Network (FCN). We trained the FCN end-to-end, pixels-to-pixels, on over 10,000 input images of artificially generated patient contours and the corresponding output images of dose distributions, at a resolution of 512x512. We generated the BODY, PTV, and OAR contours randomly to increase the variety of the training set and thereby test the robustness of the method. The location, size, and shape of the contours differ from case to case, and the body sizes vary from 12 cm to 20 cm. We calculated the dose distribution with a simplified pencil beam algorithm for a generic 6x beam, treating every pixel inside BODY as water. For each case, we optimized a 3D conformal plan by balancing the mean dose to the PTV and the OAR. The generated dose distribution was then converted to 10 isodose lines from 100% to 10%, which form the output image. The FCN, a popular deep learning structure for segmentation, adapts the state-of-the-art recognition network VGG16, which is pretrained on ImageNet. The first five blocks of VGG16 consist of convolution and pooling layers, and the fully connected layers that follow produce a 1D vector representing the image classes.
FCN replaced the last three fully connected layers of VGG16 with deconvolutional layers, which up-sample the coarse output to the same resolution as the original input. The network was trained by stochastic gradient descent with a momentum of 0.9, a fixed learning rate of 10^-4, and a weight decay of 2^-4. Training took around 40 epochs (one epoch being one pass through the entire training set) on a Titan X video card with 12 GB of memory. We used the Caffe framework and MATLAB for our deep learning implementation. Mean intersection over union (MIOU) and pixel accuracy (PA) were used to test the accuracy on a validation set of 1000 cases generated in the same way as the training set. The MIOU was approximately 63% and the PA approximately 91% on the validation set. Dose prediction takes around one tenth of a second. We showed the potential of deep learning methods to reliably predict dose distributions from patient geometry alone. The speed and precision of this method will only improve with the rapid advances in the deep learning field. The versatility and speed of this method make it a valuable tool for any treatment planning system (TPS) and promise to significantly improve the clinical workflow.
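
As a minimal sketch of how the output images and the two reported metrics could be computed, the Python snippet below bins a dose image into isodose bands and scores a prediction with PA and MIOU. It is not the Caffe/MATLAB implementation used in this work. The function names, the 11-class banding (one class below 10% plus ten bands at 10% steps), and the choice to average IoU only over classes that occur in either image are assumptions made for illustration. The metrics themselves use the usual confusion-matrix definitions.

```python
import numpy as np

# Assumed banding: isodose levels at 10%, 20%, ..., 100% of the prescription dose.
ISODOSE_LEVELS = np.arange(10, 101, 10)
NUM_CLASSES = len(ISODOSE_LEVELS) + 1  # class 0 (below 10%) plus ten bands


def dose_to_labels(dose_percent):
    """Map a dose image (percent of prescription) to integer isodose-band labels.

    Label k means ISODOSE_LEVELS[k-1] <= dose < ISODOSE_LEVELS[k];
    label 0 is below 10% and label 10 is 100% or above.
    """
    return np.digitize(dose_percent, ISODOSE_LEVELS)


def confusion_matrix(truth, pred, num_classes=NUM_CLASSES):
    """Count pixels; rows are true classes, columns are predicted classes."""
    idx = truth.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm):
    """Per-class intersection over union, averaged over classes that occur."""
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    present = union > 0  # assumption: skip classes absent from both images
    return float(np.mean(intersection[present] / union[present]))


# Toy 2x2 example with made-up dose values (percent of prescription).
true_dose = np.array([[5.0, 15.0], [95.0, 120.0]])
pred_dose = np.array([[5.0, 25.0], [95.0, 100.0]])

cm = confusion_matrix(dose_to_labels(true_dose), dose_to_labels(pred_dose))
print("PA:", pixel_accuracy(cm))
print("MIOU:", mean_iou(cm))
```

In this toy case three of the four pixels fall in the correct band, so PA is 0.75, while MIOU drops to 0.6 because the single mislabeled pixel produces a zero IoU for two of the five bands that occur. MIOU penalizes errors in small bands more heavily than PA does, which is consistent with MIOU being the lower of the two figures reported above.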
