Objective. This work aims to develop an automated method for segmenting the prostate and its surrounding organs-at-risk in pelvic computed tomography (CT) to facilitate prostate radiation treatment planning. Approach. We propose a novel deep learning algorithm that combines a U-shaped convolutional neural network (CNN) with a vision transformer (VIT) for multi-organ (i.e. bladder, prostate, rectum, left and right femoral heads) segmentation in male pelvic CT images. The U-shaped model consists of three components: a CNN-based encoder for local feature extraction, a token-based VIT for capturing global dependencies from the CNN features, and a CNN-based decoder for predicting the segmentation outcome from the VIT’s output. The novelty of our network is the token-based multi-head self-attention mechanism used in the transformer, which encourages long-range dependencies and forwards informative high-resolution feature maps from the encoder to the decoder. In addition, a knowledge distillation strategy is deployed to further enhance the learning capability of the proposed network. Main results. We evaluated the network using (1) a dataset collected from 94 patients with prostate cancer and (2) the public CT-ORG dataset. The network’s performance was evaluated quantitatively on each organ based on (1) volume similarity between the segmented contours and the ground truth, measured by Dice score, segmentation sensitivity, and precision; (2) surface similarity, measured by Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMS); and (3) percentage volume difference (PVD). The performance was then compared against other state-of-the-art methods. On the first dataset, averaged over all organs, the network obtained volume similarity measures of Dice score = 0.91, sensitivity = 0.90, and precision = 0.92; surface similarity measures of HD = 3.78 mm, MSD = 1.24 mm, and RMS = 2.03 mm; and PVD = 9.9%.
On the CT-ORG dataset, the network obtained volume similarity measures of Dice score = 0.93, sensitivity = 0.93, and precision = 0.93; surface similarity measures of HD = 5.82 mm, MSD = 1.16 mm, and RMS = 1.24 mm; and PVD = 6.6%. Significance. In summary, we propose a token-based transformer network with knowledge distillation for multi-organ segmentation in CT images. The method provides accurate and reliable segmentation of each organ, which can facilitate the clinical workflow for prostate radiation treatment.
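For readers unfamiliar with the volume-based measures reported above, the following is a minimal sketch of how Dice score, sensitivity, precision, and PVD could be computed for a single organ from a pair of binary masks. It is an illustration, not the authors' evaluation code. The function name `overlap_metrics` and the flattened-list mask representation are hypothetical, and the PVD formula used here (absolute volume difference relative to the ground-truth volume, in percent) is an assumed definition that the abstract does not spell out.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Voxel-wise overlap metrics for one organ from flattened binary masks.

    Hypothetical helper for illustration; `pred` and `truth` hold 0/1 labels
    for the same voxels in the same order.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    pred_vol = tp + fp    # voxels labeled as organ by the network
    truth_vol = tp + fn   # voxels labeled as organ in the ground truth
    if pred_vol == 0 or truth_vol == 0:
        raise ValueError("both masks must contain at least one foreground voxel")

    return {
        "dice": 2 * tp / (pred_vol + truth_vol),
        "sensitivity": tp / truth_vol,
        "precision": tp / pred_vol,
        # Assumed PVD definition: |V_pred - V_truth| / V_truth, in percent.
        "pvd_percent": 100 * abs(pred_vol - truth_vol) / truth_vol,
    }


# Toy example with 8 voxels: 3 overlapping, 2 over-segmented, 1 missed.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 1, 0]
metrics = overlap_metrics(pred, truth)
print(metrics)
```

In practice the masks would be 3D arrays and the counts would be computed with an array library. Multiplying the voxel counts by the voxel volume leaves all four measures unchanged, because each is a ratio of volumes.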