VerteFormer: A single-staged Transformer network for vertebrae segmentation from CT images with arbitrary field of views.

Xin You, Steve Lu, Yun Gu, Yingying Liu, Jie Yang, Xin Tang

doi:10.1002/mp.16467

Abstract

Spinal diseases burden a growing number of patients. Fully automatic vertebrae segmentation from CT images with arbitrary fields of view (FOVs) is fundamental to computer-assisted diagnosis of spinal disease and surgical intervention, and it has drawn sustained research attention in recent years. The task remains challenging because of intra-vertebra inconsistency in segmentation and poor identification of biterminal vertebrae in CT scans. Existing models also have limitations: some are difficult to apply to spinal cases with arbitrary FOVs, and others rely on multi-stage networks with high computational cost. In this paper, we propose a single-staged model called VerteFormer that addresses these challenges and limitations. VerteFormer exploits the strength of the Vision Transformer (ViT) in mining global relations in the input data, and its combined Transformer and UNet-based structure effectively fuses global and local features of vertebrae. In addition, we propose the Edge Detection (ED) block, based on convolution and self-attention, to separate neighboring vertebrae with clear boundary lines; it simultaneously encourages the network to produce more consistent segmentation masks of vertebrae. To better identify the labels of vertebrae in the spine, particularly biterminal vertebrae, we further introduce global information generated by the Global Information Extraction (GIE) block. We evaluate the proposed model on two public datasets: MICCAI Challenge VerSe 2019 and 2020. VerteFormer achieves Dice scores of 86.39% and 86.54% on the public and hidden test datasets of VerSe 2019, and 84.53% and 86.86% on VerSe 2020, outperforming other Transformer-based models and single-staged methods specifically designed for the VerSe Challenge. Additional ablation experiments validate the effectiveness of the ViT block, ED block, and GIE block.
We propose a single-staged Transformer-based model for fully automatic vertebrae segmentation from CT images with arbitrary FOVs. ViT proves effective in modeling long-range relations. The ED block and GIE block improve the segmentation performance of vertebrae. The proposed model can assist physicians in the diagnosis of spinal diseases and in surgical intervention, and it shows promise for generalization and transfer to other medical imaging applications.
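
For readers unfamiliar with the metric, the Dice score reported in the abstract measures the overlap between a predicted mask and a reference mask, Dice = 2|P ∩ T| / (|P| + |T|), computed per vertebra label. The following is a minimal sketch in plain Python, not the official VerSe evaluation code. It assumes that label 0 denotes background, that masks are given as flattened sequences of integer labels, and that the score is averaged over the labels present in the reference; the function names `dice_per_label` and `mean_dice` are hypothetical.

```python
from typing import Dict, Sequence


def dice_per_label(pred: Sequence[int], target: Sequence[int]) -> Dict[int, float]:
    """Return the Dice score for each non-background label found in `target`.

    Assumption: label 0 is background and is excluded from scoring.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")

    scores: Dict[int, float] = {}
    for label in sorted(set(target) - {0}):
        # Voxels where prediction and reference agree on this label.
        overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
        # |P| + |T|; never zero because the label occurs in `target`.
        size_sum = sum(1 for p in pred if p == label) + sum(1 for t in target if t == label)
        scores[label] = 2.0 * overlap / size_sum
    return scores


def mean_dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Average the per-label Dice scores; returns 0.0 if the reference is empty."""
    scores = dice_per_label(pred, target)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy 8-voxel example with three vertebra labels (1, 2, 3).
    target = [0, 1, 1, 2, 2, 2, 0, 3]
    pred = [0, 1, 1, 1, 2, 2, 0, 0]
    print(dice_per_label(pred, target))
    print(mean_dice(pred, target))
```

In this toy case the third vertebra is missed entirely and contributes a score of zero, which illustrates why poorly identified biterminal vertebrae can pull down the average even when the remaining vertebrae are segmented well.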

Full Text