LumVertCancNet: A novel 3D lumbar vertebral body cancellous bone location and segmentation method based on hybrid Swin-transformer

Yingdi Zhang, Zelin Shi, Huan Wang, Shaoqian Cui, Lei Zhang, Jiachen Liu, Xiuqi Shan, Yunpeng Liu, Lei Fang

doi:10.1016/j.compbiomed.2024.108237

Abstract

Locating and segmenting the cancellous bone of lumbar vertebral bodies is a crucial step in an automated lumbar spine processing pipeline. Accurate and reliable analysis of lumbar spine images is expected to benefit practical medical diagnosis and population-based analysis of bone strength. However, designing automated algorithms for lumbar spine processing is demanding because of significant anatomical variation and the scarcity of publicly available data. In recent years, convolutional neural networks (CNNs) and vision transformers (ViTs) have become the de facto standard in medical image segmentation. Although CNNs are adept at capturing local features, their inherent biases of locality and weight sharing constrain their capacity to model long-range dependencies. In contrast, ViTs excel at long-range dependency modeling, but they may not generalize well on limited datasets because they lack the inductive biases inherent to CNNs. In this paper, we propose a deep learning-based, two-stage, coarse-to-fine solution for the automatic location and segmentation of lumbar vertebral body cancellous bone. In the first stage, a Swin-transformer-based model predicts a heatmap of the lumbar vertebral body centroids. Considering the characteristic anatomical structure of the lumbar spine, we propose a novel loss function called LumAnatomy loss, which enforces the order and bend of the predicted vertebral body centroids. In the second stage, to inherit the strengths of CNNs and ViTs while avoiding their respective limitations, we propose an encoder–decoder network that segments the identified lumbar vertebral body cancellous bone and consists of two parallel encoders, i.e., a Swin-transformer encoder and a CNN encoder. To improve the combination of CNNs and ViTs, we propose a novel multi-scale attention feature fusion module (MSA-FFM), which addresses issues that arise when fusing features produced by different encoders.
To tackle the lack of data, we present the first large-scale lumbar vertebral body cancellous bone segmentation dataset, called LumVBCanSeg, which contains a total of 185 CT scans annotated at the voxel level by 3 physicians. Extensive experimental results on the LumVBCanSeg dataset demonstrate that the proposed algorithm outperforms other state-of-the-art medical image segmentation methods. The data are publicly available at: https://zenodo.org/record/8181250. The implementation of the proposed method is available at: https://github.com/sia405yd/LumVertCancNet.
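
The abstract states that the LumAnatomy loss enforces the order and bend of the predicted vertebral body centroids, but it does not give the formula. The following is only a minimal sketch of what order and bend penalties on a centroid sequence could look like; it is not the authors' loss. The function names, the hinge form, the choice of axis 0 as the ordering axis, and the `margin` and `max_angle_deg` parameters are all assumptions made for illustration.

```python
import math

def order_penalty(centroids, axis=0, margin=1.0):
    """Hypothetical order term (not the paper's formula).

    Zero when consecutive centroids increase along `axis` by at least
    `margin`; otherwise grows linearly with the size of the violation.
    """
    coords = [c[axis] for c in centroids]
    return sum(max(0.0, margin - (b - a)) for a, b in zip(coords, coords[1:]))

def bend_penalty(centroids, max_angle_deg=30.0):
    """Hypothetical bend term (not the paper's formula).

    For each triple of consecutive centroids, measures the angle between
    the two connecting segments and penalizes the excess over
    `max_angle_deg`.
    """
    penalty = 0.0
    for p, q, r in zip(centroids, centroids[1:], centroids[2:]):
        u = [b - a for a, b in zip(p, q)]
        v = [b - a for a, b in zip(q, r)]
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_u == 0.0 or norm_v == 0.0:
            continue  # coincident points define no direction; skip the triple
        cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        angle = math.degrees(math.acos(cos_angle))
        penalty += max(0.0, angle - max_angle_deg)
    return penalty

# Illustrative centroid sequences in (z, y, x) voxel coordinates.
straight = [(0, 0, 0), (10, 0, 0), (20, 0, 0)]
swapped = [(10, 0, 0), (0, 0, 0), (20, 0, 0)]
right_angle = [(0, 0, 0), (10, 0, 0), (10, 10, 0)]
```

In this sketch a well-ordered, nearly straight sequence such as `straight` incurs no penalty, while `swapped` is penalized by both terms and `right_angle` by the bend term. A trainable version would need a differentiable way to read centroids off the predicted heatmap, which the abstract does not describe.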

Full Text