Background: Annotating the regions of interest (ROI) of lumbar vertebrae for bone density assessment is a tedious, time-intensive task for radiologists. Deep learning (DL) methods for image segmentation have the potential to replace manual annotation, which could significantly improve the efficiency of clinical diagnostics. Objective: This paper proposes a semi-supervised three-dimensional (3D) method for segmenting the ROI of lumbar vertebrae that integrates masked autoencoder (MAE) pre-training with tube masking. Methods: The proposed method modifies the masking strategy of the original MAE pre-training network. The pre-training network is trained only on images without segmentation labels; once training is finished, its weights are saved for the segmentation task. In the downstream task, a semi-supervised approach based on pseudo-label generation is used for training, so that a small amount of labeled data suffices to segment the ROI of the lumbar vertebrae. Results: With limited annotated data, the proposed network improves the Dice coefficient by 5–7% and reduces the Hausdorff distance by 0.2–0.6 mm compared with using the UNETR network alone for segmentation. Compared with the conventional MAE, the tube masking MAE presented in this paper yields a 2% increase in the Dice coefficient and a 0.24 mm reduction in the Hausdorff distance. Conclusion: Automatic segmentation of the ROI of the lumbar vertebrae helps shorten the time doctors spend annotating vertebrae during clinical bone density examinations. The paper uses the tube masking MAE pre-trained model to extract contextual information from 3D lumbar vertebrae and combines it with a semi-supervised network that leverages pseudo-label generation for fine-tuning, which leads to effective 3D segmentation of the lumbar vertebrae.
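Illustrative sketch: the abstract does not define the tube masking strategy in detail. The Python snippet below shows one common reading of the term, assumed here rather than taken from the paper: a random in-plane mask is drawn once over the patch grid and then repeated along the slice (depth) axis, so each masked patch forms a "tube" through the volume. The function name `tube_mask`, the grid size, and the 0.75 mask ratio are hypothetical choices for illustration only.

```python
import random


def tube_mask(depth, height, width, mask_ratio, seed=0):
    """Return a depth x height x width nested list of booleans (True = masked).

    Assumption: "tube masking" means one random 2D mask over the
    (height, width) patch grid, shared by every slice along depth.
    """
    rng = random.Random(seed)
    n_plane = height * width
    n_masked = round(mask_ratio * n_plane)
    # Pick which in-plane patch positions are hidden from the encoder.
    masked = set(rng.sample(range(n_plane), n_masked))
    plane = [
        [(r * width + c) in masked for c in range(width)]
        for r in range(height)
    ]
    # Copy the same in-plane pattern to every slice, forming tubes.
    return [[row[:] for row in plane] for _ in range(depth)]


# Hypothetical 4 x 6 x 6 patch grid with 75% of in-plane positions masked.
mask = tube_mask(depth=4, height=6, width=6, mask_ratio=0.75)
masked_per_slice = [sum(sum(row) for row in sl) for sl in mask]
```

Under this assumption every slice hides the same patch positions, so the encoder cannot recover a masked patch simply by copying it from a neighboring slice. Conventional MAE masking would instead sample the masked positions independently over the whole 3D grid.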