Panoramic dental X-ray imaging is widely used as a diagnostic tool in dentistry because it is cost-effective and delivers a low radiation dose. Accurate tooth segmentation is crucial for lesion analysis and treatment planning, as it helps dentists assess the condition of teeth quickly and precisely. However, dental X-ray images often suffer from noise, low contrast, and overlapping anatomical structures, and the available datasets are limited, so conventional deep learning models tend to overfit and generalize poorly. In addition, high-precision deep models typically require significant computational resources for inference, which makes real-world deployment challenging. To address these challenges, this paper proposes a tooth segmentation method based on the pre-trained SAM2 model. We fine-tune SAM2 with adapter modules and introduce ScConv modules and gated attention mechanisms to enhance the model’s semantic understanding and multi-scale feature extraction for medical images. To improve efficiency, we apply knowledge distillation, using the fine-tuned SAM2 model as the teacher for a smaller student model named LightUNet. Experiments on the UFBA-UESC dataset show that our model significantly outperforms the traditional UNet on multiple metrics, including IoU, improving segmentation accuracy and robustness, particularly when training samples are limited. LightUNet achieves performance comparable to UNet with only 1.6% of its parameters and 24.0% of its inference time, demonstrating its feasibility for deployment on edge devices.
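To make the distillation setup and the IoU metric concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not state the loss function, so the weighted sum of a ground-truth term and a teacher term, the weight `alpha`, and all function names here are illustrative assumptions. Masks and predictions are flattened to plain Python lists to keep the example dependency-free.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(p, target, eps=1e-7):
    """Binary cross-entropy of probability p against a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_logits, teacher_probs, labels, alpha=0.5):
    """Hypothetical per-pixel distillation objective (an assumption).

    alpha weights the hard term (student vs. ground-truth mask);
    1 - alpha weights the soft term (student vs. teacher probabilities,
    e.g. from a fine-tuned teacher such as SAM2).
    """
    hard = 0.0
    soft = 0.0
    for z, q, y in zip(student_logits, teacher_probs, labels):
        p = sigmoid(z)
        hard += bce(p, y)
        soft += bce(p, q)
    n = len(student_logits)
    return (alpha * hard + (1.0 - alpha) * soft) / n


def iou(pred_mask, true_mask):
    """Intersection over union of two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Two empty masks agree perfectly; define IoU as 1.0 in that case.
    return inter / union if union else 1.0
```

In this sketch, a student whose logits agree with both the ground truth and the teacher's soft predictions receives a lower loss than one that contradicts them, which is the basic signal a small student such as LightUNet would be trained on under these assumptions.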