Tooth instance segmentation of dental panoramic X-ray images is of significant clinical importance. Teeth are arranged symmetrically within the upper and lower jawbones and follow a specific order. However, previous studies frequently overlook this crucial spatial prior, which leads to misidentification of tooth categories, especially for adjacent or similarly shaped teeth. In this paper, we propose SPGTNet, a spatial prior-guided transformer method designed to exploit both the tooth positional features extracted by CNNs and the long-range contextual information captured by vision transformers, specifically for dental panoramic X-ray image segmentation. First, a center-based spatial prior perception module identifies the centroid of each tooth, thereby enhancing the spatial prior information in the CNN sequence features. Next, a bi-directional cross-attention module enables interaction between the spatial prior information in the CNN sequence features and the long-range contextual information in the vision transformer sequence features. Finally, an instance identification head generates the tooth segmentation results. Extensive experiments on three public benchmark datasets demonstrate the effectiveness of the proposed method and its superiority over other state-of-the-art approaches. The proposed method accurately identifies and analyzes tooth structures, thereby providing crucial information for dental diagnosis, treatment planning, and research.
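As an illustration of the bi-directional cross-attention idea only, the following is a minimal NumPy sketch in which two token sequences attend to each other. It is not the SPGTNet implementation: the function names, the token counts, the feature width, the single attention head, the residual connections, and the absence of learned query/key/value projections are all assumptions made for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Single-head scaled dot-product attention: `queries` attend to `context`.

    Assumption: no learned projections; the tokens themselves serve as
    queries, keys, and values.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context                    # (n_queries, d)


def bidirectional_cross_attention(cnn_tokens, vit_tokens):
    """Hypothetical two-way exchange between the two feature sequences."""
    # CNN tokens gather long-range context from the transformer tokens.
    cnn_out = cnn_tokens + cross_attention(cnn_tokens, vit_tokens)
    # Transformer tokens gather spatial-prior cues from the CNN tokens.
    vit_out = vit_tokens + cross_attention(vit_tokens, cnn_tokens)
    return cnn_out, vit_out


# Illustrative shapes only: 32 CNN tokens and 196 transformer tokens, width 64.
rng = np.random.default_rng(0)
cnn_tokens = rng.standard_normal((32, 64))
vit_tokens = rng.standard_normal((196, 64))
cnn_out, vit_out = bidirectional_cross_attention(cnn_tokens, vit_tokens)
print(cnn_out.shape, vit_out.shape)
```

Each sequence keeps its own length and only its content is updated, so the two branches can have different token counts as long as they share a feature width.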