Background
Accurately classifying primary bone tumors is crucial for guiding therapeutic decisions. The National Comprehensive Cancer Network guidelines recommend multimodal imaging to provide complementary perspectives for the comprehensive evaluation of primary bone tumors. In clinical practice, however, the multimodal images available for most patients are incomplete. This study aimed to build a deep learning model that uses patients’ incomplete multimodal images from X-ray, CT, and MRI, alongside clinical characteristics, to classify primary bone tumors as benign, intermediate, or malignant.

Methods
In this retrospective study, a total of 1305 patients with histopathologically confirmed primary bone tumors (internal dataset, n = 1043; external dataset, n = 262) were included from two centers between January 2010 and December 2022. We proposed a Primary Bone Tumor Classification Transformer Network (PBTC-TransNet) fusion model to classify primary bone tumors. Areas under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated to evaluate the model’s classification performance.

Results
The PBTC-TransNet fusion model achieved micro-average AUCs of 0.847 (95% CI: 0.832, 0.862) and 0.782 (95% CI: 0.749, 0.817) on the internal and external test sets, respectively. For the classification of benign, intermediate, and malignant primary bone tumors, the model achieved AUCs of 0.827/0.727, 0.740/0.662, and 0.815/0.745, respectively, on the internal/external test sets. Furthermore, across all patient subgroups stratified by the distribution of imaging modalities, the PBTC-TransNet fusion model achieved micro-average AUCs ranging from 0.700 to 0.909 on the internal test set and from 0.640 to 0.847 on the external test set. On the internal test set, the model performed best in patients with X-ray images only, with a micro-average AUC of 0.909, accuracy of 84.3%, micro-average sensitivity of 84.3%, and micro-average specificity of 92.1%. On the external test set, the model achieved its highest micro-average AUC, 0.847, in patients with X-ray + CT.

Conclusions
We developed and externally validated the transformer-based PBTC-TransNet fusion model for the classification of primary bone tumors. Because the model is built on incomplete multimodal images and clinical characteristics, it reflects real-world clinical scenarios, which supports its clinical practicability.
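
The micro-averaged metrics reported in the Results can be illustrated with a small sketch. It assumes that micro-averaging pools the one-vs-rest true/false positive and negative counts over the three classes before computing each ratio; the abstract does not state the exact procedure, so this is an illustration rather than the authors' implementation. The function name `micro_average_metrics` and the label lists are hypothetical and are not data from the study.

```python
# Minimal sketch of micro-averaged sensitivity and specificity for a
# three-class problem (assumption: one-vs-rest counts are pooled over classes).

CLASSES = ("benign", "intermediate", "malignant")


def micro_average_metrics(y_true, y_pred, classes=CLASSES):
    """Pool one-vs-rest confusion counts over all classes, then compute ratios."""
    tp = fp = fn = tn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if t == c and p == c:
                tp += 1          # class c correctly predicted
            elif t != c and p == c:
                fp += 1          # class c predicted for another class
            elif t == c and p != c:
                fn += 1          # class c missed
            else:
                tn += 1          # neither true nor predicted class is c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "micro_sensitivity": tp / (tp + fn),
        "micro_specificity": tn / (tn + fp),
    }


# Hypothetical labels for illustration only.
y_true = ["benign", "benign", "intermediate", "malignant", "malignant", "benign"]
y_pred = ["benign", "intermediate", "intermediate", "malignant", "benign", "benign"]

metrics = micro_average_metrics(y_true, y_pred)
print(metrics)
```

Under this pooling assumption, and with exactly one label per patient, micro-average sensitivity equals accuracy, and micro-average specificity equals 1 − (1 − accuracy)/(K − 1) for K classes. This is in line with the X-ray-only subgroup figures above: accuracy and micro-average sensitivity are both 84.3%, and 1 − (1 − 0.843)/2 ≈ 0.92, close to the reported 92.1% micro-average specificity.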