Background/Objectives: Developing high-performance artificial intelligence (AI) models for rare diseases is challenging owing to limited data availability. This study aimed to evaluate whether a novel three-class annotation method for preparing training data could enhance AI model performance in detecting osteosarcoma on plain radiographs compared with conventional single-class annotation. Methods: We applied two annotation methods to the same dataset of 468 osteosarcoma radiographs and 378 normal radiographs: a conventional single-class annotation (1C model) and a novel three-class annotation (3C model) that separately labeled intramedullary, cortical, and extramedullary tumor components. Both models used identical U-Net-based architectures and differed only in their annotation approach. Performance was evaluated using an independent validation dataset. Results: Although both models achieved high diagnostic accuracy (AUC: 0.99 vs. 0.98), the 3C model demonstrated superior operational characteristics. At a standardized cutoff value of 0.2, the 3C model maintained balanced performance (sensitivity: 93.28%, specificity: 92.21%), whereas the 1C model showed compromised specificity (83.58%) despite high sensitivity (98.88%). Notably, at the 25th percentile threshold, both models showed identical false-negative rates despite significantly different cutoff values (3C: 0.661 vs. 1C: 0.985), indicating that the 3C model can maintain diagnostic accuracy at substantially lower thresholds. Conclusions: This study demonstrated that anatomically informed three-class annotation can enhance AI model performance for rare disease detection without requiring additional training data. The improved stability at lower thresholds suggests that thoughtful annotation strategies can optimize AI model training, particularly in contexts where training data are limited.
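As an illustration of how sensitivity and specificity at a fixed cutoff (such as the 0.2 value reported above) are typically computed, the following is a minimal sketch rather than the study's actual evaluation code. The function name `sensitivity_specificity`, the toy scores, and the convention that a case is called positive when its score is greater than or equal to the cutoff are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given cutoff.

    labels: 1 = osteosarcoma, 0 = normal.
    Assumption: a case is called positive when score >= cutoff.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)  # true positives
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)   # false negatives
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)   # true negatives
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, not taken from the study.
scores = [0.95, 0.70, 0.15, 0.30, 0.05, 0.10]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, cutoff=0.2)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

Sweeping `cutoff` over a range of values with the same function shows how each model trades sensitivity against specificity, which is the comparison the Results describe.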