Abstract
Background: This study evaluated the performance of machine learning models trained on two different datasets of knee X-ray images annotated with Kellgren–Lawrence grades.

Methods: Learning curves indicated that one model was poorly trained, with clear signs of underfitting, while the other trained effectively and converged properly. The poorly trained model appeared to perform adequately on its internal test set but failed to generalize to an external dataset, yielding suboptimal results.

Results: In contrast, the well-trained model performed well not only on internal validation but also on the external test dataset.

Conclusions: These findings highlight the importance of examining learning curves to assess training quality and the necessity of external testing to evaluate generalizability. Most existing studies lack external validation, raising concerns about the reliability of their reported performance. Without external testing, models may not perform as expected in real-world clinical settings, potentially affecting clinical decision-making for surgical interventions. These results advocate for including external validation in model evaluation and assessing convergence via learning curves to ensure robust, generalizable tools for knee osteoarthritis severity assessment and other applications.
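The learning-curve check described above can be sketched as a simple heuristic: if both training and validation loss plateau at a high value, the model is underfitting; if training loss converges to a low value and validation loss tracks it, training was effective. The function below is a minimal illustration with hypothetical thresholds (`plateau_tol`, `high_loss`), not the criteria used in the study.

```python
def diagnose_learning_curve(train_loss, val_loss, plateau_tol=0.01, high_loss=0.5):
    """Heuristic learning-curve diagnosis (illustrative thresholds only).

    train_loss, val_loss: per-epoch loss values, oldest to newest.
    Returns "underfitting", "converged", or "still training".
    """
    # Training has plateaued if the last step changed the loss very little.
    plateaued = abs(train_loss[-1] - train_loss[-2]) < plateau_tol
    # Underfitting: loss has stopped improving but both curves remain high.
    if plateaued and train_loss[-1] > high_loss and val_loss[-1] > high_loss:
        return "underfitting"
    # Effective training: loss plateaued at a low value.
    if plateaued and train_loss[-1] <= high_loss:
        return "converged"
    return "still training"

# Synthetic curves mimicking the two cases described in the abstract:
poor = diagnose_learning_curve([0.90, 0.85, 0.84, 0.84], [0.92, 0.88, 0.87, 0.87])
good = diagnose_learning_curve([0.90, 0.50, 0.20, 0.195], [0.91, 0.55, 0.30, 0.29])
```

In practice such curves are inspected visually across many epochs; the point of the sketch is only that internal test accuracy alone cannot distinguish the two cases, which is why the external dataset was decisive here.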