The Sharp-van der Heijde score (SvH) is crucial for assessing joint damage in rheumatoid arthritis (RA) from radiographic images. However, manual scoring is time-consuming and subject to inter-reader variability. This study proposes a multistage deep learning model to predict the Overall Sharp Score (OSS) from hand X-ray images. The framework has four stages: image preprocessing, hand segmentation with UNet, joint identification with YOLOv7, and OSS prediction with a custom Vision Transformer (ViT). Evaluation metrics included Intersection over Union (IoU), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Huber loss, and Intraclass Correlation Coefficient (ICC). The model was trained using stratified group 3-fold cross-validation on a dataset of 679 patients and tested externally on 291 subjects. The joint identification model achieved 99% accuracy. The ViT model achieved its best OSS prediction for patients with Sharp scores < 50. The ViT achieved a Huber loss of 4.9, an RMSE of 9.73, and an MAE of 5.35, demonstrating a strong correlation with expert scores (ICC = 0.702, P < 0.001). This study is the first to apply a ViT to OSS prediction in RA. It presents an efficient, automated alternative for overall damage assessment that may reduce reliance on manual scoring.
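The regression and segmentation metrics named above can be illustrated with a minimal sketch. This is not the study's evaluation code: the function names, the toy scores and masks, and the Huber threshold `delta=1.0` are all assumptions made for illustration, since the abstract does not state the threshold or the implementation used.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error between expert and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalizes large errors more than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it.

    delta=1.0 is an assumed value, not one taken from the study.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1
            if a or b:
                union += 1
    # Two empty masks are treated as a perfect match by convention here.
    return inter / union if union else 1.0


# Hypothetical expert and predicted Overall Sharp Scores for four patients.
expert_scores = [10, 24, 3, 47]
predicted_scores = [12, 20, 3, 40]

# Hypothetical 2x3 segmentation masks (reference vs. predicted).
reference_mask = [[1, 1, 0], [0, 1, 0]]
predicted_mask = [[1, 0, 0], [0, 1, 1]]

print("MAE:  ", mae(expert_scores, predicted_scores))
print("RMSE: ", rmse(expert_scores, predicted_scores))
print("Huber:", huber(expert_scores, predicted_scores))
print("IoU:  ", iou(reference_mask, predicted_mask))
```

Because RMSE squares each error before averaging, it is never smaller than MAE and grows faster when a few predictions are far off, which is one reason the two are usually reported together. Huber loss sits between them, behaving like squared error for small residuals and like absolute error for large ones.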