Abstract

Objective To evaluate the predictive performance of logistic and linear regression versus machine learning (ML) algorithms to identify patients with rheumatoid arthritis (RA) treated with target immunomodulators (TIMs) using only pharmacy administrative claims. Methods Adults aged 18–64 years with ≥1 TIM claim in the IBM MarketScan commercial database were included in this retrospective analysis. The predictive ability of logistic regression to identify RA patients was compared with supervised ML classification algorithms including random forest (RF), decision trees, linear support vector machines (SVMs), neural networks, naïve Bayes classifier, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbors (k-NN). Model performance was evaluated using F1 score, accuracy, precision, sensitivity, area under the receiver operating characteristic curve (AUROC), and Matthews correlation coefficient (MCC). Analyses were conducted in all-patient and etanercept-only samples. Results In the all-patients sample, ML approaches did not outperform logistic regression. RF showed small improvements versus logistic regression that were not considered remarkable, respectively: F1 score (84.55% vs 83.96%), accuracy (84.05% vs 83.79%), sensitivity (84.53% vs 82.20%), AUROC (84.04% vs 83.85%), and MCC (68.07% vs 67.66%). Findings were similar in the etanercept samples. Conclusion Logistic regression and ML approaches successfully identified patients with RA in a large pharmacy administrative claims database. The ML algorithms were no better than logistic regression at prediction. RF, SVMs, LDA, and ridge classifier showed comparable performance, while neural networks, decision trees, naïve Bayes classifier, and QDA underperformed compared with logistic regression in identifying patients with RA.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call