Background
Patients with rheumatoid arthritis (RA) have an increased risk of developing serious infections (SIs) vs. individuals without RA; efforts to predict SIs in this patient group are ongoing. We assessed the ability of different machine learning modeling approaches to predict SIs using baseline data from the tofacitinib RA clinical trials program.

Methods
This analysis included data from 19 clinical trials (phase 2, n = 10; phase 3, n = 6; phase 3b/4, n = 3). Patients with RA receiving tofacitinib 5 or 10 mg twice daily (BID) were included in the analysis; patients receiving tofacitinib 11 mg once daily were considered as tofacitinib 5 mg BID. All available patient-level baseline variables were extracted. Statistical and machine learning methods (logistic regression, support vector machines with linear kernel, random forest, extreme gradient boosting trees, and boosted trees) were implemented to assess the association of baseline variables with SI (logistic regression only), and to predict SI from selected baseline variables with 5-fold cross-validation. Missing values were handled individually per prediction model.

Results
A total of 8404 patients with RA treated with tofacitinib were eligible for inclusion (15,310 patient-years of total follow-up), of whom 473 reported SIs. Amongst other baseline factors, age, previous infection, and corticosteroid use were significantly associated with SI. When prediction modeling for SI was applied across data from all studies, the area under the receiver operating characteristic curve (AUROC) ranged from 0.656 to 0.739. AUROC values ranged from 0.599 to 0.730 in data from phase 3 and 3b/4 studies, and from 0.563 to 0.643 in data from ORAL Surveillance only.

Conclusions
Baseline factors associated with SIs in the tofacitinib RA clinical trial program were similar to established SI risk factors associated with advanced treatments for RA. Furthermore, while model performance in predicting SI was similar to that of other published models, it did not meet the threshold for accurate prediction (AUROC > 0.85). Thus, predicting the occurrence of SIs at baseline remains challenging and may be complicated by the changing disease course of RA over time. Inclusion of other patient-associated and healthcare delivery-related factors and harmonization of the duration of studies included in the models may be required to improve prediction.

Trial registration
ClinicalTrials.gov: NCT00147498; NCT00413660; NCT00550446; NCT00603512; NCT00687193; NCT01164579; NCT00976599; NCT01059864; NCT01359150; NCT02147587; NCT00960440; NCT00847613; NCT00814307; NCT00856544; NCT00853385; NCT01039688; NCT02187055; NCT02831855; NCT02092467.
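
Illustrative sketch
The evaluation scheme named in the Methods and Results (5-fold cross-validation scored by AUROC) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study. The patient records are synthetic, the one-variable "model" is a hypothetical stand-in for the methods listed in the abstract, stratifying the folds by outcome is an assumption the abstract does not state, and all function names (`auroc`, `stratified_folds`, `cross_validated_auroc`, `fit_age_direction`) are invented for this example.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one event and one non-event")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(labels, k=5, seed=0):
    """Assign each index to one of k folds, keeping the event rate similar per fold."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k
    return fold_of


def cross_validated_auroc(rows, labels, fit, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and return one AUROC per fold."""
    fold_of = stratified_folds(labels, k, seed)
    aucs = []
    for fold in range(k):
        train = [i for i in range(len(labels)) if fold_of[i] != fold]
        test = [i for i in range(len(labels)) if fold_of[i] == fold]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        scores = [predict(rows[i]) for i in test]
        aucs.append(auroc([labels[i] for i in test], scores))
    return aucs


def fit_age_direction(train_rows, train_labels):
    """Hypothetical one-variable model: score by age, signed by the training trend."""
    event_ages = [r["age"] for r, y in zip(train_rows, train_labels) if y == 1]
    other_ages = [r["age"] for r, y in zip(train_rows, train_labels) if y == 0]
    mean_event = sum(event_ages) / len(event_ages)
    mean_other = sum(other_ages) / len(other_ages)
    sign = 1.0 if mean_event >= mean_other else -1.0
    return lambda row: sign * row["age"]


# Synthetic baseline data (NOT trial data): event risk rises with age.
rng = random.Random(42)
rows = [{"age": rng.randint(18, 85)} for _ in range(500)]
labels = [1 if rng.random() < 0.02 + 0.002 * (r["age"] - 18) else 0 for r in rows]

aucs = cross_validated_auroc(rows, labels, fit_age_direction, k=5)
print("per-fold AUROC:", [round(a, 3) for a in aucs])
print("mean AUROC:", round(sum(aucs) / len(aucs), 3))
```

An AUROC of 0.5 corresponds to ranking events and non-events at random and 1.0 to perfect ranking, which is the scale on which the reported values and the 0.85 threshold should be read.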