Abstract
Background/Aims
On average, there is a delay of 6.7 years between symptom onset and diagnosis of axial spondyloarthritis (axSpA). Because traditional approaches to improving early identification of axSpA have had limited success, automated predictive analyses of patient records may help reduce the burden on healthcare providers. We report results from a machine learning (ML) algorithm developed with UK electronic health record (EHR) data from the Clinical Practice Research Datalink (CPRD) to estimate the probability of a patient being diagnosed with axSpA based on prior clinical indicators and patient history.
Methods
Primary care UK EHR data (CPRD GOLD) were used to identify patients with axSpA and healthy controls (HC). Patients aged ≥18 years with a first diagnosis date of axSpA within the identification period (01-Jan-2005 to 31-Dec-2018) who fulfilled the CPRD research acceptability criteria were included. Data on clinical presentation, consultation, referral, test, and therapy history before the axSpA diagnosis were extracted for each patient. A total of 5,090 patients with axSpA satisfied the acceptability criteria. HC were randomly sampled so that one unique HC was matched to each patient with axSpA, yielding 5,089 HC. In total, 820 ML-usable features were derived from the combined population (patients with axSpA and HC). After excluding patients with axSpA and HC who had none of the 820 usable features, the final dataset included 7,813 individuals (3,902 with axSpA and 3,911 HC). This combined dataset was randomly split (67:33) into a training set (n = 5,237) and a test set (n = 2,576). A random forest (RF) model was trained on the training set, with cross-validation used to tune the hyper-parameters of the RF classifier. Once the model was trained, accuracy, precision, and F1 scores were obtained on the test set.
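The modelling steps described in the Methods (a random 67:33 split, cross-validated tuning of a random forest, then a single evaluation on the held-out test set) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' code: the toy feature matrix, the hyper-parameter grid, the use of F1 as the tuning score and the use of impurity-based importances to rank predictors are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, f1_score, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in (assumption) for the extracted EHR features:
# one row per patient, one column per usable feature; label 1 = axSpA, 0 = HC.
rng = np.random.default_rng(0)
n_patients, n_features = 600, 20
X = rng.poisson(lam=1.0, size=(n_patients, n_features)).astype(float)
y = rng.integers(0, 2, size=n_patients)
X[y == 1, :3] += 2.0  # make three features informative so the toy task is learnable

# Keep only patients with at least one usable feature recorded (non-zero row).
keep = X.sum(axis=1) > 0
X, y = X[keep], y[keep]

# Random 67:33 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Cross-validated hyper-parameter search on the training set only.
# This grid is illustrative; the study's grid is not reported here.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Evaluate once on the held-out test set.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)  # precision for the positive (axSpA) class
f1 = f1_score(y_test, y_pred)
print(classification_report(y_test, y_pred, target_names=["HC", "axSpA"]))

# Rank features by impurity-based importance (one possible way to list top predictors).
top = np.argsort(model.feature_importances_)[::-1][:5]
print("top feature indices:", top)
```

Tuning happens only inside the training set, so the test set stays untouched until the final evaluation; this mirrors the order of steps in the Methods rather than any specific implementation detail from the study.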
Results
The RF-based algorithm achieved a high level of accuracy (88.12%), with a precision of 0.95 for patients with axSpA and 0.83 for HC. The RF algorithm identified 89 top clinical predictors (of the 820 used as inputs) that differentiated patients with axSpA from HC, such as the total number of tests, the total number of referrals, age at first consultation, age at first symptom, and the number of low back pain symptoms. The model sensitivity was 0.75 and the positive predictive value was 80.88%. The model specificity was 0.96 and the negative predictive value was 82.56%.
Conclusion
The ML algorithm demonstrated a high level of accuracy and precision in identifying possible cases of axSpA, which may help reduce the delay in diagnosis. Previous studies have demonstrated automated identification of axSpA cohorts in large datasets, but only a few have used ML-based approaches for diagnosis from patient medical history. While our model supports previous work in axSpA, it needs further validation in routine clinical practice (exploration ongoing).
Disclosure
R. Sengupta: Honoraria; AbbVie, Biogen, Celgene, Lilly, MSD, Novartis, Roche, UCB. Grants/research support; AbbVie, Celgene, Novartis, UCB. S. Narasimham: Shareholder/stock ownership; Novartis. Other; Employee of Novartis. B.S. Mato: Shareholder/stock ownership; Novartis. Other; Employee of Novartis. M. Meglic: Other; Employee of Novartis. C. Perella: Other; Employee of Novartis. P. Pamies: Other; Employee of Novartis. P. Emery: Consultancies; AbbVie, Astra-Zeneca, BMS, Boehringer Ingelheim, Celltrion, Gilead, Janssen, MSD, Lilly, Novartis, Pfizer, Roche, Samsung, UCB. Grants/research support; AbbVie, BMS, Lilly, Novartis, Pfizer, Roche, Samsung.
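The test-set metrics reported in the Results (accuracy, sensitivity, specificity, positive and negative predictive value) can all be derived from a single confusion matrix. The sketch below assumes axSpA is the positive class; the counts are hypothetical and chosen only to make the arithmetic easy to follow. By definition, sensitivity is the recall for axSpA, specificity is the recall for HC, PPV is the precision for axSpA and NPV is the precision for HC.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # axSpA predicted as axSpA
    fp: int  # HC predicted as axSpA
    tn: int  # HC predicted as HC
    fn: int  # axSpA predicted as HC


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary diagnostic metrics, with axSpA as the positive class."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # recall for axSpA
        "specificity": c.tn / (c.tn + c.fp),  # recall for HC
        "ppv": c.tp / (c.tp + c.fp),          # precision for axSpA
        "npv": c.tn / (c.tn + c.fn),          # precision for HC
    }


# Hypothetical counts, for illustration only (not study data).
example = ConfusionCounts(tp=75, fp=4, tn=96, fn=25)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts the function gives accuracy 0.855, sensitivity 0.75, specificity 0.96, PPV ≈ 0.949 and NPV ≈ 0.793, which shows how a high specificity can coexist with a modest sensitivity.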