POS0535 QUERY ALGORITHMS AND MACHINE LEARNING METHODS AS TOOLS TO IDENTIFY COMORBIDITIES IN LARGE-SCALE FREE-TEXT BASED FIELDS: A CASE-REPORT

Daphne C Rohrich, A De Boer, Tjardo Maarseveen, Călin-Adrian Popa, Alfons A Den Broeder, Rachel Knevel

doi:10.1136/annrheumdis-2021-eular.2094

Abstract

Background: Inflammatory rheumatic conditions (IRC) are associated with comorbidity, the two most important being cardiovascular diseases (CVD) and infections [1, 2]. A crucial first step in studying CVD and infections in these patients is the identification of events. Large-scale electronic health record (EHR) datasets enable studies of low-incidence but clinically important events, but they require data extraction that is both accurate and efficient. Studying EHR using query-based algorithms (QBA) and machine learning algorithms (MLA) offers a valuable tool to screen large-scale collections for rare events, thus replacing resource-intensive manual chart review.

Objectives: To explore the (comparative) usefulness of QBA and MLA to identify CVD and infection events in EHR free-text data in patients with chronic IRC.

Methods: To independently develop and validate the algorithms we used two EHR databases: a training set of psoriatic arthritis patients (N=977, dataset A, Gold Standard) and a validation set of rheumatoid arthritis patients (N=1098, dataset B). Using both QBA and MLA, we aimed to identify CVD and infections (yes/no and timing). We assessed the performance of the algorithms by calculating the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).

Results: In the final performance analysis on dataset B, both QBA and MLA showed high performance in identifying CVD. Sensitivity, specificity, PPV and NPV (95% CI) were 0.69 (0.66-0.72), 0.99 (0.96-1.02), 0.84 (0.81-0.87) and 0.98 (0.95-1.01) for QBA, and 0.69 (0.66-0.72), 0.98 (0.95-1.01), 0.68 (0.65-0.71) and 0.98 (0.95-1.01) for MLA, respectively. Infections showed similar performance: sensitivity, specificity, PPV and NPV (95% CI) were 0.64 (0.61-0.67), 0.96 (0.93-0.99), 0.66 (0.63-0.69) and 0.96 (0.93-0.99) for QBA, and 0.61 (0.58-0.64), 0.93 (0.90-0.96), 0.49 (0.46-0.52) and 0.96 (0.93-0.99) for MLA, respectively. For infections, the specificity was slightly higher for QBA than for MLA.

Conclusion: We found consistently high performance of both QBA and MLA for the identification of CVD and infections in our free-text EHR of patients with chronic IRC (Table 1). The performance of QBA depends strongly on the domain knowledge of its builders, which might allow it to outperform a Gold Standard. MLA is efficient because it does not require any domain knowledge, but its performance is restricted by the quality of the Gold Standard.
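The four performance measures named in the Methods can be derived from a 2x2 table that compares algorithm output against manual chart review. The sketch below is illustrative only: the function names and the counts are hypothetical and are not taken from the study, and the Wilson score interval is an assumption, since the abstract does not state how its 95% confidence intervals were computed.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (an assumed CI method).

    Unlike a simple normal approximation, it stays within [0, 1].
    """
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_metrics(tp, fp, fn, tn):
    """Return {metric: (estimate, (ci_low, ci_high))} from 2x2 counts.

    tp/fp/fn/tn are true/false positives/negatives of the algorithm
    relative to the reference standard (e.g. manual chart review).
    """
    pairs = {
        "sensitivity": (tp, tp + fn),  # events correctly flagged
        "specificity": (tn, tn + fp),  # non-events correctly cleared
        "ppv": (tp, tp + fp),          # flagged records that are true events
        "npv": (tn, tn + fn),          # cleared records that are true non-events
    }
    return {
        name: (k / n if n else float("nan"), wilson_ci(k, n))
        for name, (k, n) in pairs.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration; NOT data from the abstract.
    counts = dict(tp=40, fp=10, fn=20, tn=930)
    for name, (value, (lo, hi)) in diagnostic_metrics(**counts).items():
        print(f"{name}: {value:.2f} ({lo:.2f}-{hi:.2f})")
```

With rare events, as in this setting, PPV is the measure most sensitive to a handful of false positives, which is consistent with PPV showing the largest gap between QBA and MLA in the reported results.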
