Abstract
Objectives
1) To use a data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs.
2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis (RA) using primary care EHRs.
Methods
This study linked routine primary and secondary care EHRs in Wales, UK. A machine-learning-based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs in three steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset for disease cases versus non-disease controls (case/control status based on the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from a decision tree model. The proposed method was then extensively validated on an independent dataset and compared for performance with two existing deterministic algorithms for RA that had been developed using expert clinical knowledge.
Results
Primary care EHRs were available for 2,238,360 patients over the age of 16, and of these 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were found more frequently in those with RA than in those without. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes (such as those for disease-modifying anti-rheumatic drugs) and the absence of alternative diagnoses such as psoriatic arthritis.
The proposed data-driven method performed as well as the methods based on expert clinical knowledge.
Conclusion
Data-driven schemes, such as ensemble machine learning methods, have the potential to identify the most informative predictors rapidly and cost-effectively, and so to classify rheumatoid arthritis or other complex medical conditions accurately and reliably in primary care EHRs.
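The variable-selection step (i) and the validation against the secondary care diagnosis can be sketched in a few lines of Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the code labels (`RA_DX`, `DMARD`, `PSA_DX`, `OA_DX`, `BLOOD_TEST`), the toy patients, the add-one smoothing and the thresholds `min_ratio` and `min_case_freq` are all assumptions. The hand-written `hypothetical_rule` merely stands in for the rules a decision tree would induce in step iii. The Random Forest reduction of step ii is omitted here; a library such as scikit-learn (`RandomForestClassifier` feature importances, then `DecisionTreeClassifier`) would be one typical choice for steps ii and iii.

```python
from collections import Counter
from typing import Dict, List, Set

# Each patient is a set of primary care code labels. The labels used below are
# hypothetical placeholders, not real Read codes.
Patient = Set[str]


def screen_codes(patients: List[Patient], is_case: List[bool],
                 min_ratio: float = 1.5, min_case_freq: float = 0.05) -> List[str]:
    """Step i sketch: keep codes relatively more frequent in cases than in controls.

    Case/control status comes from the linked secondary care diagnosis.
    min_ratio and min_case_freq are illustrative thresholds, not the study's.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    case_counts, control_counts = Counter(), Counter()
    for codes, case in zip(patients, is_case):
        # Count each code at most once per patient (codes is a set).
        (case_counts if case else control_counts).update(codes)

    selected = []
    for code in set(case_counts) | set(control_counts):
        case_freq = case_counts[code] / n_cases
        # Add-one smoothing so a code never seen in controls avoids division by zero.
        control_freq = (control_counts[code] + 1) / (n_controls + 1)
        if case_freq >= min_case_freq and case_freq / control_freq >= min_ratio:
            selected.append(code)
    return sorted(selected)


def hypothetical_rule(codes: Patient) -> bool:
    """Stand-in for rules induced in step iii (NOT the published algorithm):
    an RA diagnosis code plus a DMARD code, and no psoriatic arthritis code."""
    return "RA_DX" in codes and "DMARD" in codes and "PSA_DX" not in codes


def evaluate(predicted: List[bool], actual: List[bool]) -> Dict[str, float]:
    """Compare predictions with the secondary care reference diagnosis."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Toy linked data (made up): primary care codes and secondary care RA status.
patients: List[Patient] = [
    {"RA_DX", "DMARD", "BLOOD_TEST"},
    {"RA_DX", "DMARD"},
    {"RA_DX", "PSA_DX", "DMARD"},
    {"BLOOD_TEST"},
    {"OA_DX", "BLOOD_TEST"},
    {"OA_DX"},
    {"RA_DX"},
]
is_case = [True, True, False, False, False, False, True]

if __name__ == "__main__":
    print(screen_codes(patients, is_case))  # ['DMARD', 'RA_DX']
    predictions = [hypothetical_rule(p) for p in patients]
    print(evaluate(predictions, is_case))
```

On this toy data the rule misses the last case, who has an RA code but no DMARD code, so sensitivity is 2/3 while specificity and positive predictive value are 1.0. Scoring an expert-defined deterministic rule with the same `evaluate` function on the same held-out patients would give a like-for-like comparison of the two approaches.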
Highlights
Rheumatoid arthritis (RA) is the most common chronic inflammatory arthritis worldwide
Primary care EHRs were available for 2,238,360 patients over the age of 16, and of these 20,667 were linked in the secondary care rheumatology clinical system
The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes (such as those for disease-modifying anti-rheumatic drugs) and the absence of alternative diagnoses such as psoriatic arthritis
Summary
Rheumatoid arthritis (RA) is the most common chronic inflammatory arthritis worldwide. One can examine information on first symptoms, tests/results, referrals, diagnoses and prescriptions for all patients with a condition in the region or country, providing an unparalleled opportunity to study disease in a real-world setting. This offers the chance for real-time surveillance of disease history, co-morbidities and long-term treatment effects in patients with chronic diseases such as RA. Secondary care electronic health records contain more robust RA-related diagnostic data than primary care records, but these records are sparse, cover far smaller patient numbers, often contain only severe active disease and are not available. Relying on them therefore identifies only those patients with RA seen in that specific secondary care rheumatology setting, introducing bias and limiting generalizability. Manually selecting relevant codes on the basis of expert knowledge is highly subjective and depends on the healthcare system of the area and on what clinicians (often secondary care physicians) think should be found in the primary care record.