Abstract

The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are increasingly recognized. Here we present a computationally feasible statistical approach to the methodological challenges of using historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age–specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement error. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model that uses the estimated current risk factor values from the first stage as predictors. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHR data on age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol for 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997–2016). In cross-validation, the model was well calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).
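The C-index reported above measures how well predicted risks rank patients by their observed event times. As a minimal sketch, assuming Harrell's definition (the abstract does not say which C-index variant or cross-validation scheme was used), it can be computed from follow-up times, event indicators, and predicted risks. The function name and example data are hypothetical, and pairs with tied follow-up times are simply skipped here as a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event. It is concordant when that subject also has the
    higher predicted risk; ties in predicted risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Tied times (simplification) or a censored earlier subject: skip.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.6, 0.1])` returns 0.8: of the five comparable pairs, only the pair formed by the second and third patients is ranked the wrong way.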

Highlights

  • Risk prediction models are typically developed using traditional prospective study designs, which define a baseline origin at which risk factors were observed and from which to predict future disease risk

  • Electronic health records (EHRs) are dynamic in nature: for example, in primary-care records, an individual’s follow-up begins at registration with a general practice, risk factors are measured sporadically during general practice visits, and follow-up continues until the person transfers out or dies

  • While multiple methods exist for developing risk prediction models using EHRs, a previous systematic review found that only 8% of studies modeled repeated longitudinal measures, only 54% accounted for missing data, only 16% appropriately accounted for censoring and loss to follow-up, and none assessed informative observations [6]
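The dynamic follow-up described in these highlights can be made concrete by building the data set for a single landmark age: a patient contributes only if registered at or before the landmark age and still under follow-up and event-free at that age, and follow-up is truncated at the prediction horizon. This is a minimal sketch under those assumptions, not the authors' code; the `Patient` fields and the `landmark_dataset` function are hypothetical. A full landmark model would stack such data sets over many landmark ages and attach risk factor estimates computed only from measurements recorded before each landmark age.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    pid: str
    start_age: float   # age at registration with the general practice
    end_age: float     # age at CVD event, death, transfer-out or end of data
    cvd_event: bool    # True if end_age marks a CVD event


def landmark_dataset(patients, landmark_age, horizon=10.0):
    """Return (pid, time, event) rows for patients at risk at landmark_age.

    Follow-up is measured from the landmark age and administratively
    censored at landmark_age + horizon.
    """
    rows = []
    for p in patients:
        # At risk: registered by the landmark age and still followed, event-free.
        if p.start_age <= landmark_age < p.end_age:
            stop = min(p.end_age, landmark_age + horizon)
            # An event counts only if it falls within the prediction horizon.
            event = p.cvd_event and p.end_age <= landmark_age + horizon
            rows.append((p.pid, stop - landmark_age, event))
    return rows
```

At landmark age 50, a patient registered at 52 is excluded (not yet registered), while one registered at 45 whose CVD event occurs at 75 contributes 10 years of follow-up without an event.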

Summary

Introduction

Risk prediction models are typically developed using traditional prospective study designs, which define a baseline origin at which risk factors were observed and from which to predict future disease risk. While multiple methods exist for developing risk prediction models using EHRs, a previous systematic review found that only 8% of studies modeled repeated longitudinal measures, only 54% accounted for missing data, only 16% appropriately accounted for censoring and loss to follow-up, and none assessed informative observations (where the clinic visit itself provides meaningful information) [6]. We propose an extension to landmarking in which, at each landmark age, the last observed risk factor values are replaced by estimates of the underlying, measurement-error-free risk factor values obtained from a multivariate linear mixed-effects model that uses all available repeated measures of past risk factor values [8]. We explore how landmarking can be combined with multivariate linear mixed-effects models to leverage the advantages of each method and generate dynamic risk prediction models suitable for use with EHRs. We illustrate our approach by estimating 10-year cardiovascular disease (CVD) risk using EHRs from 10 general practices in England and Wales.
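As a minimal sketch of the first stage, assume the fixed-effect means and variance components have already been estimated (here they are simply passed in) and that each risk factor has only a random intercept, with intercepts correlated across risk factors through a covariance matrix G. The best linear unbiased predictor (BLUP) of a patient's current risk factor values then combines all of their past measurements. Unlike the last observed value, it shrinks noisy measurements towards the population mean and, through G, yields an estimate even for a risk factor that was never recorded. The function name, argument layout, and numbers are hypothetical illustrations, not the paper's model specification.

```python
import numpy as np


def blup_current_values(obs, mu_obs, mu_landmark, G, sigma2):
    """BLUP of current risk factor values under correlated random intercepts.

    obs:         list of (k, y): risk factor index k and measured value y
    mu_obs:      fixed-effect mean for each observation (same order as obs)
    mu_landmark: length-K fixed-effect means at the landmark age
    G:           K x K covariance matrix of the random intercepts
    sigma2:      length-K measurement-error variances
    """
    mu_landmark = np.asarray(mu_landmark, dtype=float)
    G = np.asarray(G, dtype=float)
    if not obs:
        # No history at all: the prediction is the population mean.
        return mu_landmark
    ks = np.array([k for k, _ in obs])
    y = np.array([v for _, v in obs], dtype=float)
    n, K = len(obs), len(mu_landmark)
    # Z maps each measurement to the random intercept of its risk factor.
    Z = np.zeros((n, K))
    Z[np.arange(n), ks] = 1.0
    # Independent measurement errors with a risk-factor-specific variance.
    R = np.diag(np.asarray(sigma2, dtype=float)[ks])
    V = Z @ G @ Z.T + R  # marginal covariance of the observations
    b_hat = G @ Z.T @ np.linalg.solve(V, y - np.asarray(mu_obs, dtype=float))
    return mu_landmark + b_hat
```

With G = [[1, 0.5], [0.5, 1]], unit measurement-error variances, zero means, and a single measurement of 2 for the first risk factor, the estimates are 1.0 for the first risk factor (versus 2 under last observation carried forward) and 0.5 for the unmeasured second one.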
