Abstract

Background: Peripheral artery disease (PAD) is underdiagnosed due to poor patient and clinician awareness. Despite this, no widely accepted PAD screening is recommended.

Objectives: The authors used machine learning to develop an automated risk stratification tool for identifying patients with a high likelihood of PAD.

Methods: Using data from the electronic health record (EHR), ankle-brachial indices (ABIs) were extracted for 3,298 patients. In addition to ABI, we extracted 60 other patient characteristics and used a random forest model to rank the features by association with ABI. The model identified several features independently correlated with PAD. We then built a logistic regression model to predict PAD status on a validation set of patients (n = 1,089), an external cohort of patients (n = 2,922), and a national database (n = 2,488). The model was compared with an age-based model and a random forest model.

Results: The model had an area under the curve (AUC) of 0.68 in the validation set. When evaluated on an external population using EHR data, it performed similarly, with an AUC of 0.68. When evaluated on a national database, it had an AUC of 0.72. The model outperformed an age-based model (AUC: 0.62; P < 0.001). A random forest model that included all 60 features did not perform significantly better (AUC: 0.71; P = 0.31).

Conclusions: Statistical techniques can be used to build models that identify individuals at high risk for PAD using information accessible from the EHR. Models such as this may allow large health care systems to efficiently identify patients who would benefit from aggressive preventive strategies or targeted ABI screening.
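The results above are reported as AUC values, so a small worked example may help in reading them. The sketch below computes AUC as the probability that a randomly chosen patient with PAD receives a higher risk score than a randomly chosen patient without PAD. The function name `roc_auc` and all data are hypothetical and purely illustrative; they are not taken from the study and do not reproduce the authors' pipeline.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC: the fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = PAD, 0 = no PAD.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Assumed risk scores from a multi-feature model.
model_scores = [0.9, 0.7, 0.4, 0.3, 0.5, 0.2, 0.1, 0.6]
# Assumed age-only baseline: the patient's age is used directly as the score.
age_scores = [72, 58, 66, 70, 61, 49, 55, 64]

print("model AUC:", roc_auc(labels, model_scores))
print("age-only AUC:", roc_auc(labels, age_scores))
```

An AUC of 0.5 corresponds to ranking patients at random and 1.0 to perfect ranking, which is the scale on which the reported values of 0.62 to 0.72 should be read. The pairwise loop is quadratic in cohort size; for cohorts of the size described here, a rank-based implementation from an established statistics library would be the more practical choice.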
