Abstract

The objective of this investigation was to analyze de-identified electronic health record (EHR) data to predict a Parkinson’s Disease (PD) diagnosis. Patients ≥ 30 years of age with evidence of continuous activity from January 1, 2012 to December 31, 2013 were eligible for inclusion (n = 3,057,540). PD cases (n = 2,097) were identified by two PD diagnoses (ICD-9: 332.0) in calendar year 2013, and controls (n = 2,548,563) had no PD diagnosis. A “training” dataset (n = 1,912,996) was used for model development and a “test” dataset (n = 637,664) was reserved to confirm model performance. Sixty demographic, clinical diagnosis, and healthcare resource utilization (HRU) variables derived from calendar year 2012 were entered into logistic regression (LR), classification and regression tree (CART), and random forest (RF) models. The LR and CART models used the full dataset; downsampling was applied to the RF model to handle class imbalance. Variable importance was estimated, and predictive accuracy was evaluated using area under the curve (AUC). On the training data, the LR model (AUC = 0.84) fit better than the CART (AUC = 0.53) and RF (AUC = 0.72) models. Age, sex, diagnosis of postural instability, and diagnosis of sleep disorders were important variables in predicting a PD diagnosis. In addition, the number of levodopa prescriptions written and the number of visits to a general practitioner in the year prior to diagnosis were important HRU variables. LR model performance metrics were acceptable when applied to the test dataset (AUC = 0.85, specificity = 0.75, sensitivity = 0.81). Data mining methods can be used to identify patients with PD using 60 variables in EHR data with acceptable AUC, sensitivity, and specificity. Sleep disorders may be more predictive of PD in the year prior to diagnosis than previous research suggests.
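The downsampling step and the reported metrics (AUC, sensitivity, specificity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the software, the sampling ratio, or the classification threshold, so the function names, the 1:1 ratio, the 0.5 threshold, and the toy labels and scores below are all assumptions chosen for illustration.

```python
import random

def downsample_majority(labels, ratio=1.0, seed=0):
    """Keep every case (label 1) and a random subset of controls (label 0).

    `ratio` is the assumed number of controls kept per case; the abstract
    does not report the ratio actually used.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(controls), int(round(ratio * len(cases))))
    return sorted(cases + rng.sample(controls, n_keep))

def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. This pairwise form is quadratic in the number
    of rows, so it suits small illustrations only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (hypothetical): 2 cases and 6 controls with model scores.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05]

balanced_idx = downsample_majority(toy_labels, ratio=1.0, seed=0)
toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, 0.5)
```

With cases making up well under 0.1% of the cohort described above, a model fit to the raw data can reach high overall accuracy while identifying almost no cases. Balancing the training sample and reporting AUC, sensitivity, and specificity instead of accuracy are common ways to deal with that.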
