A comparison of prediction approaches for identifying prodromal Parkinson disease

Mark N. Warden, Alejandra Camacho-Soto, Susan Searles Nielsen, Roman Garnett, Brad A. Racette, Thippa Reddy Gadekallu

doi:10.1371/journal.pone.0256592

Abstract

Identifying people with Parkinson disease during the prodromal period, including via algorithms in administrative claims data, is an important research and clinical priority. We sought to improve upon an existing penalized logistic regression model, based on diagnosis and procedure codes, by adding prescription medication data or using machine learning. Using Medicare Part D beneficiaries aged 66–90 from a population-based case-control study of incident Parkinson disease, we fit a penalized logistic regression both with and without Part D data. For comparison, we also built a predictive algorithm using a random forest classifier. In a combined approach, we introduced the random forest's predicted probability of Parkinson disease as a predictor in the penalized regression model. We calculated the receiver operating characteristic area under the curve (AUC) for each model. All models performed well, with AUCs ranging from 0.824 (simplest model) to 0.835 (combined approach). We conclude that medication data and random forests modestly improve Parkinson disease prediction but are not essential.
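The AUC used to compare the models can be read as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, with ties counted as one half (the Mann-Whitney interpretation). The following is a minimal stdlib Python sketch of that computation; the function name `roc_auc` and the toy scores are hypothetical illustrations, not the study's code or data.

```python
from itertools import product


def roc_auc(scores, labels):
    """Rank-based (Mann-Whitney) estimate of the ROC area under the curve.

    scores: predicted probabilities of Parkinson disease (hypothetical model output)
    labels: 1 for cases, 0 for controls
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs the model orders correctly; ties count as 1/2.
    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy example with made-up scores (not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(scores, labels))  # 8 of 9 case/control pairs ordered correctly
```

The pairwise loop costs O(cases × controls), which is fine for illustration; for a large claims cohort a sort-based rank computation yields the same value more efficiently.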

Highlights

Parkinson disease (PD) is a progressive, neurodegenerative disorder that is diagnosed when patients experience motor symptoms such as resting tremor, bradykinesia, rigidity, and postural instability
We identified diagnosis/procedure codes and active ingredients associated with PD using multivariable logistic regression
A complementary study [14] validated the previous PD predictive model [13], providing evidence that the model is effective and offers a possible strategy for identifying people in the prodromal stage of PD
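The combined approach described in the abstract (using a random forest's predicted probability as one more predictor in a penalized logistic regression) can be sketched as follows. This is a minimal stdlib Python illustration under stated assumptions, not the authors' implementation: it fits an L1-penalized (LASSO) logistic regression by proximal gradient descent (ISTA), and the random forest probabilities are taken as a hypothetical precomputed column. The helper names `fit_lasso_logistic` and `stack_rf_probability`, the penalty `lam`, the step size, and the toy data are all assumptions. Ideally the probabilities would come from data not used to train the forest, to avoid leakage, but this section does not say how that was handled.

```python
import math


def soft_threshold(value, cutoff):
    """Proximal operator of the L1 (LASSO) penalty."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0


def fit_lasso_logistic(rows, labels, lam, step=0.5, iters=3000):
    """L1-penalized logistic regression fit by proximal gradient descent (ISTA).

    rows: list of equal-length feature lists; labels: 0/1 outcomes.
    Minimizes mean log-loss + lam * sum(|coef|); the intercept is unpenalized.
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(iters):
        # Per-person residuals sigmoid(z_i) - y_i, all from the current parameters.
        resid = []
        for x, y in zip(rows, labels):
            z = intercept + sum(b * v for b, v in zip(coefs, x))
            resid.append(1.0 / (1.0 + math.exp(-z)) - y)
        # Plain gradient step for the intercept.
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding for each penalized coefficient.
        for j in range(p):
            grad = sum(r * x[j] for r, x in zip(resid, rows)) / n
            coefs[j] = soft_threshold(coefs[j] - step * grad, step * lam)
    return intercept, coefs


def stack_rf_probability(code_rows, rf_probs):
    """Append a (hypothetical, precomputed) random forest probability column."""
    return [list(x) + [p] for x, p in zip(code_rows, rf_probs)]


# Toy data, not study data: one diagnosis-code indicator per person and a
# random forest probability assumed to come from a separately trained forest.
code_rows = [[1], [0], [1], [0], [1], [0], [0], [0]]
rf_probs = [0.8, 0.7, 0.6, 0.4, 0.5, 0.45, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

rows = stack_rf_probability(code_rows, rf_probs)
intercept, coefs = fit_lasso_logistic(rows, labels, lam=0.01)
print(intercept, coefs)  # coefs[0]: code indicator, coefs[1]: RF probability
```

In a real analysis the penalty would be tuned (for example, by cross-validation) and the predictors standardized; both steps are omitted here for brevity. A large enough `lam` shrinks every penalized coefficient to exactly zero, which is how the LASSO performs variable selection.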

Summary

Introduction

Parkinson disease (PD) is a progressive, neurodegenerative disorder that is diagnosed when patients experience motor symptoms such as resting tremor, bradykinesia, rigidity, and postural instability. Researchers have begun to move beyond traditional predictive modeling approaches by applying machine learning methods to a wide variety of data. At the optimal cut point of the existing model, sensitivity was 73.5% and specificity was 83.2%. While this least absolute shrinkage and selection operator (LASSO) penalized regression model performed well, adding Medicare Part D prescription medication data or using other analytic methods, such as machine learning, may improve model performance. We demonstrated modest improvements in model performance.

Army.mil/default); SSN: National Institute of Environmental Health Sciences K01ES028295 (https://www.niehs.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
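The sensitivity and specificity quoted at the optimal cut point depend on how that cut point is chosen. This section does not state the criterion, so the sketch below assumes Youden's J statistic (sensitivity + specificity − 1), a common choice, and scans every observed score as a candidate threshold. The function names `sens_spec` and `youden_cut_point` and the toy data are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predicting PD for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cut_point(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    Assumption: the optimal cut point is defined by Youden's J; the study may
    have used a different criterion.
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Toy example with made-up scores (not study data).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
print(youden_cut_point(scores, labels))  # (0.7, 0.75, 1.0)
```

With real predicted probabilities, the returned pair would be the analogue of the 73.5% sensitivity and 83.2% specificity reported above, provided Youden's J is indeed the criterion that was used.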

Study participants

Results

Discussion