Applications of Machine Learning to Diagnosis of Parkinson's Disease.

Hong Lai, Yang Song, Xu-Ying Li, Xianlin Wang, Chaodong Wang, Xian Li, Zhanjun Wang, Fanxi Xu, Junge Zhu

doi:10.3390/brainsci13111546

Abstract

Accurate diagnosis of Parkinson's disease (PD) is challenging due to its diverse manifestations. Machine learning (ML) algorithms can improve diagnostic precision, but their generalizability across medical centers in China is underexplored. This study aimed to assess the accuracy of ML algorithms for PD diagnosis when trained and tested on data from different medical centers in China. A total of 1656 participants were included, with 1028 from Beijing (training set) and 628 from Fuzhou (external validation set). Models were trained using the least absolute shrinkage and selection operator-logistic regression (LASSO-LR), decision tree (DT), random forest (RF), eXtreme gradient boosting (XGBoost), support vector machine (SVM), and k-nearest neighbor (KNN) techniques. Hyperparameters were optimized using grid search with five-fold cross-validation. Model performance was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, accuracy, sensitivity (recall), specificity, precision, and F1 score. Variable importance was assessed for all models. In the external validation set, SVM demonstrated the best differentiation between healthy controls (HCs) and PD patients (AUC: 0.928, 95% CI: 0.908-0.947; accuracy: 0.844, 95% CI: 0.814-0.871; sensitivity: 0.826, 95% CI: 0.786-0.866; specificity: 0.861, 95% CI: 0.820-0.898; precision: 0.849, 95% CI: 0.807-0.891; F1 score: 0.837, 95% CI: 0.803-0.868). Constipation, olfactory decline, and daytime somnolence significantly influenced the models' predictions. We identified multiple pivotal variables and found SVM to be a precise and clinician-friendly ML algorithm for predicting PD in Chinese patients.
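
For readers unfamiliar with the evaluation metrics listed in the abstract, the following is a minimal sketch of how they can be computed for a binary classifier (1 = PD, 0 = HC). It is not the authors' code. The function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the eight toy labels and scores are illustrative assumptions and are not study data. The AUC is computed with the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # also called recall
    specificity: float
    precision: float
    f1: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute threshold-based metrics from true and predicted labels (1 = PD, 0 = HC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=sensitivity,
        specificity=_ratio(tn, tn + fp),
        precision=precision,
        f1=_ratio(2 * precision * sensitivity, precision + sensitivity),
    )


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy example (made-up values, NOT data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, specificity, precision, and F1 depend on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported side by side.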

Full Text