Recent advances in data analysis and wearable sensors for human movement monitoring can promote objective and accurate clinical evaluation of neurological symptoms and improve outcome measures in clinical trials [1–3]. The aim of this study was to combine modern technique of data analysis and wearable sensors to determine which supervised machine learning (ML) algorithm can most accurately classify people with Parkinson’s disease (pwPD) from speed-matched healthy subjects (HS) based on a selected minimum set of IMU-derived gait features. Twenty-two gait features were extrapolated from the trunk acceleration patterns of 81 pwPD and 80 HS, including spatiotemporal, pelvic kinematics, and acceleration-derived gait stability indexes. After a three-level feature selection procedure, seven gait features were considered for implementing five ML algorithms: support vector machine (SVM), artificial neural network, decision trees (DT), random forest (RF), and K-nearest neighbors. Accuracy, precision, recall, F1 score, AUC and generalization error were calculated. SVM outperformed the other ML algorithms in terms of classification metrics (test accuracy = 0.86; F1 score = 0.85; AUC = 0.85) and generalizability (generalization error = 2.95%) in classifying the gait impairment of pwPD compared with speed-matched healthy subjects, using a selected dataset of gait features based on lower trunk acceleration data. Although significantly lower than SVM, tree-based algorithms revealed good classification performances with low generalization errors (RF: test accuracy= 0.86; F1 score = 0.85; AUC = 0.85), and lower computational demand than SVM. ANN was similar to DT in terms of classification metrics but showed significantly higher generalization error (7.26%) than tree-based algorithms and SVM and higher computational demand than the other ML algorithms. Even though KNN showed the fastest time performance, its classification metrics were the lowest. We proposed a feature selection procedure based on the combination of filter, wrapper, embedded, and domain-specific methods that was effective in lowering the risk of overrepresenting multicollinear gait features in the model, resulting in a lower risk of overfitting in the test performances by increasing the explainability of the results at the same time. Because of their accurate results, their simplicity of understanding, and explanability, DT and RF algorithms could represent useful tools for the comprehension of gait disorders by making clinicians participate in the decision process. This is the first time that the accuracy and generalizability of the most performed ML algorithms in classifying pwPD gait abnormalities based on gait data from a single lumbar-mounted IMU have been compared. The findings of this study could be used to incorporate machine learning algorithms into software that processes gait data from lumbar-mounted IMUs. Future research could focus on finding the best tree- based model for classification and prediction problems in gait analysis.
Read full abstract