BiLSTM with Data Augmentation using Interpolation Methods to Improve Early Detection of Parkinson Disease

Robertas Damaševičius, Olusola Abayomi-Alli, Rytis Maskeliūnas, Adebayo Abayomi-Alli

doi:10.15439/2020f188

Abstract

The lack of dopamine in the human brain is the cause of Parkinson disease (PD), a degenerative disorder common worldwide among older people. However, failure to detect the disease before the first clinical diagnosis has led to an increased mortality rate. Research on the early detection of PD has encountered challenges such as small dataset size, class imbalance, overfitting, high false detection rate, and model complexity. This paper aims to improve early detection of PD using machine learning through data augmentation for very small datasets. We propose using Spline interpolation and Piecewise Cubic Hermite Interpolating Polynomial (Pchip) interpolation to generate synthetic data instances. We further investigate reducing the dimensionality of features for effective, real-time classification, taking into account the computational complexity of implementation on real-life mobile phones. For classification we use a Bidirectional LSTM (BiLSTM) deep learning network and compare the results with traditional machine learning algorithms such as Support Vector Machine (SVM), Decision Tree, Logistic Regression, KNN, and Ensemble bagged tree. For experimental validation we use the Oxford Parkinson disease dataset with 195 data samples, which we have augmented with 571 synthetic data samples. The results for BiLSTM show that even with a holdout of 90%, the model was still able to recognize PD effectively, with an average accuracy over ten rounds of experiments using 22 features of 82.86%, 97.1%, and 96.37% for the original, augmented (Spline), and augmented (Pchip) datasets, respectively. Our results show that the proposed data augmentation schemes significantly (p < 0.001) improved the accuracy of PD recognition on a small dataset using both classical machine learning models and BiLSTM.
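The abstract names the two interpolation methods but does not say how they are applied to the feature vectors. The following is a minimal sketch of one plausible reading, not the authors' implementation: each feature column of a single class is treated as a curve over the sample index, and synthetic rows are read off at random fractional positions. The function name `augment_by_interpolation`, its parameters, and the random choice of query positions are all assumptions made for illustration. It uses SciPy's `CubicSpline` and `PchipInterpolator`.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator


def augment_by_interpolation(X, n_new, method="pchip", seed=0):
    """Return n_new synthetic rows interpolated from the rows of X.

    Assumption (not stated in the source): X holds samples of ONE class,
    shape (n_samples, n_features), and each feature column is interpolated
    over the sample index 0..n_samples-1.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    positions = np.arange(n_samples)  # original rows sit at integer positions

    if method == "spline":
        interpolant = CubicSpline(positions, X, axis=0)
    elif method == "pchip":
        interpolant = PchipInterpolator(positions, X, axis=0)
    else:
        raise ValueError("method must be 'spline' or 'pchip'")

    # Hypothetical choice: sample fractional positions uniformly at random
    # inside the range covered by the original rows (no extrapolation).
    rng = np.random.default_rng(seed)
    query = rng.uniform(0, n_samples - 1, size=n_new)
    return interpolant(query)


# Usage with placeholder data (random numbers, NOT the Oxford dataset).
X_class = np.random.default_rng(1).normal(size=(20, 22))
X_spline = augment_by_interpolation(X_class, 50, method="spline")
X_pchip = augment_by_interpolation(X_class, 50, method="pchip")
```

One practical difference between the two: a cubic spline can overshoot the values of neighboring samples, whereas Pchip is shape-preserving, so its synthetic values always lie between the two adjacent original values in each feature.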

Highlights

Parkinson Disease (PD) is a degenerative disorder of the central nervous system, with major damage affecting the motor system in the brain cells [1].
Several databases have been created to ease research on detecting neurodegenerative disorders. Those presented in the existing literature for detecting Parkinson disease (PD) include datasets on speech impairment [1], drawing movement [8], volatile organic compounds (VOCs) in blood [9], cognitive impairment [10], electroencephalography (EEG) and electromyography (EMG) bio-signals [11], and images such as magnetic resonance imaging (MRI), functional MRI, and positron emission tomography (PET) [12].
We evaluated the performance of different supervised machine learning (ML) algorithms, such as Decision Tree, Linear Discriminant, Logistic Regression, Support Vector Machine (SVM), KNN, and other ensemble algorithms, to identify the best classifier.
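A comparison of the classical classifiers listed above could be sketched as follows. This is an illustrative setup only, assuming scikit-learn, 5-fold cross-validation, default hyperparameters, and a placeholder dataset from `make_classification` with the same shape as the original data (195 samples, 22 features). None of these choices come from the source, and the helper name `compare_classifiers` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, cv=5, seed=0):
    """Return the mean cross-validated accuracy of each candidate model."""
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "linear_discriminant": LinearDiscriminantAnalysis(),
        # Scale-sensitive models are wrapped with a standardizer.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "svm": make_pipeline(StandardScaler(), SVC()),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        # BaggingClassifier uses decision trees as its default base learner.
        "bagged_trees": BaggingClassifier(n_estimators=30, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv).mean())
        for name, model in models.items()
    }


# Placeholder data only; replace with the real feature matrix and labels.
X_demo, y_demo = make_classification(
    n_samples=195, n_features=22, random_state=0)
scores = compare_classifiers(X_demo, y_demo)
best = max(scores, key=scores.get)  # name of the highest-scoring model
```

The scores obtained on the placeholder data say nothing about PD recognition; the sketch only shows how the candidates can be ranked under a common evaluation protocol.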

Summary

Introduction

Parkinson Disease (PD) is a degenerative disorder of the central nervous system, with major damage affecting the motor system in the brain cells [1]. It is among the most common and fastest-growing neurodegenerative disorders, affecting an estimated 7 to 10 million people globally [2,3]. It is mainly caused by a lack of dopamine (a neurotransmitter) in the human brain [4], and its effects can be categorized into motor and non-motor symptoms such as voice/speech impairment, dementia, depression, slow thinking, rigidity, tremor, bradykinesia, and other cognitive disabilities [4,5]. Further research on diagnosing PD early, before it progresses to the point where medical assistance and treatment become ineffective, is very important [14].

Objectives

Methods

Results

Conclusion