A deep convolutional neural network (DCNN), which is a strong classifier of natural images, is proposed for learning speech spectrum images of sustained phonation to detect Parkinson’s disease (PD), as an alternative to existing feature-based machine learning methods. The proposed method is shown to yield very high accuracy without the need for a separate feature computation stage. Two speech spectrum representations are proposed: the short-time Fourier transform (STFT) spectrum of size N×256 and the line spectral frequency (LSF) spectrum of size N×16. LSF reflects the speech production mechanism, and using the LSF spectrum in a DCNN to detect PD speech is a novel idea. The spectrum images look like random patterns, and performance improves when an additional, deeper hidden layer with a tapering pattern is used in the final fully connected stage. On a standard PD sustained-phonation dataset, the training accuracies achieved are 98.50% for the STFT method and 92.50% for the LSF method. The validation accuracies achieved are 84.38% for STFT and 100% for LSF. The STFT method results in a sensitivity of 97.05%, a specificity of 88.63%, a precision of 86.84%, an F1-score of 91.66%, a false positive rate (FPR) of 11.36%, and a false alarm rate of 12.82%. The LSF method results in a sensitivity of 97.05%, a specificity of 95.45%, a precision of 94.28%, an F1-score of 95.65%, an FPR of 4.50%, and a false alarm rate of 5.71%. The LSF-based method performs better, and a performance comparison with state-of-the-art methods brings out the merits of LSF spectrum image-based DCNN learning for PD detection using sustained phonation.
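For readers who want to see how the reported metrics relate to one another, the following is a minimal Python sketch of the standard confusion-matrix definitions. The function name `binary_metrics` and the counts passed to it are assumptions made for illustration: the abstract does not state the confusion matrices, and the counts below were chosen only because they reproduce the reported sensitivity, specificity, precision, and F1-score to within about 0.01 percentage points.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fpr = fp / (fp + tn)           # false positive rate = 1 - specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "fpr": fpr,
    }


# Hypothetical counts (not taken from the source), picked to be
# consistent with the percentages quoted in the abstract.
stft = binary_metrics(tp=33, fn=1, tn=39, fp=5)
lsf = binary_metrics(tp=33, fn=1, tn=42, fp=2)

for name, metrics in (("STFT", stft), ("LSF", lsf)):
    summary = ", ".join(f"{k}={100 * v:.2f}%" for k, v in metrics.items())
    print(f"{name}: {summary}")
```

Under these assumed counts, the two methods differ only in the number of false positives, which is why sensitivity is identical while specificity, precision, and F1-score favor the LSF method.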