Abstract
Recent research has made significant progress in identifying individuals with Parkinson's disease (PD) through speech analysis. However, these studies have often treated the early and advanced stages of PD as equivalent, overlooking speech impairments and symptoms that can differ substantially from one stage to the next. This research aims to improve diagnostic accuracy by using advanced optimization strategies to combine speech recognition results, measured as character error rates (CERs), with the acoustic features of sustained vowels. The dysphonia features of three sustained Korean vowels, /아/ (a), /이/ (i), and /우/ (u), were examined for their diversity and strong correlations. Four widely used machine-learning classifiers (Random Forest, Support Vector Machine, k-Nearest Neighbors, and Multi-Layer Perceptron) were employed to ensure consistent and reliable analysis. By fine-tuning the Whisper model for PD speech recognition and optimizing it separately for each PD severity level, we significantly improved the discriminability between severity levels. Combining these recognition results with the vowel features enabled more precise classification, improving detection accuracy by 5.87% for 3-level severity classification on the PD "ON"-state dataset and by 7.8% for 3-level severity classification on the PD "OFF"-state dataset. This approach not only evaluates the effectiveness of different feature extraction methods but also minimizes the variance across the final classification models, allowing the different severity levels of PD to be detected more effectively.
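To illustrate the character error rate (CER) mentioned above, the following Python sketch computes CER as the character-level Levenshtein distance between a reference transcript and a recognizer's output, divided by the reference length. It then appends those CERs to a vowel feature vector. This is not the authors' pipeline. The function names (`levenshtein`, `character_error_rate`, `combine_features`), counting each Hangul syllable as one character, and fusing one CER per severity-specific Whisper model by simple concatenation are all hypothetical assumptions made for illustration.

```python
from collections.abc import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                 # delete r
                curr[j - 1] + 1,             # insert h
                prev[j - 1] + int(r != h),   # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length.
    Assumption: each Python character (one Hangul syllable) counts as one unit."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def combine_features(cers: Sequence[float],
                     vowel_features: Sequence[float]) -> list[float]:
    """Hypothetical fusion step: append the CERs (e.g. one per
    severity-specific recognizer) to the acoustic vowel feature vector."""
    return list(vowel_features) + list(cers)
```

For example, `character_error_rate("안녕하세요", "안녕하세오")` returns 0.2, since one of five syllables is substituted. Counting at the syllable level is only one design choice; decomposing Hangul syllables into jamo would give a finer-grained error rate.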