Abstract

Speech emotions (SEs) are an essential component of human interaction and an effective means of influencing human behavior. Recognizing emotions from speech is an emerging yet challenging area of digital signal processing (DSP). Healthcare professionals are continually looking for better ways to interpret patients' voices for diagnosis and treatment. Speech emotion recognition (SER) from the human voice, particularly in people with neurological disorders such as Parkinson's disease (PD), can expedite the diagnostic process. PD is currently diagnosed primarily through expensive tests and continuous monitoring, which is time-consuming. This research aims to develop a system that can accurately identify SEs that are important for PD patients: anger, happiness, normal, and sadness. We propose a novel lightweight deep model to predict these common SEs. An adaptive wavelet thresholding method is employed to pre-process the audio data. Furthermore, we generate spectrograms from the speech signals rather than processing the raw audio directly, in order to extract more discriminative features. The proposed model is trained on spectrograms generated from the IEMOCAP dataset, and its convolutional layers learn discriminative features from those spectrograms. The performance of the proposed framework is evaluated on standard metrics and shows promising results for real-time use with PD patients.
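
The abstract gives no implementation details, but the three stages it describes (adaptive wavelet-threshold denoising, spectrogram extraction, and a lightweight convolutional classifier) can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes PyWavelets for the thresholding step, with the common universal (VisuShrink) threshold standing in for whatever adaptive rule the paper actually uses, librosa for log-mel spectrograms, and a small PyTorch CNN over four emotion classes. All function and class names are hypothetical.

```python
# Illustrative sketch of the described pipeline (not the authors' code):
# 1) adaptive wavelet-threshold denoising, 2) spectrogram extraction,
# 3) a small CNN classifier over four emotion classes.
import numpy as np
import pywt
import librosa
import torch.nn as nn

def wavelet_denoise(signal, wavelet="db8", level=4):
    """Soft-threshold wavelet denoising with a per-signal (adaptive) threshold."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Estimate the noise level from the finest detail coefficients (median rule),
    # then apply the universal threshold sigma * sqrt(2 * log N).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def log_mel_spectrogram(signal, sr=16000, n_mels=64):
    """Log-scaled mel spectrogram used as the 2-D input to the CNN."""
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class EmotionCNN(nn.Module):
    """A lightweight convolutional classifier over spectrogram inputs."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size output for variable-length audio
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        return self.classifier(self.features(x).flatten(1))
```

In use, each denoised utterance would be converted to a spectrogram, stacked into a (batch, 1, n_mels, time) tensor, and trained with a standard cross-entropy loss; the "lightweight" aspect is reflected here in the small number of convolutional filters and the single linear classification head.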
