Abstract

Speech signal processing is an active area of research: speech is the dominant means of exchanging information among human beings and is well suited to human–computer interaction (HCI). Human behavior assessment and emotion recognition from speech signals, known as speech emotion recognition (SER), is an emerging area of HCI research with various real-time applications. The performance of an efficient SER system depends on feature learning, which should capture salient and discriminative information such as high-level deep features. In this paper, we propose a two-stream deep convolutional neural network with iterative neighborhood component analysis (INCA) to jointly learn spatial-spectral features and select the most discriminative features for the final prediction. Our model is composed of two channels, each built on a convolutional neural network structure that extracts cues from the speech signal. The first channel extracts features from the spectral domain, and the second channel extracts features from the spatial domain. The two sets of features are fused and fed to INCA, which removes redundancy and selects the optimal features for the final model training. The jointly refined features are passed through a fully connected network with a softmax classifier to predict the different emotions. We trained the proposed system on three benchmarks, the EMO-DB, SAVEE, and RAVDESS emotional speech corpora, and obtained recognition rates of 95%, 82%, and 85%, respectively. These results show the effectiveness and significance of the proposed system.
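
The fusion and feature-selection stage described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the two CNN streams are replaced by random toy feature matrices, a Fisher score is swapped in for the NCA feature weights, candidate subsets are scored by the leave-one-out error of a 1-nearest-neighbour classifier, and the final fully connected softmax classifier is omitted. All function names (`fisher_scores`, `loo_1nn_error`, `iterative_select`) are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature between-class / within-class variance ratio.
    Assumption: used here as a simple stand-in for NCA feature weights."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def loo_1nn_error(X, y):
    """Leave-one-out misclassification rate of a 1-nearest-neighbour classifier."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] != y))

def iterative_select(X, y, k_min=1, k_max=None):
    """Rank features, then try the top-k subset for each k and keep the
    smallest k with the lowest leave-one-out error."""
    order = np.argsort(fisher_scores(X, y))[::-1]  # best feature first
    k_max = X.shape[1] if k_max is None else k_max
    best_k, best_err = k_min, np.inf
    for k in range(k_min, k_max + 1):
        err = loo_1nn_error(X[:, order[:k]], y)
        if err < best_err:
            best_k, best_err = k, err
    return order[:best_k], best_err

# Toy stand-ins for the outputs of the two CNN streams (3 classes, 60 samples).
rng = np.random.default_rng(0)
n = 60
y = np.repeat(np.arange(3), n // 3)
spectral = rng.normal(size=(n, 4))
spatial = rng.normal(size=(n, 4))
spectral[:, 0] += 4.0 * y  # one informative feature per stream,
spatial[:, 1] -= 4.0 * y   # the rest is noise

fused = np.concatenate([spectral, spatial], axis=1)  # feature-level fusion
selected, err = iterative_select(fused, y)
print("selected feature indices:", selected, "LOO error:", err)
```

In this toy setup the two informative columns (index 0 from the first stream and index 5 after fusion from the second) dominate the ranking, so the selected subset starts with one of them. In a real system the selected columns would then feed the classifier.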
