Abstract

Speech emotion recognition (SER) has received much attention in recent years. The effectiveness of an SER system depends largely on how much useful information is contained in the extracted emotional features. Many studies have achieved state-of-the-art results by feeding different extracted speech features to convolutional neural networks (CNNs), but such models cannot capture the relatively salient emotional features in a speech signal. In this paper, we present a novel complementary feature extraction method for extracting salient emotional features. We compute the Mel-spectrogram and Mel-frequency cepstral coefficients (MFCCs) to capture time-frequency domain information, converting raw speech into emotionally informative features. We then exploit the complementary property of these features and construct a 1D CNN model that selects emotional features effectively, evaluating its performance on the IEMOCAP, RAVDESS, and Emo-DB speech corpora. With the complementary features as input, our method outperforms the baselines and achieves competitive results.
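A minimal sketch of the complementary-feature idea described above, assuming librosa for feature extraction and PyTorch for the 1D CNN. The function and class names, the simple frame-wise stacking of the two feature sets, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: stack Mel-spectrogram and MFCC frames as complementary features,
# then classify with a small 1D CNN. Hyperparameters are placeholders.
import librosa
import numpy as np
import torch
import torch.nn as nn

def complementary_features(wav_path, sr=16000, n_mels=64, n_mfcc=40):
    """Concatenate log-Mel-spectrogram and MFCC frames along the
    coefficient axis (assumed combination strategy)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                       # (n_mels, T)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, T)
    # Same hop length for both, so the frame counts match.
    return np.concatenate([log_mel, mfcc], axis=0)           # (n_mels+n_mfcc, T)

class SER1DCNN(nn.Module):
    """1D CNN over the time axis; input channels are the stacked
    feature coefficients (hypothetical layer configuration)."""
    def __init__(self, in_channels, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> utterance vector
        )
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):             # x: (batch, channels, time)
        return self.fc(self.net(x).squeeze(-1))

# Example usage on one utterance (4 emotion classes assumed):
# feats = complementary_features("speech.wav")               # (104, T)
# model = SER1DCNN(in_channels=feats.shape[0], n_classes=4)
# logits = model(torch.from_numpy(feats).float().unsqueeze(0))
```

Stacking the two feature sets along the coefficient axis lets a single 1D convolution see both representations of each frame at once, which is one straightforward way to realize the complementary property the abstract describes.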
