Abstract

The correlation between speech and emotion makes it possible to build psychological early-warning systems. Because converting speech into semantic information introduces unexpected noise, recognizing emotion directly from the voice signal can be more efficient and natural for speech emotion recognition. The key to improving speech emotion recognition is to introduce more reliable features and thereby achieve better accuracy. To this end, 384-dimensional hand-crafted features extracted from the speech signal serve as the basic features, while multi-dimensional learned features, obtained from the log-mel spectrogram through an improved convolutional neural network combined with a long short-term memory recurrent neural network, serve as the high-level features. Applying the merged features to speech emotion recognition proves effective: compared with relying only on features learned from the log-mel spectrogram, the merged features achieve better performance.
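To make the described fusion concrete, the sketch below shows one plausible way to combine 384-dimensional hand-crafted features with CNN+LSTM features learned from a log-mel spectrogram. It is a minimal illustration under assumptions, not the authors' exact architecture: the layer sizes, the number of mel bands, and the number of emotion classes (`n_classes=4`) are placeholders chosen for the example.

```python
# Minimal sketch of feature fusion for speech emotion recognition.
# Assumptions: 64 mel bands, 4 emotion classes, arbitrary layer sizes.
import torch
import torch.nn as nn

class FusionSER(nn.Module):
    def __init__(self, n_mels=64, handcrafted_dim=384, n_classes=4):
        super().__init__()
        # CNN front end over the log-mel spectrogram (1 x n_mels x time)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # LSTM over the time axis of the CNN feature maps
        self.lstm = nn.LSTM(input_size=64 * (n_mels // 4),
                            hidden_size=128, batch_first=True)
        # Classifier over the merged (learned + hand-crafted) features
        self.classifier = nn.Sequential(
            nn.Linear(128 + handcrafted_dim, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, log_mel, handcrafted):
        # log_mel: (batch, 1, n_mels, time); handcrafted: (batch, 384)
        feat = self.cnn(log_mel)                    # (B, 64, n_mels/4, T/4)
        feat = feat.permute(0, 3, 1, 2).flatten(2)  # (B, T/4, 64 * n_mels/4)
        _, (h_n, _) = self.lstm(feat)               # final hidden state
        merged = torch.cat([h_n[-1], handcrafted], dim=1)
        return self.classifier(merged)

# Example: batch of 8 utterances, 64 mel bands, 200 spectrogram frames
model = FusionSER()
logits = model(torch.randn(8, 1, 64, 200), torch.randn(8, 384))
print(logits.shape)  # torch.Size([8, 4])
```

In this sketch the learned spectrogram features and the hand-crafted feature vector are simply concatenated before classification; the paper's "merged features" idea is of this general form, though the specific network details may differ.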
