Abstract

In the field of speech emotion recognition (SER), great progress has been made in feature extraction and classifier construction. In natural environments, however, SER is vulnerable to environmental noise. Moreover, cross-corpus SER is a challenging problem because of gender and language differences between corpora. To address the noise problem, wavelet threshold denoising is proposed. This paper then uses a multi-task learning method to address the language and gender differences: emotion recognition is the primary task, while gender recognition and language recognition are auxiliary tasks. In the multi-task learning model, a shared LSTM extracts features common to all tasks, and three private LSTMs extract the task-specific features for emotion recognition, language recognition, and gender recognition, respectively. An orthogonality constraint is applied during training so that each private space contains only task-specific features. Two experiments are conducted. The first compares different wavelet bases and numbers of decomposition levels in order to select a suitable wavelet basis and decomposition depth. The second compares the proposed cross-corpus SER method with cross-corpus SER results reported in other literature.
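The abstract does not state which wavelet basis, thresholding rule, or threshold value is used; choosing the basis and depth is the subject of the first experiment. As a minimal sketch of the general technique, assuming an orthonormal Haar basis, soft thresholding of the detail coefficients, and the Donoho-Johnstone universal threshold (all assumptions, not the paper's confirmed settings), wavelet threshold denoising of a 1-D speech frame looks like this:

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Invert haar_step: rebuild the finer-scale signal from one level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(c, t):
    """Shrink c toward zero by t; coefficients with |c| <= t become 0."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(signal, levels=3):
    """Multi-level Haar decomposition, soft-threshold details, reconstruct.

    Assumption: Haar basis and universal threshold; the paper selects its
    own basis and number of decomposition levels experimentally.
    """
    n = len(signal)
    if levels < 1 or n % (2 ** levels) != 0:
        raise ValueError("need levels >= 1 and len(signal) divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale
    # Noise level from the finest-scale details (median absolute value / 0.6745),
    # then the universal threshold sigma * sqrt(2 ln n).
    sigma = median(abs(c) for c in details[0]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))
    details = [[soft_threshold(c, t) for c in d] for d in details]
    # Reconstruct from the coarsest level back to the original resolution.
    for d in reversed(details):
        approx = inverse_haar_step(approx, d)
    return approx
```

The approximation coefficients are left untouched, so slowly varying speech energy is preserved, while small detail coefficients (where broadband noise dominates) are set to zero. Swapping in a different basis or depth changes only the decomposition and reconstruction steps, which is what the paper's first experiment varies.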
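The abstract does not give the exact form of the orthogonality constraint or the task weights. A common formulation in shared-private multi-task models penalizes the squared Frobenius norm of the product of the shared and private feature matrices, which is zero when the two subspaces are orthogonal over the batch. The sketch below assumes that form and adds it to a weighted sum in which emotion recognition is the primary loss and gender and language recognition are auxiliary losses; the weights `w_gender`, `w_language`, and `w_orth` are hypothetical values chosen for illustration.

```python
from typing import List

# Rows are utterances in a batch; columns are feature dimensions
# (e.g. LSTM outputs). Plain lists keep the sketch dependency-free.
Matrix = List[List[float]]


def orthogonality_loss(shared: Matrix, private: Matrix) -> float:
    """Squared Frobenius norm of shared^T @ private (assumed penalty form).

    It is zero when every shared feature dimension is orthogonal, across
    the batch, to every private feature dimension.
    """
    if len(shared) != len(private):
        raise ValueError("shared and private must have the same batch size")
    d_s, d_p = len(shared[0]), len(private[0])
    total = 0.0
    for j in range(d_s):
        for k in range(d_p):
            # (S^T P)[j][k] = sum over the batch of S[b][j] * P[b][k]
            entry = sum(s_row[j] * p_row[k] for s_row, p_row in zip(shared, private))
            total += entry * entry
    return total


def total_loss(
    l_emotion: float,
    l_gender: float,
    l_language: float,
    shared: Matrix,
    privates: List[Matrix],
    w_gender: float = 0.3,    # hypothetical auxiliary-task weight
    w_language: float = 0.3,  # hypothetical auxiliary-task weight
    w_orth: float = 0.05,     # hypothetical orthogonality weight
) -> float:
    """Primary emotion loss + weighted auxiliary losses + orthogonality terms."""
    orth = sum(orthogonality_loss(shared, p) for p in privates)
    return l_emotion + w_gender * l_gender + w_language * l_language + w_orth * orth
```

In a deep-learning framework the same penalty is a single matrix product followed by a squared Frobenius norm over the shared and private LSTM outputs, so it can be minimized jointly with the three task losses by backpropagation.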
