Abstract

The paper introduced the present status of speech emotion recognition. To improve the single-mode emotion recognition rate, a bimodal fusion method based on speech and facial expression was proposed. Emotional databases of Chinese speech and facial expressions were established, with noise stimuli and film clips used to evoke the subjects' emotions. On this foundation, we analyzed the acoustic features of Chinese speech signals under different emotional states and obtained general laws for the prosodic feature parameters. We discussed single-mode emotion recognition based on prosodic features and on the geometric features of facial expressions. Then, bimodal emotion recognition was performed using a Gaussian Mixture Model. The experimental results showed that the bimodal recognition rate combining facial expression was about 6% higher than the single-mode recognition rate using prosodic features alone. DOI: http://dx.doi.org/10.11591/telkomnika.v11i1.1873
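The abstract does not specify how the speech and facial-expression scores are combined, but a common scheme for this kind of bimodal setup is weighted score-level fusion of per-class likelihoods. The sketch below illustrates the idea with single diagonal-covariance Gaussians per emotion (a one-component special case of a GMM); the emotion labels, feature vectors, and model parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def gauss_loglik(x, mean, var):
    # Log-likelihood of feature vector x under a diagonal-covariance Gaussian.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def fused_scores(x_speech, x_face, models, w=0.5):
    # Score-level fusion: w * speech log-likelihood + (1 - w) * face log-likelihood.
    scores = {}
    for emotion, (speech_model, face_model) in models.items():
        s = gauss_loglik(x_speech, *speech_model)
        f = gauss_loglik(x_face, *face_model)
        scores[emotion] = w * s + (1 - w) * f
    return scores

# Hypothetical per-emotion models: ((mean, var) for speech, (mean, var) for face).
models = {
    "happy": ((np.array([1.0, 2.0]), np.array([1.0, 1.0])),
              (np.array([0.5]), np.array([0.5]))),
    "angry": ((np.array([3.0, 0.0]), np.array([1.0, 1.0])),
              (np.array([2.0]), np.array([0.5]))),
}

scores = fused_scores(np.array([1.1, 1.9]), np.array([0.4]), models)
prediction = max(scores, key=scores.get)  # emotion with the highest fused score
```

In practice each modality would use a multi-component GMM trained per emotion, and the weight `w` would be tuned on held-out data; the fusion rule itself stays the same.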
