Abstract

Emotion recognition from speech signals is one of the key technologies for natural conversation between humans and robots. Most emotion recognizers extract prosodic features from the input speech and use them for emotion recognition. However, prosodic features change drastically depending on the uttered text. To address this problem, we have proposed a method that normalizes prosodic features using synthesized speech that has the same word sequence as the input but is uttered with a “neutral” emotion. In that method, all prosodic features (pitch, power, etc.) are normalized. However, it is not known which prosodic features should be normalized. In this paper, all combinations of features with and without normalization were examined, and the most appropriate normalization scheme was identified. When both “RMS Energy” (root-mean-square frame energy) and “VoiceProb” (power of harmonics divided by the total power) were normalized, emotion recognition accuracy was 5.98% higher than the accuracy without normalization.
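The normalization idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, and the choice of per-frame subtraction against the synthesized neutral reference are all assumptions, and the `rms_energy` helper is a hypothetical stand-in for a proper feature extractor (e.g. an openSMILE-style front end).

```python
import numpy as np

def rms_energy(signal, frame_len=400, hop=160):
    """Per-frame root-mean-square energy of a 1-D signal.

    Frame length and hop (in samples) are illustrative choices,
    e.g. 25 ms / 10 ms at 16 kHz.
    """
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return np.sqrt(np.mean(frames ** 2, axis=1))

def normalize_feature(input_feat, neutral_feat):
    """Normalize a prosodic feature track against the same feature
    extracted from synthesized neutral speech of the same word sequence.

    Assumption: normalization is frame-wise subtraction, with the two
    tracks aligned by simple truncation to the shorter length.
    """
    m = min(len(input_feat), len(neutral_feat))
    return input_feat[:m] - neutral_feat[:m]

# Usage sketch: emotional input vs. synthesized neutral reference.
emotional = rms_energy(np.random.randn(16000))   # placeholder waveform
neutral = rms_energy(np.random.randn(16000))     # placeholder waveform
normalized = normalize_feature(emotional, neutral)
```

The same `normalize_feature` step would be applied to each selected feature track (e.g. RMS Energy and VoiceProb) before feeding the result to the emotion classifier.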
