Exploiting the potentialities of features for speech emotion recognition

Dongdong Li, Yijun Zhou, Zhe Wang, Daqi Gao

doi:10.1016/j.ins.2020.09.047

Abstract

In recent years, studies on speech signals have paid increasing attention to emotional information. The most challenging aspect of speech emotion recognition (SER) is choosing the optimal speech feature representation. Statistical analysis shows that the role of each speech feature differs across emotions, indicating that features differ in their ability to distinguish emotions. This study proposes an emotional-category based feature weighting (ECFW) method, which aims to find the prominence of each feature under different emotions and to apply this prominence as prior knowledge. Furthermore, previous studies have paid little attention to matching speech features with models. This study argues that different combinations of models and features lead to large differences in SER performance, a claim that is evaluated in several experiments. Features must be modeled with appropriate approaches to extract the most valuable information for emotional representation. The best combinations of features and models are then selected to test the proposed method. The method is applied to three commonly used speech emotion databases: IEMOCAP, MASC, and EMO-DB. The results show that ECFW significantly improves the performance of SER tasks.

Full Text