Speech emotion recognition (SER) has recently received increased interest due to rapid advancements in affective computing and human-computer interaction. English, German, Mandarin and several Indian languages are among the most commonly considered languages for SER, along with other European and Asian languages. However, few studies have implemented Arabic SER systems due to the scarcity of available Arabic speech emotion databases. Although Egyptian Arabic is one of the most widely spoken and understood Arabic dialects in the Middle East, no Egyptian Arabic speech emotion database has yet been devised. In this work, a semi-natural Egyptian Arabic speech emotion (EYASE) database is introduced, created from an award-winning Egyptian TV series. The EYASE database includes utterances from 3 male and 3 female professional actors covering four emotions: angry, happy, neutral and sad. Prosodic, spectral and wavelet features are computed from the EYASE database for emotion recognition. In addition to pitch, intensity, formants and Mel-frequency cepstral coefficients (MFCC), which are widely used for SER, long-term average spectrum (LTAS) and wavelet parameters are also considered in this work. Speaker-independent and speaker-dependent experiments were performed for three cases: (1) emotion vs. neutral classification, (2) arousal and valence classification and (3) multi-emotion classification. Several analyses were carried out to explore different aspects of Arabic SER, including the effects of gender and culture. Furthermore, feature ranking was performed to evaluate the relevance of the LTAS and wavelet features for SER in comparison to the more widely used prosodic and spectral features. Moreover, anger detection performance is compared for different combinations of the implemented prosodic, spectral and wavelet features. Feature ranking and anger detection performance analysis showed that both LTAS and wavelet features are relevant for Arabic SER and that they significantly improve emotion recognition rates.
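As an illustration of the kind of front-end processing described above, the sketch below extracts a few prosodic and spectral descriptors (pitch, intensity, MFCCs) together with simple wavelet-subband energies from a single utterance. It is not the authors' exact pipeline: the librosa and PyWavelets calls, the wavelet choice (db4, 5 levels), the summary statistics and the file name are all assumptions made for the example.

```python
# Minimal, illustrative SER feature-extraction sketch (not the paper's exact pipeline).
# Assumes librosa, PyWavelets (pywt) and NumPy are installed; "utterance.wav" is a placeholder path.
import numpy as np
import librosa
import pywt

def extract_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)

    # Prosodic: fundamental frequency (pitch) via pYIN and intensity via RMS energy.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                      fmax=librosa.note_to_hz('C7'), sr=sr)
    pitch_mean = np.nanmean(f0)          # mean pitch over voiced frames
    pitch_std = np.nanstd(f0)            # pitch variability
    rms = librosa.feature.rms(y=y)[0]
    intensity_mean = float(rms.mean())

    # Spectral: 13 MFCCs summarized by per-coefficient means and standard deviations.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Wavelet: relative energy of each subband from a 5-level Daubechies-4 decomposition
    # (the wavelet family and depth here are illustrative choices, not the paper's).
    coeffs = pywt.wavedec(y, 'db4', level=5)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    wavelet_energy_ratios = energies / energies.sum()

    return np.concatenate([[pitch_mean, pitch_std, intensity_mean],
                           mfcc_stats, wavelet_energy_ratios])

features = extract_features("utterance.wav")   # hypothetical EYASE utterance file
print(features.shape)
```

Per-utterance vectors of this kind would then feed the speaker-independent or speaker-dependent classifiers and the feature-ranking analysis discussed in the abstract.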