Abstract

This study aims to learn deep features from speech for the emotion recognition task using a less complex architecture with fewer learnable parameters. We propose a simple convolutional neural network (CNN) architecture that operates on log-mel spectrograms of segmented speech utterances. The proposed architecture extracts emotion-related features from two widely used speech emotion recognition databases: the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Berlin Database of Emotional Speech (EmoDB). Extensive experiments on these datasets demonstrate the performance of the proposed model, and the results are compared with recent CNN architectures. In speaker-independent evaluation, the proposed CNN achieves classification accuracies of 59.33% and 65.47% on the full and improvised IEMOCAP utterances, respectively, for four emotion classes, and 72.02% on the Berlin EmoDB database for seven classes.
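The abstract does not specify the network configuration, so the following is a minimal sketch of the pipeline it describes: fixed-length segments of an utterance are converted to log-mel spectrograms and classified by a small CNN. The segment length, mel-band count, filter counts, and the choice of librosa and PyTorch here are illustrative assumptions, not the paper's reported setup.

```python
# Minimal sketch: log-mel spectrograms of fixed-length speech segments
# fed to a small CNN. All hyperparameters (n_mels, segment length,
# filter counts, num_classes) are illustrative assumptions, not the
# configuration reported in the paper.
import librosa
import numpy as np
import torch
import torch.nn as nn

def log_mel_segments(wav_path, sr=16000, n_mels=64, seg_frames=128):
    """Split one utterance into fixed-size log-mel spectrogram segments."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)
    # Drop the trailing partial segment for simplicity.
    n_segs = log_mel.shape[1] // seg_frames
    return [log_mel[:, i * seg_frames:(i + 1) * seg_frames]
            for i in range(n_segs)]

class SimpleEmotionCNN(nn.Module):
    """A deliberately small CNN: two conv blocks and one linear classifier."""
    def __init__(self, n_mels=64, seg_frames=128, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Two 2x2 poolings shrink each spatial dimension by a factor of 4.
        self.classifier = nn.Linear(32 * (n_mels // 4) * (seg_frames // 4),
                                    num_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, seg_frames)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Hypothetical usage: classify one utterance by averaging segment scores.
# segs = log_mel_segments("speech.wav")
# model = SimpleEmotionCNN()
# batch = torch.tensor(np.stack(segs), dtype=torch.float32).unsqueeze(1)
# probs = model(batch).softmax(dim=1).mean(dim=0)
```

Because the utterance is segmented, segment-level predictions would typically be aggregated (for example, averaged as above) to produce a single utterance-level emotion label.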
