This paper addresses speech emotion recognition using convolutional neural networks (CNNs), drawing on the well-known EmoDB and RAVDESS databases. It proposes a CNN-based architecture whose learned features are classified with a multilayer perceptron (MLP), and the model is trained and evaluated in both speaker-dependent and speaker-independent settings: in the speaker-dependent setting the model is trained and tested on data from the same speakers, whereas in the speaker-independent setting it is tested on speakers not seen during training. Data augmentation and advanced pre-processing are applied to improve performance. The results indicate that the proposed architecture performs on par with state-of-the-art methods and surpasses traditional systems in accuracy and other relevant metrics. The paper's contribution is a CNN-based approach to speech emotion recognition with an MLP classifier, evaluated on established databases with promising results relative to existing methods; it also examines dataset properties, speech signal analysis, and classifier methods relevant to emotion recognition tasks. In addition, documents in various Scopus-indexed journals were examined bibliometrically for the years 2014–2024.
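The difference between the two evaluation settings can be made concrete with a grouped data split. The sketch below is illustrative only and is not taken from the paper: the metadata table, the column names, and the 80/20 split ratio are assumptions.

```python
# Minimal sketch of a speaker-independent train/test split, assuming a
# metadata table with one row per utterance (file path, speaker ID, label).
# Column names and the 80/20 ratio are illustrative assumptions, not the
# paper's exact protocol.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

meta = pd.DataFrame({
    "path":    ["a1.wav", "a2.wav", "b1.wav", "b2.wav", "c1.wav", "c2.wav"],
    "speaker": ["spk01",  "spk01",  "spk02",  "spk02",  "spk03",  "spk03"],
    "label":   ["anger",  "joy",    "sadness", "anger", "neutral", "joy"],
})

# Grouping by speaker guarantees no speaker appears in both partitions,
# which is what makes the evaluation speaker-independent. A
# speaker-dependent setup would instead split within each speaker's data.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(meta, groups=meta["speaker"]))

train_meta, test_meta = meta.iloc[train_idx], meta.iloc[test_idx]
print(sorted(train_meta["speaker"].unique()), "-> train speakers")
print(sorted(test_meta["speaker"].unique()), "-> held-out test speakers")
```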
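To illustrate the kind of CNN-plus-MLP pipeline the abstract describes, the following sketch computes log-mel spectrogram features, applies a simple additive-noise augmentation, and passes the result through a small CNN feature extractor with an MLP classifier head. The layer sizes, number of mel bands, noise level, and number of emotion classes are illustrative assumptions rather than the paper's configuration.

```python
# A minimal sketch, assuming log-mel spectrogram input, a small CNN feature
# extractor, and an MLP classifier head; hyperparameters are illustrative.
import numpy as np
import librosa
import torch
import torch.nn as nn

def log_mel(y: np.ndarray, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    """Compute a log-scaled mel spectrogram (n_mels x frames)."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def add_noise(y: np.ndarray, scale: float = 0.005) -> np.ndarray:
    """Simple waveform-level data augmentation: additive Gaussian noise."""
    return y + scale * np.random.randn(len(y)).astype(y.dtype)

class CnnMlpEmotionNet(nn.Module):
    """Small CNN feature extractor followed by an MLP classifier head."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size features for variable-length audio
        )
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.cnn(x))

# Toy usage with a synthetic 1-second waveform standing in for an utterance.
sr = 16000
wave = np.random.randn(sr).astype(np.float32)
features = log_mel(add_noise(wave), sr=sr)              # (64, frames)
batch = torch.from_numpy(features).float()[None, None]  # (1, 1, 64, frames)
logits = CnnMlpEmotionNet(num_classes=7)(batch)
print(logits.shape)                                     # torch.Size([1, 7])
```

The adaptive pooling layer is one common way to obtain fixed-size CNN features from utterances of different lengths before the MLP head; whether the paper uses this or fixed-length spectrogram crops is not specified in the abstract.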