Abstract

AI-based approaches, especially deep learning, have made remarkable achievements in Speech Emotion Recognition (SER), and Convolutional Neural Networks (CNNs) have been the backbone of many of these solutions. Although the use of CNNs has resulted in high-performing models, building them requires domain knowledge and direct human intervention; the same issue arises when improving an existing model. To address this problem, we adopt techniques first introduced in Neural Architecture Search (NAS) and use a genetic process to search for models with improved accuracy. More specifically, we insert blocks with dynamic structures between the layers of an existing model and then apply genetic operations (i.e., selection, mutation, and crossover) to find the best-performing structures. To validate our method, we use this algorithm to improve architectures by searching on the Berlin Database of Emotional Speech (EMODB). The experimental results show at least a 1.7% improvement in accuracy on the EMODB test set.
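
The sketch below illustrates the kind of genetic search loop the abstract describes: candidate blocks are encoded as operation lists, and selection, crossover, and mutation are applied across generations. All names and hyperparameters here (BLOCK_OPS, BLOCK_LEN, POP_SIZE, the placeholder evaluate_accuracy) are illustrative assumptions, not the paper's actual implementation; in the real method, fitness would come from inserting each block into the base CNN and measuring accuracy on EMODB.

```python
# Minimal sketch of a genetic search over inserted blocks (assumed encoding).
import random

BLOCK_OPS = ["conv3x3", "conv5x5", "depthwise_conv", "identity", "avg_pool"]
BLOCK_LEN = 3        # number of operations per inserted block (assumed)
POP_SIZE = 10
GENERATIONS = 20
MUTATION_RATE = 0.2

def random_block():
    """A candidate block is encoded as a list of operation names."""
    return [random.choice(BLOCK_OPS) for _ in range(BLOCK_LEN)]

def evaluate_accuracy(block):
    """Placeholder fitness: the real method would insert the block into the
    base CNN, train/fine-tune on EMODB, and return validation accuracy."""
    return random.random()  # stand-in for actual training and evaluation

def select(population, fitnesses, k=2):
    """Tournament selection: return the fitter of k random candidates."""
    contenders = random.sample(list(zip(population, fitnesses)), k)
    return max(contenders, key=lambda pair: pair[1])[0]

def crossover(parent_a, parent_b):
    """Single-point crossover of two block encodings."""
    point = random.randint(1, BLOCK_LEN - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(block):
    """Randomly replace each operation with probability MUTATION_RATE."""
    return [random.choice(BLOCK_OPS) if random.random() < MUTATION_RATE else op
            for op in block]

population = [random_block() for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    fitnesses = [evaluate_accuracy(b) for b in population]
    population = [mutate(crossover(select(population, fitnesses),
                                   select(population, fitnesses)))
                  for _ in range(POP_SIZE)]

best = max(population, key=evaluate_accuracy)
print("Best block structure found:", best)
```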
