Abstract

Analyzing and interpreting emotions in human speech has drawn considerable attention in the field of human-computer interaction. Many speech emotion recognition systems have therefore been proposed to recognize the emotional state of a speaker from recordings of their spoken utterances. Feature extraction, which derives emotional features from the speech data, is an important step in building such a system. However, not all extracted features are relevant for classifying the speaker's emotional state. Irrelevant and redundant features produce meaningless patterns that lead to inaccurate emotion classification. This study therefore proposes an intelligent feature selection method that combines the Grey Wolf Optimizer (GWO), a novel bio-inspired optimization algorithm that mimics the hunting mechanism of wolves in nature, with a K-nearest neighbor (KNN) classifier to find the most relevant subset of features and thereby enhance the classification performance of an emotion recognition system. The proposed method is called GWO-KNN. Emotion classification is performed on three distinct databases: the Arabic Emirati-accented speech database, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Surrey Audio-Visual Expressed Emotion dataset (SAVEE). A single or combined feature extraction method is applied to extract the features from each dataset. The proposed method provides better classification performance for speech emotion recognition than classical methods such as the bat algorithm (BAT), cuckoo search (CS), White Shark Optimizer (WSH), and arithmetic optimization algorithm (AOA). It also surpasses several recent state-of-the-art approaches that use the same datasets.
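
The wrapper idea behind GWO-KNN can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation: the function names (`to_mask`, `knn_fitness`, `gwo_select`), the 0.5 threshold used to turn continuous wolf positions into a binary feature mask, the fitness weighting `alpha = 0.99`, and the choices of `k`, population size, and iteration count are all assumptions made for illustration. The position update follows the standard GWO equations, in which each wolf moves toward the average of the positions suggested by the three best wolves (alpha, beta, delta).

```python
import random

def to_mask(position):
    # Assumed binarization: a feature is selected when its coordinate exceeds 0.5.
    return [1 if p > 0.5 else 0 for p in position]

def knn_fitness(mask, X_train, y_train, X_test, y_test, k=3, alpha=0.99):
    """Lower is better: weighted k-NN error plus the fraction of features kept."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return alpha  # empty subset: treat the error as 1, with no size penalty
    correct = 0
    for x, label in zip(X_test, y_test):
        # Squared Euclidean distance over the selected features only.
        dists = sorted(
            (sum((x[j] - t[j]) ** 2 for j in cols), yt)
            for t, yt in zip(X_train, y_train)
        )
        votes = [yt for _, yt in dists[:k]]
        correct += max(set(votes), key=votes.count) == label
    error = 1.0 - correct / len(y_test)
    return alpha * error + (1.0 - alpha) * len(cols) / len(mask)

def gwo_select(fitness, n_features, n_wolves=8, n_iter=20, seed=0):
    """Search for a feature mask that minimizes fitness(mask); needs n_wolves >= 3."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_fit = None, float("inf")
    for it in range(n_iter):
        ranked = sorted(((fitness(to_mask(w)), w) for w in wolves),
                        key=lambda pair: pair[0])
        if ranked[0][0] < best_fit:
            best_fit, best_mask = ranked[0][0], to_mask(ranked[0][1])
        # Copy the three leaders so in-place updates below do not alter them.
        leaders = [list(w) for _, w in ranked[:3]]
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 toward 0
        for w in wolves:
            for j in range(n_features):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    total += leader[j] - A * D
                w[j] = min(1.0, max(0.0, total / 3.0))  # keep positions in [0, 1]
    return best_mask, best_fit
```

To use the sketch, wrap `knn_fitness` in a function of the mask alone (for example `lambda m: knn_fitness(m, X_train, y_train, X_test, y_test)`) and pass it to `gwo_select` together with the number of extracted features. The size penalty in the fitness is one common way to favor smaller subsets when two masks give similar accuracy.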

