Emotion detection, often referred to as emotion recognition, is one of the greatest challenges in Natural Language Processing (NLP). It is the process of identifying a person’s feelings or emotions, such as happiness, sadness, or anger. Emotions are strong feelings about a person's situation or relationships with others; they are mental states that affect human behavior and interactions. In this paper, we propose an approach for emotion detection in audio files, focusing on a natural Arabic audio dataset and applying four Machine Learning (ML) classifiers: Sequential Minimal Optimization (SMO), Random Forest (RF), K-Nearest Neighbours (KNN), and Simple Logistic (SL). The classification experiments were conducted using sixteen acoustic feature sets. The acoustic features explored include Mel Frequency Cepstral Coefficients (MFCC), Mel spectrogram, spectral contrast, Zero Crossing Rate (ZCR), and intensity. The experimental results show that the SMO and SL classifiers achieved the highest overall accuracy, 83.82%, when using the combination of all acoustic features (MFCC, Mel spectrogram, spectral contrast, ZCR, and intensity). The RF and KNN classifiers yielded competitive results, with accuracies of 81.71% and 77.34%, respectively. These results suggest that combining multiple acoustic features significantly enhances the performance of emotion detection models, especially for complex emotions in natural Arabic audio datasets.
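To make the idea of combining acoustic features before classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes only two simple features (ZCR and a root-mean-square value used as a stand-in for intensity), a hand-written KNN with Euclidean distance, and synthetic sine tones in place of real recordings. All function names, the toy signals, and the labels are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def rms_intensity(signal):
    """Root-mean-square amplitude, used here as a simple intensity proxy."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def combined_features(signal):
    """Concatenate the individual feature values into one vector."""
    return [zero_crossing_rate(signal), rms_intensity(signal)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def tone(freq_hz, amplitude, n=800, sample_rate=8000):
    """Synthetic sine tone standing in for a real recording (assumption)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


# Toy training set: signals and labels are placeholders, not the paper's data.
train = [
    (combined_features(tone(freq, amp)), label)
    for freq, amp, label in [
        (900, 0.9, "angry"), (1000, 0.8, "angry"), (1100, 1.0, "angry"),
        (150, 0.2, "sad"), (200, 0.3, "sad"), (250, 0.25, "sad"),
    ]
]

# Classify an unseen toy signal using the combined feature vector.
prediction = knn_predict(train, combined_features(tone(950, 0.85)))
print(prediction)
```

In this toy setup the loud, high-frequency query lands among the "angry" examples. A realistic pipeline would extract richer features such as MFCC and spectral contrast from actual audio and would normalize each feature before concatenation, since KNN distances are sensitive to feature scale.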