Abstract

Emotional expression is a fundamental aspect of human communication, and understanding emotions from speech has a significant impact on a range of software, including human-computer interfaces, virtual assistants, and emotion-driven analytics. This thesis presents a comprehensive study on utilizing Convolutional Neural Networks (CNNs) for speech emotion recognition (SER). An efficient and accurate model is developed to recognize emotions from speech signals. A large dataset of emotional speech samples, covering emotions such as anger, disgust, fear, happiness, neutrality, sadness, and surprise, is preprocessed and analyzed.
A novel CNN architecture optimized for SER is proposed, incorporating multiple convolutional layers, max pooling, and normalization arranged in a sequential model. By extracting relevant features from speech representations, distinct emotional patterns are discerned effectively. Numerous experiments were carried out to validate the CNN model's effectiveness, with the dataset divided into distinct training, validation, and testing subsets. The outcomes exhibit encouraging accuracy and resilience in discerning emotions across the various categories. By scrutinizing the model's confusion matrix, a comprehensive classification report was generated, assessing precision, recall, and F1-score for each emotional class. This analysis accentuates the model's strengths and pinpoints possible avenues for enhancement.
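The evaluation step described above, deriving per-class precision, recall, and F1-score from a confusion matrix, can be sketched in plain Python. This is a minimal illustration, not the thesis's actual evaluation code; the emotion label names and helper function names are assumptions for the sake of the example:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Build a confusion matrix: rows are true classes, columns are predictions."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_report(cm, labels):
    """Compute precision, recall, and F1-score for each class from the matrix."""
    report = {}
    for c, label in enumerate(labels):
        tp = cm[c][c]                          # correct predictions for class c
        pred_c = sum(row[c] for row in cm)     # everything predicted as class c
        true_c = sum(cm[c])                    # everything actually in class c
        precision = tp / pred_c if pred_c else 0.0
        recall = tp / true_c if true_c else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Illustrative two-class example (the thesis uses seven emotion classes):
labels = ["anger", "happiness"]
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], len(labels))
report = per_class_report(cm, labels)
```

In practice, a framework routine such as scikit-learn's `classification_report` computes the same quantities; the hand-rolled version above just makes the arithmetic behind the report explicit.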
