Abstract

Computer vision aims to enable machines to interpret digital images, much as humans rely on their sensory perception to decipher emotions. Unlike interpreting visual imagery, assessing emotion in speech requires evaluating tone, volume, speed, and other acoustic cues, and such cues can reveal emotional states such as anxiety. In this work, our objective is to develop a convolutional neural network (CNN) model capable of classifying input video frames into seven distinct emotion classes: anger, hatred, criticism, happiness, sadness, surprise, and neutrality. To achieve this, we use the CNN to extract and process semantic information from facial expressions, enabling these emotions to be discerned accurately. We also apply data augmentation strategies to combat overfitting and underfitting. The results indicate improved performance when handling larger images, with an accuracy rate exceeding 90%, yielding a CNN-based system for recognizing emotions conveyed through facial expressions. We carefully tuned several hyperparameters to improve the model's accuracy and investigated the key factors influencing its performance. The paper concludes with a discussion of the precision and robustness of different CNN designs, highlighting areas for potential improvement and assessing the overall efficiency of these enhancements.

Keywords— CNN, Neural Network, Emotion Detection, RNN, Face Recognition
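
To make the described pipeline concrete, the sketch below shows the kind of small CNN classifier the abstract outlines: a few convolutional blocks over grayscale face crops, data augmentation on the training side, and a seven-way output layer. This is a minimal illustration only; the 48x48 input size, layer widths, and augmentation choices are assumptions (loosely modeled on common facial-expression datasets), not the authors' actual architecture.

# Minimal sketch of the kind of CNN emotion classifier the abstract describes.
# The 48x48 grayscale input, layer sizes, and augmentations are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn
from torchvision import transforms

EMOTIONS = ["anger", "hatred", "criticism", "happiness",
            "sadness", "surprise", "neutral"]

# Training-time data augmentation to combat overfitting, as mentioned in the
# abstract (the specific transforms are assumptions).
train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

class EmotionCNN(nn.Module):
    def __init__(self, num_classes: int = len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 48x48 -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 24x24 -> 12x12
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 12x12 -> 6x6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),  # regularization against overfitting
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(),
            nn.Linear(256, num_classes),  # logits for the seven emotions
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Sanity check on a dummy batch of four 48x48 grayscale face crops.
if __name__ == "__main__":
    model = EmotionCNN()
    logits = model(torch.randn(4, 1, 48, 48))
    print(logits.shape)  # torch.Size([4, 7])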
