Emotion recognition from speech signals is an important and challenging component of human-computer interaction. In the field of speech emotion recognition (SER), many techniques have been used to extract emotions from speech signals, including well-established speech analysis and classification methods. Such a model can be built with approaches such as RNNs, SVMs, and other deep learning methods, typically operating on features such as cepstral coefficients; among these, SVMs normally yield the highest accuracy. We propose a model that identifies the emotions present in speech from parameters such as pitch, speaking rate, speech duration, and frequency patterns. Emotion detection in digitized speech comprises three components: signal processing, feature extraction, and classification. The model first removes background noise, then extracts the features present in the speech, and finally classifies the utterance into a single emotion. The model is capable of identifying seven different emotions found in human speech. The signal-processing stage can draw on techniques such as spectral subtraction, Wiener filtering, adaptive filtering, and deep learning-based enhancement, while classifiers such as GMMs and HMMs can be used to label the extracted features. This model can be applied in fields such as healthcare, security, psychology, medicine, education, and entertainment.
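The sketch below illustrates the three-stage pipeline described above (signal processing, feature extraction, classification) as a minimal, hedged example. The library choices (librosa, scikit-learn), the use of pre-emphasis as a stand-in for denoising, MFCC statistics as features, and the seven-class label list are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal SER pipeline sketch: signal processing -> feature extraction -> classification.
# Assumptions: librosa and scikit-learn are available; MFCC mean/std features and an RBF SVM
# stand in for whatever features and classifier the paper actually uses.
import numpy as np
import librosa
from sklearn.svm import SVC

# Hypothetical seven-emotion label set (the abstract does not list the classes).
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

def extract_features(path, sr=16000, n_mfcc=13):
    """Load a clip, apply simple signal conditioning, and return a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y)            # simple high-pass conditioning (stand-in for denoising)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Summarize the frame-level MFCCs into one vector per utterance.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_classifier(train_paths, train_labels):
    """train_paths/train_labels are assumed to come from a labeled emotional-speech corpus."""
    X = np.vstack([extract_features(p) for p in train_paths])
    clf = SVC(kernel="rbf")                        # SVM classifier, as highlighted in the abstract
    clf.fit(X, train_labels)
    return clf

def predict_emotion(clf, path):
    """Classify a single utterance into one emotion label."""
    return clf.predict(extract_features(path).reshape(1, -1))[0]
```

In this sketch the SVM could be swapped for a GMM or HMM classifier, and the pre-emphasis step for spectral subtraction or Wiener filtering, without changing the overall pipeline structure.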