Abstract

Machine learning algorithms are often unable to recognize emotion in an individual's speech. Speech Emotion Recognition (SER) plays a major role in real-time applications that analyze speech emotion, such as emergency call centers and human behavior assessment. In this work, we design an architecture for analyzing similarity within clusters, based on a key segment selection procedure. The speech signal is transformed into a spectrogram using the short-time Fourier transform (STFT). The result is a discriminative and salient feature extraction pipeline. We also add new features to the CNN to improve its recognition performance. Instead of the whole utterance, only the key segments are processed, which reduces the structural complexity. The proposed system is evaluated on several standard datasets with different emotion classes and, over different time periods, achieves better recognition accuracy. The proposed SER model proves robust and reliable when compared with the latest state-of-the-art methods.
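The abstract's first processing step, converting a speech signal into a spectrogram via the STFT, can be illustrated with a minimal sketch. This is not the paper's implementation; the frame length, hop size, and window choice below are illustrative assumptions, and real SER pipelines typically use a mel-scaled or log-magnitude spectrogram on top of this.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via the short-time Fourier transform (STFT).

    The signal is split into overlapping frames (frame_len samples,
    advancing by hop), each frame is windowed, and the real FFT of
    each frame gives one column of the spectrogram.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Shape: (n_frames, frame_len // 2 + 1) -- time along rows, frequency along columns
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))
```

The resulting 2-D time-frequency array is what a CNN, as described in the abstract, would consume as an image-like input; key segment selection would then keep only the most informative frame ranges rather than the whole utterance.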

