Abstract
Machine learning algorithms often struggle to recognize emotion from an individual's speech. Speech Emotion Recognition (SER) plays a major role in real-time applications that analyze speech emotion, such as emergency call centers and human behavior assessment. In this work, we design an architecture for analyzing similarity in clusters, based on a key-segment selection procedure. The speech signal is transformed into a spectrogram using the Short-Time Fourier Transform (STFT), which yields discriminative and salient features. We also add new features to the CNN to improve its recognition performance. Instead of the whole utterance, only the key segments are processed, which reduces structural complexity. The proposed system is evaluated on several standard SER datasets and over different time periods, achieving better recognition accuracy. The proposed SER model proves robust and reliable when compared with the latest state-of-the-art methods.
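The STFT-based spectrogram conversion described above can be sketched as follows. This is a minimal, hypothetical illustration using SciPy; the sample rate, window length, and hop size are assumptions for demonstration, not the paper's actual settings, and a synthetic tone stands in for a real speech segment.

```python
import numpy as np
from scipy.signal import stft

# Assumed parameters (illustrative, not from the paper)
fs = 16000                                # sample rate in Hz
t = np.arange(fs) / fs                    # 1 second of time samples
signal = np.sin(2 * np.pi * 440 * t)      # synthetic stand-in for a speech segment

# Short-Time Fourier Transform: 512-sample windows with 75% overlap
freqs, times, Z = stft(signal, fs=fs, nperseg=512, noverlap=384)

# Log-magnitude spectrogram in dB (small epsilon avoids log of zero)
spectrogram = 20 * np.log10(np.abs(Z) + 1e-10)

print(spectrogram.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can then be fed to a CNN as an image-like input, which is the usual way spectrogram features are consumed by convolutional models.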