Abstract

Speech emotion recognition has become a prominent area of research because it promises to make human-machine interaction more natural and effective. The reliability and performance of emotion recognition depend largely on the feature extraction and feature selection processes. Feature extraction plays an important role in exploring and distinguishing audio content, and the extracted features should be robust to a range of disturbances and reliable enough to support an adequate classification system. This article focuses on the three main components of a Speech Emotion Recognition (SER) process for Punjabi: feature extraction, feature selection, and emotion recognition with a classifier. The first component is the optimal feature extraction method for a Punjabi SER system. The second is an appropriate feature selection method that keeps the effective features from those extracted in the first step and removes redundant ones to improve recognition performance. The third is the classification model that is then used for emotion recognition. The results are calculated and compared for a number of feature set combinations, with and without feature selection. A total of 10 experiments are carried out, and performance metrics such as precision, recall, F1-score, and accuracy are used to report the results.
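As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy and per-class precision, recall, and F1-score from true and predicted emotion labels. It is not the authors' evaluation code: the function name `per_class_metrics`, the emotion labels, and the example predictions are all hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return (accuracy, report) where report maps each label to its
    precision, recall, and F1-score. Hypothetical helper for illustration."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class was missed (false negative)

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Define a metric as 0.0 when its denominator is zero.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return accuracy, report


# Made-up labels, not data from the article.
y_true = ["happy", "sad", "angry", "happy", "sad", "neutral"]
y_pred = ["happy", "sad", "happy", "happy", "neutral", "neutral"]
accuracy, report = per_class_metrics(y_true, y_pred)
```

Reporting the metrics per class, as in this sketch, makes it visible when a classifier does well overall but fails on one emotion; whether the article reports per-class or averaged values is not stated in the abstract.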
