Emotion recognition from speech signals has gained popularity because of its many applications in fields such as medicine, online marketing, online search engines, education, criminal investigation, and traffic collisions. Many researchers have adopted different methodologies to improve the accuracy of emotion classification from speech. In this study, we used features based on time–frequency (TF) analysis to evaluate emotion classification performance. We used a novel TF analysis method, the chirplet transform (CT), to compute the TF matrix of the speech signal, and then calculated the proposed TF-based permutation entropy (TFPE) feature from that matrix. To reduce the feature dimension and select the most informative emotional features, we employed genetic algorithm (GA) feature selection. The selected TFPE features were then used as input to machine learning classifiers, namely support vector machine (SVM), random forest (RF), decision tree (DT), and k-nearest neighbors (KNN), to classify the emotions in the speech signal. Without GA feature selection, we obtained classification accuracies of 77.2%, 69.57%, 68.78%, 56.9%, and 99.1% on the EMO-DB, EMOVO, SAVEE, IEMOCAP, and TESS datasets, respectively. With GA feature selection, the accuracies increased to 85.6%, 78.33%, 77.76%, 63.15%, and 100%, respectively. We compared our results with those of other methods and found that our method outperformed the state-of-the-art methods in emotion classification.
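As a rough illustration of the TFPE step, the sketch below computes a normalized permutation entropy (in the Bandt-Pompe sense) for each frequency row of a TF matrix. This is an assumption about how such a feature could be formed, not the definition used in the study: the function names, the `order` and `delay` parameters, and the choice of one entropy value per frequency row are all hypothetical, and the TF matrix is taken as given rather than produced by a chirplet transform.

```python
import math

import numpy as np


def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay  # number of embedding windows
    if n <= 0:
        raise ValueError("sequence too short for the chosen order and delay")
    # Each row holds one window of `order` samples spaced `delay` apart.
    windows = np.stack(
        [x[i * delay : i * delay + n] for i in range(order)], axis=1
    )
    # The ordinal pattern of a window is the ranking of its samples.
    patterns = np.argsort(windows, axis=1, kind="stable")
    # Relative frequency of each distinct ordinal pattern.
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    # Shannon entropy, normalized by its maximum log(order!).
    h = -np.sum(p * np.log(p))
    return float(h / math.log(math.factorial(order)))


def tf_permutation_entropy(tf_matrix, order=3, delay=1):
    """Hypothetical TFPE: one entropy value per frequency row.

    Assumes `tf_matrix` has shape (n_frequencies, n_time_frames) and may be
    complex; only its magnitude is used.
    """
    magnitude = np.abs(np.asarray(tf_matrix))
    return np.array(
        [permutation_entropy(row, order, delay) for row in magnitude]
    )


if __name__ == "__main__":
    # Stand-in TF matrix: random values, NOT a real chirplet transform.
    rng = np.random.default_rng(0)
    fake_tf = rng.standard_normal((8, 256))
    features = tf_permutation_entropy(fake_tf)
    print(features.shape)
```

Under these assumptions, the resulting feature vector (one value per frequency bin) is what a feature selection step such as a GA would prune before the values are passed to a classifier.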