Abstract

Classification is the last, and usually the most time-consuming, step in recognition. Most recently proposed classification algorithms have adopted machine learning (ML) as the main classification approach, regardless of its time cost. This study proposes a statistical feature classification cubic spline interpolation (FC-CSI) algorithm to classify emotions in speech using a curve fitting technique. FC-CSI is utilized in a speech emotion recognition system (SERS). The idea is to construct the cubic spline interpolation (CSI) for each audio file in a dataset and the mean cubic spline interpolations (MCSIs) representing each emotion in the dataset. The CSI is generated by connecting the features extracted from each file in the feature extraction phase. The MCSI is generated by connecting the mean features of 70% of the files of each emotion in the dataset. Points on the CSI are treated as newly generated features. To classify each audio file according to emotion, the Euclidean distance (ED) is computed between each CSI and the MCSIs of all emotions in the dataset. Each audio file is assigned the emotion whose MCSI is nearest to the file's CSI. The three datasets used in this work are the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin (Emo-DB), and Surrey Audio-Visual Expressed Emotion (SAVEE). The proposed method classifies quickly and with high accuracy. The classification accuracy, i.e., the proportion of samples assigned to the correct class, using FC-CSI without feature selection (FS), was 69.08%, 92.52%, and 89.1% with RAVDESS, Emo-DB, and SAVEE, respectively. The results of the proposed method were compared to those of a designed neural network called SER-NN. Comparisons were made with and without FS. Without an FS algorithm, FC-CSI outperformed SER-NN on Emo-DB and SAVEE and underperformed it on RAVDESS. Experiments also showed that FC-CSI ran faster than the same system using SER-NN.
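The classification rule described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature index (0, 1, 2, ...) is assumed to serve as the x axis, the spline is assumed to use natural boundary conditions, the number of points sampled per interval is arbitrary, and all function names and toy feature values are hypothetical. The training dictionary stands in for the 70% of files per emotion used to build the mean curves.

```python
import math

def natural_spline_second_derivs(y):
    """Second derivatives of the natural cubic spline through the points (i, y[i])."""
    n = len(y)
    m = [0.0] * n                      # natural boundary: m[0] = m[n-1] = 0
    if n < 3:
        return m
    # Interior knots with unit spacing: m[i-1] + 4*m[i] + m[i+1] = rhs
    rhs = [6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1]) for i in range(1, n - 1)]
    k = len(rhs)
    c = [0.0] * k                      # modified super-diagonal (Thomas algorithm)
    d = [0.0] * k                      # modified right-hand side
    c[0], d[0] = 1.0 / 4.0, rhs[0] / 4.0
    for i in range(1, k):
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    m[k] = d[k - 1]                    # back substitution, last interior knot first
    for i in range(k - 2, -1, -1):
        m[i + 1] = d[i] - c[i] * m[i + 2]
    return m

def sample_spline(y, points_per_interval=4):
    """Sample the spline through (i, y[i]); the samples act as the new features."""
    m = natural_spline_second_derivs(y)
    samples = []
    for i in range(len(y) - 1):
        for s in range(points_per_interval):
            b = s / points_per_interval    # offset from knot i
            a = 1.0 - b                    # distance to knot i + 1
            samples.append(m[i] * a**3 / 6 + m[i + 1] * b**3 / 6
                           + (y[i] - m[i] / 6) * a + (y[i + 1] - m[i + 1] / 6) * b)
    samples.append(float(y[-1]))
    return samples

def mean_vector(vectors):
    """Element-wise mean of equally long feature vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]

def fit_mcsi(train):
    """One mean curve per emotion, built from that emotion's mean features."""
    return {emotion: sample_spline(mean_vector(vs)) for emotion, vs in train.items()}

def classify(features, mcsi):
    """Return the emotion whose mean curve is nearest in Euclidean distance."""
    curve = sample_spline(features)
    return min(mcsi, key=lambda emotion: math.dist(curve, mcsi[emotion]))

# Toy data (hypothetical values): two emotions, four features per file.
train = {
    "happy": [[0.9, 0.2, 0.8, 0.1], [1.1, 0.0, 1.0, 0.3]],
    "sad":   [[0.1, 0.9, 0.2, 0.7], [0.3, 1.1, 0.0, 0.9]],
}
mcsi = fit_mcsi(train)
print(classify([1.0, 0.1, 0.9, 0.2], mcsi))  # prints "happy"
```

Once the mean curves are stored, classifying a file needs only one spline evaluation and one distance per emotion, which is consistent with the speed advantage the abstract reports over a neural network.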

Highlights

  • Numeric data are often difficult to analyze, and functions to link the data are hard to find

  • Experiment 1.1 (Section 3.1) analyzes the performance of the speech emotion recognition system (SERS) before a feature selection algorithm is deployed. In Exp1.1, the classification performance of SERS is calculated using two different approaches

  • Table 3 shows the results of Exp1.1 on the 2182 feature extraction mean deviation (FE-MD) features extracted from the RAVDESS, Emo-DB, and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, without feature selection


Summary

Introduction

Numeric data are often difficult to analyze, and functions to link the data are hard to find. In 1998, cubic spline interpolation (CSI) was proposed, which connects pairs of data points using unique cubic polynomials, generating a continuous and smooth curve [1]. The fundamental concept of CSI is based on the engineer’s spline, a drafting tool in which a flexible strip is held in place by weights to draw smooth curves through several points. The mathematical spline is similar in principle. The points, in this case, are numeric data, and the weights are the coefficients of the cubic polynomials used to interpolate the data. These coefficients “bend” a line so that it passes through each data point without erratic behavior or breaks in continuity [1]. CSI has been applied in several fields: a path planning method based on cubic spline interpolation was proposed to smooth a robot’s moving path [6], and biodiesel production from waste cooking oil was optimized using CSI and response surface methodology in a mathematical model [7].
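The "continuous and smooth" property can be stated as conditions on the piecewise cubics. The formulation below is the standard textbook one; the symbols are assumptions introduced here for illustration and are not taken from the source. Given data points $(x_i, y_i)$ for $i = 0, \dots, n$, one cubic is defined on each interval and adjacent cubics are required to agree in value, slope, and curvature at the shared knot:

```latex
\begin{align}
S_i(x) &= a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3,
  & x \in [x_i, x_{i+1}],\quad i &= 0, \dots, n-1 \\
S_i(x_i) &= y_i, \qquad S_i(x_{i+1}) = y_{i+1},
  & i &= 0, \dots, n-1 \\
S_i'(x_{i+1}) &= S_{i+1}'(x_{i+1}), \qquad S_i''(x_{i+1}) = S_{i+1}''(x_{i+1}),
  & i &= 0, \dots, n-2
\end{align}
```

These give $4n - 2$ equations for the $4n$ coefficients, so two boundary conditions are still needed. One common choice, the natural spline, sets $S_0''(x_0) = S_{n-1}''(x_n) = 0$. Which boundary condition the cited works use is not stated in this excerpt.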

