Abstract

The significant role of emotion in everyday human interaction cannot be over-emphasized; however, the demand for an efficient, state-of-the-art model for speech emotion classification in affective computing remains a challenging task. Researchers have proposed several approaches for speech emotion classification (SEC) in recent years, but insufficient datasets continue to limit the performance of these approaches. This work therefore proposes a deep transfer learning model for SEC, a technique that has yielded state-of-the-art results in computer vision. Our approach uses a pre-trained Visual Geometry Group (VGGNet) convolutional neural network architecture with appropriate fine-tuning for optimal performance. Each speech signal is converted to a mel-spectrogram image suitable as deep learning model input (224 × 224 × 3) by applying filterbanks and the Fast Fourier Transform (FFT) to the speech samples. A multi-layer perceptron (MLP) is adopted as the classifier after feature extraction is carried out by the deep learning model. Speech pre-processing was applied to the Toronto Emotional Speech Set (TESS) corpus used for the study to prevent low performance of the proposed model. Evaluation on the TESS dataset shows an improved SEC result, with an accuracy of 96.1% and a specificity of 97.4%.

Keywords: Deep learning, Speech emotion, Classification, Deep convolutional neural network
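The front end described above (framing the speech signal, applying an FFT, and projecting the power spectrum through mel filterbanks) can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; the parameter values (`n_fft`, `hop`, `n_mels`, sample rate) are assumptions, and in practice the resulting spectrogram would still be rendered and resized to the 224 × 224 × 3 image the VGG network expects.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=64):
    # Frame the signal with a Hann window, take the power spectrum per frame,
    # then map frequency bins onto the mel scale with the filterbank.
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2   # (n_frames, n_fft//2 + 1)
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)

# Example: mel-spectrogram of a one-second 440 Hz tone (synthetic input)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
mel = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(mel.shape)  # (n_mels, n_frames)
```

In a pipeline like the one the abstract outlines, the log of this matrix is typically taken and saved as an RGB image before being fed to the pre-trained convolutional network for feature extraction.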
