Abstract

Convolutional neural networks (CNNs) have shown promise in virtual human speech emotion expression. Previous studies have applied CNNs to speech emotion recognition with good results, but gaps remain in avatar speech emotion expression, particularly with respect to speaker characteristics, and the available datasets are limited. To address these issues, this paper collects and pre-processes speech data from multiple speakers and extracts features such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients. A multi-channel CNN (MUC-CNN) model is designed to fuse the different feature streams, with model parameters updated by the Adam optimization algorithm. The model's performance is compared with classical methods, namely Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbors (k-NN), to assess its applicability and to optimize its design and training process. Experimental evaluation shows that the MUC-CNN model outperforms these classical methods in recognizing and expressing emotions in virtual human speech. Incorporating MFCC, LPC, and fundamental frequency (F0) features improves the model's recognition capability, and the multi-channel architecture processes each feature type independently, enhancing the model's discriminative power. Performance is also influenced by the number of convolutional layers and kernels used. The results highlight the effectiveness of the proposed MUC-CNN model for recognizing and expressing speech emotion in virtual human interactions. Future research can explore additional feature information and refine the model architecture to further improve performance. This technology has the potential to enhance user experience and interaction in fields such as speech interaction, virtual reality, games, and education.
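
The sketch below illustrates the multi-channel idea described in the abstract, written in PyTorch as an assumption; the branch widths, kernel sizes, frame count, and number of emotion classes are illustrative placeholders rather than the paper's reported configuration. Each feature type (MFCC, LPC, F0) is processed by its own convolutional branch, the branch outputs are fused, and parameters are updated with Adam.

# Minimal PyTorch sketch of a multi-channel CNN for speech emotion recognition.
# All dimensions, layer counts, and kernel sizes are illustrative assumptions,
# not the MUC-CNN configuration reported in the paper.
import torch
import torch.nn as nn

class MultiChannelCNN(nn.Module):
    def __init__(self, n_mfcc=13, n_lpc=12, n_f0=1, n_classes=6):
        super().__init__()
        # One independent convolutional branch per feature type.
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv1d(in_ch, 32, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # collapse the time axis
            )
        self.mfcc_branch = branch(n_mfcc)
        self.lpc_branch = branch(n_lpc)
        self.f0_branch = branch(n_f0)
        # Fuse the three 64-dim branch outputs and classify.
        self.classifier = nn.Linear(64 * 3, n_classes)

    def forward(self, mfcc, lpc, f0):
        # Each input has shape (batch, feature_dim, n_frames).
        feats = [
            self.mfcc_branch(mfcc).squeeze(-1),
            self.lpc_branch(lpc).squeeze(-1),
            self.f0_branch(f0).squeeze(-1),
        ]
        return self.classifier(torch.cat(feats, dim=1))

model = MultiChannelCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, as in the paper
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random tensors standing in for real features.
mfcc = torch.randn(8, 13, 100)
lpc = torch.randn(8, 12, 100)
f0 = torch.randn(8, 1, 100)
labels = torch.randint(0, 6, (8,))
loss = criterion(model(mfcc, lpc, f0), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Keeping the branches separate until the fusion layer mirrors the abstract's claim that independent per-feature processing improves discriminative ability; the fusion point and branch depths are among the design choices the paper tunes against the number of convolutional layers and kernels.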
