Abstract
Computer emotion recognition plays an important role in the field of artificial intelligence and is a key technology for realizing human-machine interaction. To address the cross-modal fusion problem of two nonlinear features, facial expression images and speech emotion, a bimodal fusion emotion recognition model based on convolutional neural networks (D-CNN) is proposed. First, a fine-grained feature extraction method based on convolutional neural networks is proposed. Second, to obtain a joint feature representation, a feature fusion method based on the fine-grained features of the two modalities is proposed. Finally, to verify the performance of the D-CNN model, experiments were conducted on the open-source dataset eNTERFACE'05. The experimental results show that the multimodal D-CNN model improves the recognition rate by more than 10% over the single-modality speech and facial expression recognition models, respectively. In addition, compared with other commonly used bimodal emotion recognition methods (such as the universal background model), the recognition rate of D-CNN increases by 5%.
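To make the described architecture concrete, the following is a minimal sketch (in PyTorch, assumed) of a bimodal fusion CNN in the spirit of D-CNN: two convolutional branches extract fine-grained features from a face image and a speech spectrogram, and the branch outputs are concatenated into a joint representation for classification. The layer sizes, branch depths, input resolutions, and six-class output are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ConvBranch(nn.Module):
    """Small CNN branch mapping one modality to a fixed-length feature vector."""

    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (N, 64, 1, 1)
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.proj(h)


class BimodalFusionCNN(nn.Module):
    """Fuses face and speech features into a joint representation for classification."""

    def __init__(self, num_classes: int = 6, feat_dim: int = 128):
        super().__init__()
        self.face_branch = ConvBranch(in_channels=3, feat_dim=feat_dim)    # RGB face crop
        self.speech_branch = ConvBranch(in_channels=1, feat_dim=feat_dim)  # log-mel spectrogram
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, face: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # Feature-level fusion: concatenate the two fine-grained feature vectors.
        joint = torch.cat([self.face_branch(face), self.speech_branch(speech)], dim=1)
        return self.classifier(joint)


if __name__ == "__main__":
    model = BimodalFusionCNN()
    face = torch.randn(4, 3, 64, 64)    # batch of face images (assumed size)
    speech = torch.randn(4, 1, 64, 64)  # batch of spectrogram patches (assumed size)
    print(model(face, speech).shape)    # -> torch.Size([4, 6])
```

The key design choice this sketch illustrates is feature-level (rather than decision-level) fusion: each modality is first reduced to a fine-grained feature vector by its own CNN, and the classifier operates on the concatenated joint representation.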