Speech Signal Imaging and Emotion Recognition Based on Symmetric-Diagonal Matrix Model

Seiichi Serikawa, Yuting Wang, Yaoyao Chen, Aoran Xi, Zijun Yang, Shi Zhou

doi:10.12792/iciae2023.045

Abstract

Speech is the main medium of communication in daily life. It conveys thoughts and expresses emotional states between people. The goal of speech emotion recognition is to recognize human emotional states from speech. Speech emotion recognition mainly comprises two steps: feature extraction and classifier construction. Speech features usually refer to spectral features. The speech signal input is an approximately continuous value, and its spectral features carry a considerable amount of information, such as speech content, rhythm, tone, and intonation. However, emotion-related speech feature extraction is still an immature research direction. Influenced by the success of computer vision, the visualization of speech signals has become a new approach to emotion recognition based on the acoustic features of speech. Based on the Gramian Angular Field (GAF) method, this research uses a variety of neural network models to extract speech feature values and recognize speech emotions. The experiments show that it is feasible to visualize speech signals and to use the resulting images for emotion recognition. In future work, we will further optimize the network model and combine it with other acoustic features, such as speech content and rhythm, to achieve speech emotion recognition in real-life settings.
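To make the imaging step concrete, the following is a minimal sketch of the summation variant of the Gramian Angular Field, the angular-field transform the abstract refers to. It is not the authors' implementation: the function names, the min-max rescaling to [-1, 1], and the toy frame are assumptions chosen for illustration. Each sample x_i is mapped to an angle phi_i = arccos(x_i), and pixel (i, j) of the image is cos(phi_i + phi_j), which yields a symmetric matrix whose main diagonal equals 2 * x_i^2 - 1.

```python
import math


def rescale(signal):
    """Min-max rescale samples to [-1, 1] (assumed preprocessing step)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A constant frame carries no shape information; map it to zeros.
        return [0.0 for _ in signal]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]


def gramian_angular_summation_field(signal):
    """Return the image as a list of rows, with entry (i, j) = cos(phi_i + phi_j)."""
    scaled = rescale(signal)
    # Clamp before acos to guard against floating-point values just outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical toy frame standing in for a short window of speech samples.
frame = [0.0, 0.5, 1.0, 0.5, 0.0]
image = gramian_angular_summation_field(frame)
```

The resulting square image can then be passed to an image classifier. In practice a speech frame would usually be shortened first (for example by averaging neighbouring samples), because the image size grows quadratically with the number of samples.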

Full Text