Abstract

Currently, emotional features for speech emotion recognition are typically extracted directly from speech; however, recognition accuracy can be affected by factors such as semantics, language, and differences across speech datasets. Achieving emotional judgments consistent with those of human listeners remains a key challenge for AI. Electroencephalography (EEG) signals are an effective means of capturing authentic and meaningful emotional information in humans, which makes EEG a promising tool for detecting the emotional cues conveyed in speech. In this study, we proposed a novel approach named CS-GAN that generates listener EEGs in response to a speaker's speech, specifically aimed at enhancing cross-subject emotion recognition. We utilized generative adversarial networks (GANs) to establish a mapping between speech and EEG and thereby generate stimulus-induced EEGs. Furthermore, we integrated compressive sensing (CS) theory into the GAN-based EEG generation method, enhancing the fidelity and diversity of the generated EEGs. The generated EEGs were then processed with a CNN-LSTM model to identify the emotional categories conveyed in the speech. By averaging these EEGs, we obtained event-related potentials (ERPs) that further improve the cross-subject capability of the method. The experimental results demonstrate that the EEGs generated by this method outperform real listener EEGs by 9.31% in cross-subject emotion recognition, and the ERPs yield an improvement of 43.59%, providing evidence for the effectiveness of the method in cross-subject emotion recognition.
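To make the downstream classification and ERP-averaging steps concrete, the following is a minimal illustrative sketch, not the authors' released code: a generic CNN-LSTM classifier over EEG segments and a simple trial-averaging function for forming ERPs. The channel count (62), segment length (400 samples), number of classes (4), and layer sizes are assumptions chosen for illustration and are not specified in the abstract.

```python
# Illustrative sketch of the classification stage: a CNN-LSTM over EEG
# segments plus trial averaging to form an ERP. All shapes and layer
# sizes are assumed, not taken from the paper.
import torch
import torch.nn as nn


class CNNLSTMClassifier(nn.Module):
    def __init__(self, n_channels=62, n_classes=4, hidden=128):
        super().__init__()
        # 1-D convolutions extract local temporal features per EEG channel bank
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # LSTM models longer-range temporal structure across the feature sequence
        self.lstm = nn.LSTM(input_size=128, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, channels, time)
        feats = self.cnn(x)               # (batch, 128, time / 4)
        feats = feats.transpose(1, 2)     # (batch, time / 4, 128) for the LSTM
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])         # emotion-class logits


def average_to_erp(eeg_trials):
    """Average repeated (generated) EEG trials for one stimulus into an ERP."""
    # eeg_trials: (n_trials, channels, time) -> (channels, time)
    return eeg_trials.mean(dim=0)


if __name__ == "__main__":
    model = CNNLSTMClassifier()
    trials = torch.randn(20, 62, 400)     # 20 generated trials for one utterance (dummy data)
    erp = average_to_erp(trials)          # (62, 400)
    logits = model(erp.unsqueeze(0))      # classify the averaged response
    print(logits.shape)                   # torch.Size([1, 4])
```

In this sketch, classifying the averaged ERP rather than individual trials reflects the abstract's claim that averaging suppresses subject-specific variability and improves cross-subject generalization; the GAN-based speech-to-EEG generation stage itself is not shown.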
