Abstract

Emotions are a natural component of life, and the central nervous system regulates their generation and expression. The capacity of computers to recognize and reproduce human emotions can be improved by studying the brain mechanisms underlying emotional expression. Audio, currently one of the most popular forms of multimedia stimulation, can convey a wide range of emotional meanings through its acoustic characteristics. In addition to audio features, EEG features can provide complementary information for emotion recognition, since they offer the most direct feedback on how humans perceive emotion. This paper constructs a fused dataset of EEG and audio features based on the SEED EEG dataset. A deep learning model based on the Long Short-Term Memory (LSTM) network is used to find the best model for audio emotion recognition on the bimodal dataset. We found that combining full-band EEG power spectral density features with the fused audio features yielded the best recognition performance. We also extracted features from each original stimulus audio clip and added the audio features of each segment to examine their influence on emotion recognition and to investigate in depth the inherent link between the generation of EEG signals and the original stimulus audio. The study shows that the method can serve as a potential emotion indexing method for video information retrieval.
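To make the fusion idea concrete, the following is a minimal sketch of feature-level fusion with an LSTM classifier: per-segment EEG power spectral density (PSD) features are concatenated with audio features and the fused sequence is classified into the three SEED emotion classes. The feature dimensions, sequence length, and model sizes shown here are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch (assumed dimensions): fuse EEG PSD and audio features,
# then classify the sequence with an LSTM.
import torch
import torch.nn as nn

N_EEG, N_AUDIO, SEQ_LEN, N_CLASSES = 310, 64, 20, 3  # assumed sizes

class FusionLSTM(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # LSTM over the fused (EEG PSD + audio) feature sequence
        self.lstm = nn.LSTM(N_EEG + N_AUDIO, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_CLASSES)

    def forward(self, eeg_psd, audio_feat):
        # eeg_psd: (batch, seq, N_EEG); audio_feat: (batch, seq, N_AUDIO)
        fused = torch.cat([eeg_psd, audio_feat], dim=-1)  # feature-level fusion
        out, _ = self.lstm(fused)
        return self.head(out[:, -1])                      # last-step logits

# Example forward pass on random data
model = FusionLSTM()
eeg = torch.randn(8, SEQ_LEN, N_EEG)
aud = torch.randn(8, SEQ_LEN, N_AUDIO)
logits = model(eeg, aud)  # shape: (8, 3) -> negative / neutral / positive
```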
