Emotion recognition is a key step toward making machines more intelligent. This study proposes a novel single-link end-to-end spatio-temporal demographic network (SSTD) that fuses spatial, temporal, and demographic information for electroencephalography (EEG)-based emotion recognition. In the SSTD model, an adaptive time window based on single-link hierarchical clustering with Riemannian metrics was used in data preprocessing to address individual differences. The preprocessed EEG data were then fed into a gated recurrent unit (GRU) network to extract high-level time-domain features. In parallel, the EEG covariance matrices were fed into a symmetric positive definite matrix network (SPDNet) to extract high-level spatial features. Given the correlation between EEG signals and individual demographic information, gender and age were integrated into the spatio-temporal model, yielding more effective high-level features for EEG-based emotion recognition. Finally, extensive comparative experiments were conducted on two public datasets, DEAP and DREAMER. On DEAP, the average accuracies for valence and arousal were 68.28% and 71.48%, respectively. On DREAMER, the average accuracies for valence and arousal were 76.81% and 81.64%, respectively. These results indicate that the SSTD model recognizes emotions effectively on both datasets.
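The clustering idea behind the preprocessing step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which Riemannian metric is used, how segments are formed, or how clusters are turned into an adaptive time window. The sketch assumes the affine-invariant metric, fixed-length segments, and synthetic data, and the function names (`covariance`, `riemannian_distance`, `single_link_labels`) are hypothetical.

```python
import numpy as np


def covariance(segment):
    """Sample covariance of one EEG segment shaped (channels, samples)."""
    centered = segment - segment.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (segment.shape[1] - 1)
    # Small ridge (an assumption) keeps the matrix strictly positive definite.
    return cov + 1e-9 * np.eye(cov.shape[0])


def riemannian_distance(a, b):
    """Affine-invariant distance between two SPD matrices (assumed metric)."""
    # Build b^{-1/2} from the eigendecomposition of b.
    w, v = np.linalg.eigh(b)
    b_inv_sqrt = (v / np.sqrt(w)) @ v.T
    # Eigenvalues of the whitened matrix b^{-1/2} a b^{-1/2}.
    eigvals = np.linalg.eigvalsh(b_inv_sqrt @ a @ b_inv_sqrt)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))


def single_link_labels(dist, n_clusters):
    """Single-link agglomerative clustering on a square distance matrix."""
    n = dist.shape[0]
    labels = np.arange(n)
    # Visiting pairs from closest to farthest and merging their clusters
    # is equivalent to single linkage.
    rows, cols = np.triu_indices(n, k=1)
    remaining = n
    for idx in np.argsort(dist[rows, cols]):
        if remaining == n_clusters:
            break
        i, j = rows[idx], cols[idx]
        if labels[i] != labels[j]:
            labels[labels == labels[j]] = labels[i]
            remaining -= 1
    # Relabel clusters as consecutive integers starting at 0.
    _, labels = np.unique(labels, return_inverse=True)
    return labels


# Synthetic stand-in for EEG: 6 segments, 4 channels, 256 samples each.
# The last three segments have a larger amplitude than the first three.
rng = np.random.default_rng(0)
segments = [rng.normal(scale=s, size=(4, 256)) for s in (1, 1, 1, 3, 3, 3)]
covs = [covariance(seg) for seg in segments]

n = len(covs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = riemannian_distance(covs[i], covs[j])

labels = single_link_labels(dist, n_clusters=2)
print(labels)
```

In this toy setting, segments with similar covariance structure end up in the same cluster. How such clusters would define the window boundaries in SSTD is left open here, since the abstract does not describe it.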