Abstract

Traditional approaches to emotion recognition rely on unimodal physiological signals, which limits their effectiveness. To overcome this limitation, this paper proposes a new method based on time-frequency maps that extracts features from multimodal biological signals. First, electroencephalogram (EEG) and peripheral physiological signal (PPS) recordings are fused, and the two-dimensional discrete orthonormal Stockwell transform (2D-DOST) of the resulting multimodal signal matrix is computed to obtain time-frequency maps. A convolutional neural network (CNN) then extracts local deep features from the absolute value of the 2D-DOST output. Because some of these deep features are uninformative, a semisupervised dimension reduction (SSDR) scheme reduces their dimensionality while balancing generalization and discrimination. Finally, a classifier recognizes the emotion. A Bayesian optimizer selects suitable SSDR and classifier parameter values to maximize recognition accuracy. The performance of the proposed method is evaluated on the DEAP dataset, which contains 32-channel EEG signals and eight-channel PPSs from 32 subjects, in two- and four-class scenarios through extensive simulations. The proposed method achieves accuracies of 0.953 and 0.928 for the two- and four-class scenarios, respectively. The results indicate that multimodal signals are more effective for emotion detection than unimodal signals, and that the proposed method outperforms recently introduced approaches.
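A minimal sketch of the data flow described above is given below, under loudly stated assumptions: the trial shapes, network layout, and all helper names are hypothetical; a plain 2D FFT magnitude stands in for the 2D-DOST, PCA stands in for the SSDR step, and an SVM stands in for the classifier and Bayesian tuning. It illustrates only the fuse → time-frequency map → CNN features → reduce → classify sequence, not the authors' actual implementation.

```python
# Illustrative sketch only; shapes, layers, and stand-ins are assumptions,
# not the method from the paper.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.svm import SVC

n_eeg, n_pps, n_samples = 32, 8, 1024            # hypothetical trial shape

def fuse_trial(eeg, pps):
    """Stack EEG and PPS channels into one multimodal signal matrix."""
    return np.vstack([eeg, pps])                  # (40, n_samples)

def time_frequency_map(signal_matrix):
    """Stand-in for the 2D-DOST: magnitude of the 2D FFT."""
    return np.abs(np.fft.fft2(signal_matrix))

cnn = nn.Sequential(                              # small local feature extractor
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
    nn.Flatten(),                                 # 16 * 4 * 4 = 256 deep features
)

def extract_deep_features(tf_map):
    x = torch.tensor(tf_map, dtype=torch.float32)[None, None]  # (1, 1, H, W)
    with torch.no_grad():
        return cnn(x).numpy().ravel()

# Toy usage with random trials and binary (two-class) labels.
rng = np.random.default_rng(0)
trials = [fuse_trial(rng.standard_normal((n_eeg, n_samples)),
                     rng.standard_normal((n_pps, n_samples))) for _ in range(20)]
labels = rng.integers(0, 2, size=20)

features = np.array([extract_deep_features(time_frequency_map(t)) for t in trials])
reduced = PCA(n_components=10).fit_transform(features)  # stand-in for SSDR
clf = SVC().fit(reduced, labels)                         # stand-in classifier
```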
