Traditional methods for diagnosing and treating depression often lack the precision needed to accurately identify the diverse symptoms and parameters associated with the disorder. With advances in artificial intelligence and related sensing technologies, multimodal emotion recognition has emerged as a promising approach to improving depression detection. This paper explores the integration of physiological signals such as EEG (electroencephalogram), ECG (electrocardiogram), and GSR (galvanic skin response), together with data from wearable devices, to develop a high-quality feature extraction method for emotion analysis. Multiple datasets, including the SEED dataset of EEG recordings from 15 participants viewing emotional videos, were used to collect comprehensive physiological and psychological data. Feature extraction involved preprocessing steps such as noise reduction and normalization, followed by the application of decision-level and representation-level fusion algorithms. Heart rate variability and skin conductance response features were analyzed to capture emotional indicators. An attention-based Convolutional Neural Network (CNN) was employed to model both local and global facial features, and the Weighted Sum Pooling and Projection (WSPP) method was used to extract depression representations. Multimodal fusion techniques, particularly model-level fusion, significantly enhanced the learning of internal multimodal interactions, yielding improved recognition rates on Automatic Depression Estimation tasks compared with previous algorithms. Applications of this approach extend to smart home integration, daily emotion monitoring, health warning systems, and interactive wearable technologies, highlighting its potential for widespread use in enhancing user experience and mental health interventions.
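To make the pipeline described above more concrete, the sketch below illustrates, in simplified form, the kind of preprocessing, heart rate variability and skin conductance feature extraction, and decision-level fusion the abstract refers to. It is not the authors' implementation: all function names, feature choices, thresholds, and fusion weights are hypothetical assumptions introduced only for illustration, and only NumPy is used.

```python
# Illustrative sketch (not the paper's code): normalization, simple HRV and SCR
# features, and decision-level fusion of per-modality classifier outputs.
# Every name, threshold, and weight here is a hypothetical placeholder.

import numpy as np


def zscore_normalize(signal: np.ndarray) -> np.ndarray:
    """Normalize a 1-D physiological signal to zero mean and unit variance."""
    std = signal.std()
    return (signal - signal.mean()) / std if std > 0 else signal - signal.mean()


def hrv_features(rr_intervals_ms: np.ndarray) -> np.ndarray:
    """Basic heart rate variability features from RR intervals (milliseconds)."""
    diffs = np.diff(rr_intervals_ms)
    sdnn = rr_intervals_ms.std()                # overall variability
    rmssd = np.sqrt(np.mean(diffs ** 2))        # short-term variability
    mean_hr = 60000.0 / rr_intervals_ms.mean()  # mean heart rate in bpm
    return np.array([sdnn, rmssd, mean_hr])


def scr_features(gsr: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Crude skin conductance response features: count and mean amplitude of rises."""
    rises = np.diff(gsr)
    responses = rises[rises > threshold]
    n_scr = float(len(responses))
    mean_amp = float(responses.mean()) if len(responses) else 0.0
    return np.array([n_scr, mean_amp])


def decision_level_fusion(prob_per_modality, weights) -> np.ndarray:
    """Weighted average of class-probability vectors from per-modality classifiers."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_per_modality)       # shape: (n_modalities, n_classes)
    return w @ stacked                          # fused class probabilities


# Toy usage: fuse hypothetical EEG, ECG, and GSR classifier outputs.
eeg_probs = np.array([0.2, 0.5, 0.3])   # e.g. [negative, neutral, positive]
ecg_probs = np.array([0.1, 0.6, 0.3])
gsr_probs = np.array([0.3, 0.4, 0.3])
fused = decision_level_fusion([eeg_probs, ecg_probs, gsr_probs], weights=[0.5, 0.3, 0.2])
print("Fused probabilities:", fused, "-> predicted class:", int(fused.argmax()))
```

In a decision-level scheme like this, each modality is classified independently and only the output scores are combined, whereas the model-level fusion highlighted in the abstract combines intermediate representations so that cross-modal interactions can be learned directly.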