Depression is one of the most prevalent mental health conditions; it can impair people's productivity and lead to severe consequences. Diagnosing the disease is complex, as it often relies on a physician's subjective, interview-based screening. The aim of our work is to propose deep learning models for automatic depression detection using different data modalities, which could assist in the diagnosis of depression. Most existing work on automatic depression detection is evaluated on a single dataset, which may limit robustness, flexibility and scalability. To alleviate this problem, we design a novel Graph Neural Network-enhanced Transformer model named DePressionDetect Net (DPD Net) that leverages textual, audio and visual features and can work under two different application settings: the clinical setting and the social media setting. The model consists of a unimodal encoder module for encoding each single modality, a multimodal encoder module for integrating the multimodal information, and a detection module for producing the final prediction. We also propose a model named DePressionDetect-with-EEG Net (DPD-E Net) that incorporates Electroencephalography (EEG) signals and speech data for depression detection. Experiments across four benchmark datasets show that DPD Net and DPD-E Net outperform the state-of-the-art models on three datasets (i.e., the E-DAIC dataset, the Twitter depression dataset and the MODMA dataset), and achieve competitive performance on the fourth one (i.e., the D-vlog dataset). Ablation studies demonstrate the advantages of the proposed modules and the effectiveness of combining diverse modalities for automatic depression detection.
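For readers who want a concrete picture of the three-module pipeline described above (unimodal encoders, a graph-based multimodal fusion step, and a detection head), the sketch below shows one way such an architecture can be wired up in PyTorch. All class names, feature dimensions, and the attention-based fusion over modality nodes are illustrative assumptions for exposition, not the authors' DPD Net implementation.

```python
# Minimal sketch of a unimodal-encoder -> graph-fusion -> detector pipeline.
# Dimensions, module names, and the fusion scheme are hypothetical.
import torch
import torch.nn as nn

class UnimodalEncoder(nn.Module):
    """Transformer encoder for one modality (text, audio, or visual)."""
    def __init__(self, in_dim, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                      # x: (batch, seq_len, in_dim)
        h = self.encoder(self.proj(x))         # (batch, seq_len, d_model)
        return h.mean(dim=1)                   # pooled modality embedding

class GraphFusion(nn.Module):
    """Attention over a fully connected graph whose nodes are modalities."""
    def __init__(self, d_model=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, nodes):                  # nodes: (batch, n_modalities, d_model)
        fused, _ = self.attn(nodes, nodes, nodes)
        return fused.mean(dim=1)               # (batch, d_model)

class DepressionDetector(nn.Module):
    """Unimodal encoders -> graph fusion -> binary detection head."""
    def __init__(self, dims=(768, 80, 512), d_model=128):
        super().__init__()
        self.encoders = nn.ModuleList(UnimodalEncoder(d, d_model) for d in dims)
        self.fusion = GraphFusion(d_model)
        self.head = nn.Linear(d_model, 2)      # depressed vs. not depressed

    def forward(self, text, audio, visual):
        nodes = torch.stack(
            [enc(x) for enc, x in zip(self.encoders, (text, audio, visual))],
            dim=1)                             # (batch, 3, d_model)
        return self.head(self.fusion(nodes))   # logits: (batch, 2)
```

In this reading, each modality is first summarized into a fixed-size embedding, the embeddings are treated as nodes of a small fully connected graph over which attention propagates cross-modal information, and a final linear head produces the depression/non-depression prediction.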