Abstract
Depression has become a worldwide public health concern in recent years. Online social networks can serve as a tool for identifying latent depressed users. However, existing studies primarily apply machine learning or deep learning methods to multi-modal information with simple concatenation, resulting in unsatisfactory prediction performance. Additionally, the extracted features often lack interpretability. Furthermore, most of these studies concentrate on English social networks. To address these issues, we propose a novel unified multi-task training model named CrossAMF (Cross-modal Attention-based Multi-modal Fusion). CrossAMF extracts crucial multi-modal features with explainability and effectively fuses the representations of different modalities. Our approach employs a sentiment filtering-based XLNet-CNN-BiGRU model and Reduce-VGGNet to extract textual and image features, respectively. We also introduce previously unexplored user behavior metrics. We collect a dataset of Chinese depressed users from Sina microblog, comprising the texts and images of 5692 depressed users and 9121 normal users. Experimental results demonstrate the significance of each modality to our task, with their combination yielding the best performance. CrossAMF outperforms other state-of-the-art methods, achieving an accuracy of 0.9097 and an F1 score of 0.8915, and shows potential for automatically screening for early depression.
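The abstract names the fusion mechanism but not its implementation, so the following PyTorch sketch illustrates one plausible form of cross-modal attention fusion between text and image features, as opposed to simple concatenation. All names, dimensions, and hyperparameters here (`CrossModalAttentionFusion`, `hidden_dim`, the pooling and classifier head) are illustrative assumptions, not the paper's actual CrossAMF architecture.

```python
# A minimal sketch of cross-modal attention fusion, assuming each modality
# attends to the other in a shared hidden space. Illustrative only; the
# paper's abstract does not specify layer shapes or hyperparameters.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Fuse text and image features via bidirectional cross-attention."""

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, num_heads=4):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Text queries attend to image features, and vice versa.
        self.text_to_image = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # depressed vs. normal
        )

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, n_posts, text_dim); image_feats: (batch, n_images, image_dim)
        t = self.text_proj(text_feats)
        v = self.image_proj(image_feats)
        # Each modality queries the other instead of being simply concatenated.
        t_attn, _ = self.text_to_image(query=t, key=v, value=v)
        v_attn, _ = self.image_to_text(query=v, key=t, value=t)
        # Pool over the sequence dimension, then combine both fused views.
        fused = torch.cat([t_attn.mean(dim=1), v_attn.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Example usage with random stand-in features:
model = CrossModalAttentionFusion()
text_feats = torch.randn(8, 10, 768)   # e.g. per-post textual encoder outputs
image_feats = torch.randn(8, 10, 512)  # e.g. per-image visual encoder outputs
logits = model(text_feats, image_feats)  # shape: (8, 2)
```

The design point this sketch captures is the one the abstract argues for: letting each modality weight the other's features via attention, rather than concatenating fixed vectors, so the fused representation can emphasize the cross-modal signals most relevant to depression prediction.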