Abstract
Depression has become a worldwide public health concern in recent years. Online social networks can serve as a tool for identifying latent depressed users. However, existing studies primarily apply machine learning or deep learning methods to multi-modal information with simple concatenation, resulting in unsatisfactory prediction performance. Additionally, the extracted features often lack interpretability. Furthermore, most of these studies concentrate on English social networks. To address these issues, we propose a novel unified multi-task training model named CrossAMF (Cross-modal Attention-based Multi-modal Fusion). CrossAMF extracts crucial multi-modal features with explainability and effectively fuses the representations of different modalities. Our approach employs a sentiment filtering-based XLNet-CNN-BiGRU model and Reduce-VGGNet to extract textual and image features, respectively. We also introduce previously unexplored user behavior metrics. We collect a dataset of Chinese depressed users from Sina microblog, comprising the texts and images of 5692 depressed users and 9121 normal users. Experimental results demonstrate the significance of each modality to our task, with their combination yielding the best performance. CrossAMF outperforms other state-of-the-art methods, achieving an accuracy of 0.9097 and an F1 score of 0.8915, and shows potential for automatically screening for early depression.
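The abstract names the fusion mechanism but not its implementation, so the following PyTorch sketch illustrates one plausible form of cross-modal attention fusion between text and image features, as opposed to simple concatenation. All names, dimensions, and hyperparameters here (`CrossModalAttentionFusion`, `hidden_dim`, the pooling and classifier head) are illustrative assumptions, not the paper's actual CrossAMF architecture.

```python
# A minimal sketch of cross-modal attention fusion, assuming each modality
# attends to the other in a shared hidden space. Illustrative only; the
# paper's abstract does not specify layer shapes or hyperparameters.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Fuse text and image features via bidirectional cross-attention."""

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, num_heads=4):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Text queries attend to image features, and vice versa.
        self.text_to_image = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # depressed vs. normal
        )

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, n_posts, text_dim); image_feats: (batch, n_images, image_dim)
        t = self.text_proj(text_feats)
        v = self.image_proj(image_feats)
        # Each modality queries the other instead of being simply concatenated.
        t_attn, _ = self.text_to_image(query=t, key=v, value=v)
        v_attn, _ = self.image_to_text(query=v, key=t, value=t)
        # Pool over the sequence dimension, then combine both fused views.
        fused = torch.cat([t_attn.mean(dim=1), v_attn.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Example usage with random stand-in features:
model = CrossModalAttentionFusion()
text_feats = torch.randn(8, 10, 768)   # e.g. per-post textual encoder outputs
image_feats = torch.randn(8, 10, 512)  # e.g. per-image visual encoder outputs
logits = model(text_feats, image_feats)  # shape: (8, 2)
```

The design point this sketch captures is the one the abstract argues for: letting each modality weight the other's features via attention, rather than concatenating fixed vectors, so the fused representation can emphasize the cross-modal signals most relevant to depression prediction.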