Leveraging Domain Knowledge to Improve Depression Detection on Chinese Social Media

Zhihua Guo, Minyu Zhai, Nengneng Ding, Zhenwen Zhang, Zepeng Li

doi:10.1109/tcss.2023.3267183

Abstract

Depression is a prevalent and severe mental disorder that often goes undetected and untreated, particularly in its early stages. Social media, however, has emerged as a valuable resource for identifying symptoms of depression and other mental disorders, as people are increasingly willing to share their experiences and emotions online. As such, social media-based depression detection has become an important area of research. Unfortunately, despite the growing number of cases in China, there are few Chinese social media-based resources for depression research. To address this gap, this article presents a dataset collected from Sina Weibo and approaches depression detection as a binary classification problem. A depression lexicon is developed based on domain knowledge of depression and the Dalian University of Technology Sentiment Lexicon (DUT-SL), which facilitates better extraction of lexical features related to depression. The lexical features are then fused using a correlation-based metric. The effectiveness of this approach is verified using five classical machine learning methods and two boosting-based models, on both a public dataset and our dataset. Experimental results indicate that the depression domain lexicon features improve classification performance and that fusing these features based on their correlations can further enhance prediction effectiveness. This study provides a method for future research in social media-based depression detection and contributes to the development of Chinese depression detection resources.

Full Text