Suicide is a global public health problem that takes hundreds of thousands of lives each year. Effective suicide prevention depends on early detection of suicidal ideation and timely intervention, but several factors hinder traditional suicide risk screening methods. Chief among them is the social stigma associated with suicide: existing methods require patients to explicitly communicate their suicidal propensities, which many are reluctant to do. Meanwhile, a growing number of at-risk people choose online platforms such as Reddit as their preferred avenues for sharing their suicidal experiences and seeking emotional support. As a result, these platforms have become an unobtrusive source of user-generated textual data that can be used to detect suicidality with supervised machine learning and natural language processing techniques. In this paper, we propose a suicidal ideation detection approach that combines textual and psycholinguistic features extracted from Reddit forum posts. We select the most informative features using the Boruta algorithm and train four classifiers: logistic regression, naïve Bayes, support vector machines, and random forest. The naïve Bayes models trained with the combination of term frequency-inverse document frequency (TF-IDF) and National Research Council (NRC) features demonstrated the highest performance, obtaining an F1 score of 70.99%. Our experimental results show that combining textual and psycholinguistic features yields better classification performance than using either feature set separately.
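To make the idea of combining textual and psycholinguistic features concrete, the following is a minimal sketch, not the implementation used in the paper. It builds a TF-IDF vector per post and appends emotion-category proportions from a small lexicon. The lexicon entries, the example posts, and all function names are invented for illustration; the toy lexicon only stands in for an emotion lexicon such as NRC. Feature selection and classifier training are omitted.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for an emotion lexicon such as NRC.
# These word-to-emotion entries are invented for illustration only.
TOY_LEXICON = {
    "hopeless": {"sadness", "fear"},
    "alone": {"sadness"},
    "happy": {"joy"},
    "love": {"joy"},
}
EMOTIONS = ["fear", "joy", "sadness"]


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(doc, idf, vocab):
    """Relative term frequency times IDF, one value per vocabulary term."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    return [counts[t] / total * idf[t] for t in vocab]


def emotion_vector(doc):
    """Share of tokens associated with each emotion category."""
    tokens = tokenize(doc)
    counts = Counter(e for tok in tokens for e in TOY_LEXICON.get(tok, ()))
    total = len(tokens) or 1
    return [counts[e] / total for e in EMOTIONS]


def combined_features(doc, idf, vocab):
    """Concatenate the textual (TF-IDF) and lexicon-based feature vectors."""
    return tfidf_vector(doc, idf, vocab) + emotion_vector(doc)


# Invented example posts, not data from the study.
docs = ["i feel hopeless and alone", "i love this happy day"]
idf = fit_idf(docs)
vocab = sorted(idf)
features = [combined_features(d, idf, vocab) for d in docs]
```

Each row of `features` could then be passed to a feature selector and a classifier; concatenation is only one simple way to combine the two feature families.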