Abstract

BackgroundMental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods.ObjectiveThis study aimed to benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response.MethodsWe used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner, Empath, and Linguistic Inquiry and Word Count, and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used Tree-based Optimization Tool and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts.ResultsThe top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macroaveraged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. In addition, we have presented visualizations that aid in the understanding of the learned classifiers.ConclusionsIn this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.

Highlights

  • Mental health disorders are highly prevalent, with epidemiological studies reporting roughly half the population in the United States meeting the criteria for one or more mental disorders in their lifetime and roughly a quarter meeting the criteria in a given year [1]

  • The top-performing system used features derived from the Generative Pretrained Transformer-1 (GPT-1) model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com

  • Mental disorders are among the strongest predictors for nonsuicidal self-injury and suicidal behaviors; little is known about how people transition from suicidal thoughts to attempts [5]

Read more

Summary

Introduction

Mental health disorders are highly prevalent, with epidemiological studies reporting roughly half the population in the United States meeting the criteria for one or more mental disorders in their lifetime and roughly a quarter meeting the criteria in a given year [1]. Franklin et al [6] report a lack of progress over the last 50 years on the identification of risk factors that can aid in the prediction of suicidal thoughts and behaviors. They proposed that new methods with a focus on risk algorithms using machine learning present an ideal path forward. These approaches can be integrated into peer support forums to develop repeated and continuous measurements of a user’s well-being to inform early interventions. Online mental health forums can provide a supportive environment for those afflicted and generate a large amount of data that can be mined to predict mental health states using machine learning methods

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call