Mental-LLM

Xuhai Xu,Hong Yu,Bingsheng Yao,Marzyeh Ghassemi,Yuanzhe Dong,Saadia Gabriel,James Hendler,Anind K Dey,Dakuo Wang

doi:10.1145/3643540

Abstract

Advances in large language models (LLMs) have empowered a variety of applications. However, research on understanding and enhancing the capabilities of LLMs in the field of mental health remains limited. In this work, we present a comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks using online text data. We conduct a broad range of experiments covering zero-shot prompting, few-shot prompting, and instruction finetuning. The results indicate promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs on all tasks simultaneously. Our best finetuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (which is 25 and 15 times larger than the two models, respectively) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times larger) by 4.8%. They also perform on par with the state-of-the-art task-specific language model. We further conduct an exploratory case study of LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. We also emphasize important limitations that must be addressed before these models are deployable in real-world mental health settings, such as known racial and gender bias, and we highlight the important ethical risks accompanying this line of research.

Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Publication Date: Mar 6, 2024
Citations: 17

Similar Papers

The Opportunities and Risks of Large Language Models in Mental Health.
Hannah R Lawrence ... Megan Jones Bell
JMIR mental health | VOL. 11
29 Jul 2024

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

Bias of AI-generated content: an examination of news produced by large language models
Xiao Fang ... Xiaohang Zhao
Scientific Reports | VOL. 14
04 Mar 2024

Jigsaw
Naman Jain ... Arun Iyer
21 May 2022
