Abstract
This study evaluates whether a GPT model can serve as a health assistant by addressing health concerns along three dimensions: providing preliminary guidance, clarifying information, and offering accessible recommendations. A total of 31 questions were collected from multiple online health platforms, covering diverse health concerns across age ranges and genders. A tailored system prompt was built to guide GPT-3.5-turbo in generating responses. The responses were then scored from 0 to 5 on three metrics: “Preliminary Guidance”, “Clarifying Information”, and “Accessibility and Convenience”, by a medical doctor with over 20 years of experience in general and preventive care. The results indicate that the LLM demonstrated moderate performance on both the ‘preliminary guidance’ and ‘clarifying information’ metrics. Specifically, the mean score for ‘preliminary guidance’ was 3.65, implying that LLMs can offer valuable insights when symptoms indicate the need for urgent or emergency care, as well as reassure patients about minor symptoms. Similarly, the mean score for ‘clarifying information’ was 3.87, demonstrating that LLMs effectively provide supplementary information to help patients make informed decisions. However, the mean score for ‘accessibility and convenience’ was notably lower at 2.65, highlighting a deficiency in LLMs’ ability to tailor advice to the specific needs of individual patients.