Unlocking the future of patient education: ChatGPT vs. LexiComp® as sources of patient education materials

Elizabeth W Covington, Amber M Hutchison, Jeanna Sewell, Melanie Hyte, Lucy Tocco, Julie Kay, Courtney S Watts Alexander

doi:10.1016/j.japh.2024.102119

Abstract

Background: ChatGPT is a conversational artificial intelligence (AI) technology that has shown application in various facets of healthcare. With the increased use of AI, it is imperative to assess the accuracy and comprehensibility of AI platforms.

Objective: This pilot project aimed to assess the understandability, readability, and accuracy of ChatGPT as a source of medication-related patient education as compared with an evidence-based medicine tertiary reference resource, LexiComp®.

Methods: Patient education materials (PEMs) were obtained from ChatGPT and LexiComp® for 8 common medications (albuterol, apixaban, atorvastatin, hydrocodone/acetaminophen, insulin glargine, levofloxacin, omeprazole, and sacubitril/valsartan). PEMs were extracted, blinded, and assessed by 2 investigators independently. The primary outcome was a comparison of the Patient Education Materials Assessment Tool-printable (PEMAT-P). Secondary outcomes included Flesch reading ease, Flesch-Kincaid grade level, percent passive sentences, word count, and accuracy. A 7-item accuracy checklist for each medication was generated by expert consensus among pharmacist investigators, with LexiComp® PEMs serving as the control. PEMAT-P interrater reliability was determined via intraclass correlation coefficient (ICC). Flesch reading ease, Flesch-Kincaid grade level, percent passive sentences, and word count were calculated by Microsoft® Word®. Continuous data were assessed using the Student’s t-test via SPSS (version 20.0).

Results: No difference was found in the PEMAT-P understandability score of PEMs produced by ChatGPT versus LexiComp® [77.9% (11.0) vs. 72.5% (2.4), P = 0.193]. Reading level was higher with ChatGPT [8.6 (1.2) vs. 5.6 (0.3), P < 0.001]. ChatGPT PEMs had a lower percentage of passive sentences and lower word count. The average accuracy score of ChatGPT PEMs was 4.25/7 (61%), with scores ranging from 29% to 86%.

Conclusion: Despite comparable PEMAT-P scores, ChatGPT PEMs did not meet grade level targets. Lower word count and passive text with ChatGPT PEMs could benefit patients, but the variable accuracy scores prevent routine use of ChatGPT to produce medication-related PEMs at this time.
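For readers unfamiliar with the readability metrics named in the Methods, the following is a minimal sketch of the standard Flesch reading ease and Flesch-Kincaid grade level formulas. It is not the study's procedure: the study used Microsoft® Word®, whose sentence and syllable counting may differ. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration only.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (assumed heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (e.g. "dose"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch reading ease, Flesch-Kincaid grade level, and word count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher reading ease means easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate U.S. school grade needed to read the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "word_count": len(words),
    }


if __name__ == "__main__":
    sample = "Take one pill. Do not skip a dose."
    print(readability(sample))
```

Both formulas depend only on average sentence length and average syllables per word, so short sentences and short words lower the grade level. Scores from this sketch should not be expected to match the values Word reports for the same text.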
