Objective: To evaluate and compare the readability and quality of patient information generated by Chat Generative Pre-Trained Transformer 3.5 (ChatGPT-3.5) and the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) using validated instruments: Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease, DISCERN, and the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P).

Methods: ENTHealth.org and ChatGPT-3.5 were queried for patient information on laryngology topics. ChatGPT-3.5 was queried twice per topic to assess reliability, yielding three de-identified text documents for each topic: one from AAO-HNS and two from ChatGPT (ChatGPT Output 1, ChatGPT Output 2). Grade level and reading ease were compared across the three sources using one-way analysis of variance with Tukey's post hoc test. Independent t-tests were used to compare DISCERN scores and PEMAT-P understandability and actionability scores between AAO-HNS and ChatGPT Output 1.

Results: Material generated by ChatGPT Output 1 and ChatGPT Output 2 was at least two reading grade levels higher than material from AAO-HNS (p < 0.001). ChatGPT Output 1 and ChatGPT Output 2 documents also had significantly lower mean reading ease scores than AAO-HNS documents (p < 0.001). Moreover, ChatGPT Output 1 material on vocal cord paralysis had lower PEMAT-P understandability than the corresponding AAO-HNS material (p < 0.05).

Conclusion: Patient information on the ENTHealth.org website for select laryngology topics was, on average, written at a lower grade level and with higher reading ease than that produced by ChatGPT, but interestingly with largely no difference in the quality of information provided.

Level of Evidence: NA. Laryngoscope, 2024.
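The two readability instruments used above are closed-form functions of sentence, word, and syllable counts. As a minimal sketch of how such scores are computed, the snippet below implements the standard published Flesch-Kincaid Grade Level and Flesch Reading Ease formulas with a crude vowel-group syllable heuristic; the study itself does not describe its scoring software, and validated tools use more refined syllable rules, so treat this as illustrative only.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, trimming a silent final 'e'.
    A heuristic stand-in for the dictionary-based counters real tools use."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, Flesch Reading Ease) using the standard formulas:
    FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    FRE  = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)"""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return fkgl, fre

# Short, simple sentences score a low grade level and high reading ease;
# dense clinical prose scores the reverse.
simple = "The cat sat. The dog ran."
dense = ("Unilateral vocal cord paralysis necessitates "
         "comprehensive laryngological evaluation.")
```

Note that a higher FKGL and a lower Flesch Reading Ease both indicate harder text, which is why the results above report the two metrics moving in opposite directions for the same documents.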