Abstract
Abstract Objectives The aim of this study was to assess the completeness and readability of generative pre-trained transformer-4 (GPT-4)-generated discharge instructions at prespecified reading levels for common pediatric emergency room complaints. Materials and Methods The outputs for 6 discharge scenarios stratified by reading level (fifth or eighth grade) and language (English, Spanish) were generated fivefold using GPT-4. Specifically, 120 discharge instructions were produced and analyzed (6 scenarios: 60 in English, 60 in Spanish; 60 at a fifth-grade reading level, 60 at an eighth-grade reading level) and compared for completeness and readability (between language, between reading level, and stratified by group and reading level). Completeness was defined as the proportion of literature-derived key points included in discharge instructions. Readability was quantified using Flesch-Kincaid (English) and Fernandez-Huerta (Spanish) readability scores. Results English-language GPT-generated discharge instructions contained a significantly higher proportion of must-include discharge instructions than those in Spanish (English: mean (standard error of the mean) = 62% (3%), Spanish: 53% (3%), P = .02). In the fifth-grade and eighth-grade level conditions, there was no significant difference between English and Spanish outputs in completeness. Readability did not differ across languages. Discussion GPT-4 produced readable discharge instructions in English and Spanish while modulating document reading level. Discharge instructions in English tended to have higher completeness than those in Spanish. Conclusion Future research in prompt engineering and GPT-4 performance, both generally and in multiple languages, is needed to reduce potential for health disparities by language and reading level.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have