Background/Objectives: The evolving capabilities of large language models such as the Chat Generative Pre-trained Transformer (ChatGPT) offer new avenues for disseminating health information online. These models, trained on extensive datasets, are designed to deliver customized responses to user queries. However, because these outputs are generated without expert oversight, their quality and accuracy must be understood to gauge their reliability for potential applications in healthcare. This study evaluates ChatGPT-generated responses to common patient concerns and questions about cleft lip repair. Methods: Ten commonly asked questions about cleft lip repair procedures were selected from the American Society of Plastic Surgeons’ patient information resources. These questions were entered as ChatGPT prompts, and five board-certified plastic surgeons rated the generated responses for content quality, clarity, relevance, and trustworthiness on a 4-point Likert scale. Readability was evaluated using the Flesch reading ease score (FRES) and the Flesch–Kincaid grade level (FKGL). Results: ChatGPT responses received an aggregate mean rating of 2.9 out of 4 across all evaluation criteria. Clarity and content quality received the highest ratings (3.1 ± 0.6), while trustworthiness received the lowest (2.7 ± 0.6). Readability metrics revealed a mean FRES of 44.35 and an FKGL of 10.87, corresponding to approximately a 10th-grade reading level. None of the responses contained grossly inaccurate or potentially harmful medical information, although all lacked citations. Conclusions: ChatGPT demonstrates potential as a supplementary tool for patient education in cleft lip management by delivering generally accurate, relevant, and understandable information. Despite the value that AI-powered tools can offer clinicians and patients, the lack of human oversight underscores the importance of user awareness of their limitations.
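For context, the two readability indices reported above follow the standard Flesch formulas, which score text from average sentence length and average syllables per word (the definitions below are given for reference and are not reproduced from the study itself):

\[
\mathrm{FRES} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}
\]
\[
\mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
\]

Lower FRES values indicate more difficult text, and FKGL approximates the U.S. school grade level required to comprehend it.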