ABSTRACT

Purpose: Online health information seekers may encounter content produced by artificial intelligence language models such as ChatGPT (OpenAI). Medicine may pose a significant challenge for incorporating these applications, given the training and experience required to master clinical reasoning. The objective of this study was to evaluate the performance of ChatGPT responses compared with those of human oculofacial plastic surgeons (OPS) for frequently asked questions (FAQs) about an upper eyelid blepharoplasty procedure.

Methods: A cross-sectional survey was conducted. Three OPS trained by the American Society of Ophthalmic Plastic and Reconstructive Surgery (ASOPRS) and three ChatGPT instances each answered six FAQs about an upper eyelid blepharoplasty procedure. Two blinded ASOPRS-trained OPS rated each response for accuracy, comprehensiveness, and similarity to their own personal answer on a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree).

Results: ChatGPT achieved mean Likert scores of 3.8 (SD 0.9) for accuracy, 3.6 (SD 1.1) for comprehensiveness, and 3.2 (SD 1.1) for personal answer similarity. In comparison, OPS achieved mean scores of 3.6 (SD 1.2) for accuracy (p = .72), 3.0 (SD 1.1) for comprehensiveness (p = .03), and 2.9 (SD 1.1) for personal answer similarity (p = .66).

Conclusions: ChatGPT was non-inferior to OPS in answering upper eyelid blepharoplasty FAQs. Compared with OPS, ChatGPT achieved higher comprehensiveness ratings and non-inferior accuracy and personal answer similarity ratings. These findings suggest that ChatGPT could serve as an adjunct to OPS for patient education, but not a replacement. However, safeguards to protect patients from possible harm must be implemented.