Abstract
Background: Artificial intelligence (AI) language models have emerged as tools capable of generating human-like text in response to user prompts on a wide range of topics. However, the performance and content quality of these language models have not been evaluated in certain medical fields. The aim of our study is to evaluate the performance of the AI language models ChatGPT and Bard in providing information to parents about hypospadias and to compare the quality of that information against a reference source. Methods: In this study, 38 frequently asked questions about hypospadias that were publicly posted on social media and on the websites of reputable institutions and societies were evaluated. The quality of the responses was assessed using the global quality score (GQS), with the European Association of Urology guidelines as the reference. The number of words and sentences in each response was recorded by the researchers. Results: The response quality of Bard was higher than that of ChatGPT for the question group in the preoperative preparation category (p = .042). For the remaining questions, the quality of ChatGPT's and Bard's responses was similar and above average. Bard's responses contained more words (p < .001) and sentences (p < .001) than ChatGPT's. Discussion: Large language models (LLMs) had limitations in providing parents with high-quality information about hypospadias. Parents should therefore be cautious when using LLMs as a source of information.