Abstract

Introduction
Since November 2022, Artificial Intelligence (AI) chatbots have grown in popularity, including for urologic conditions. However, their accuracy and quality have not been evaluated systematically. In this study, we sought to assess the accuracy and quality of AI chatbots in the management of erectile dysfunction (ED).

Objective
To evaluate the accuracy and quality of publicly available language models in fielding common clinical questions pertaining to ED, compared with board-certified urologists.

Methods
Two AI language models, ChatGPT and Google Bard, were asked 15 standard questions related to ED, covering topics such as causes, risk factors, and treatment options. Two board-certified urologists were given the same questions in a standard survey. A third, blinded board-certified urologist graded each response for accuracy, robustness, and bias on Likert scales, using the AUA guidelines for ED as the reference standard. Urologist and AI scores were then aggregated.

Results
Overall, AI responses were significantly more accurate (p<0.01), more robust (p<0.01), and less biased (p<0.01) than the urologists' responses. Google Bard had the highest scores across all measures, followed by ChatGPT. The urologists' aggregate scores were approximately 38% lower than those of the AI chatbots.

Conclusions
This study suggests that AI responses were superior to urologists' responses in accuracy, robustness, and bias pertaining to the management of ED. Although chatbots show promise for urologic conditions, including ED, their widespread clinical use and adoption warrant further evaluation in the context of clinical decision making and enhancing patient care.

Disclosure: No.
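The abstract does not specify which statistical test produced the reported p-values. As a purely illustrative sketch, ordinal Likert grades such as these are often compared with a nonparametric test like the Mann-Whitney U test; the grades below are hypothetical stand-ins, not the study's data.

```python
# Hypothetical illustration: comparing Likert-scale accuracy grades (1-5)
# for AI vs. urologist responses across 15 questions. The abstract does not
# name the test used; Mann-Whitney U is one common choice for ordinal data.
from scipy.stats import mannwhitneyu

# Made-up grades, one per question (NOT the study's actual data)
ai_grades        = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5]
urologist_grades = [3, 3, 4, 2, 3, 3, 4, 3, 2, 3, 3, 4, 3, 2, 3]

stat, p = mannwhitneyu(ai_grades, urologist_grades, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")  # p < 0.01 would mirror the reported finding
```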
