Abstract

Background

In this study, we evaluated the diagnostic accuracy of Google Bard, a generative artificial intelligence (AI) platform.

Methods

We searched our department's published case reports for difficult or uncommon case descriptions and used mock cases created by physicians for common case descriptions. We entered the case descriptions into the prompt of Google Bard to generate the top 10 differential-diagnosis lists. As in previous studies, other physicians created differential-diagnosis lists by reading the same clinical descriptions.

Results

A total of 82 clinical descriptions (52 case reports and 30 mock cases) were used. The accuracy rates of Google Bard remained lower than those of the physicians for the top 10 (56.1% vs 82.9%, P < .001), the top 5 (53.7% vs 78.0%, P = .002), and the top differential diagnosis (40.2% vs 64.6%, P = .003). Within the subset of case reports, physicians likewise consistently outperformed Google Bard. For the mock cases, the differential-diagnosis lists generated by Google Bard did not differ from those of the physicians for the top 10 (80.0% vs 96.6%, P = .11) or the top 5 (76.7% vs 96.6%, P = .06), but were less accurate for the top diagnosis (60.0% vs 90.0%, P = .02).

Conclusion

While physicians performed better overall, particularly on case reports, Google Bard displayed comparable diagnostic performance on common cases. This suggests that Google Bard has room for further improvement and refinement in its diagnostic capabilities. Generative AIs, including Google Bard, are anticipated to become increasingly beneficial in augmenting diagnostic accuracy.
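To make the scoring concrete, below is a minimal sketch of how top-k accuracy could be computed and the paired rates compared. The data structures, helper names, and the use of an exact McNemar test are illustrative assumptions; the abstract does not specify the statistical procedure actually used.

```python
# Sketch only: assumes each case has a reference final diagnosis and one ranked
# differential-diagnosis list per rater (Google Bard and the physicians).
from statsmodels.stats.contingency_tables import mcnemar


def topk_correct(ranked_list, final_diagnosis, k):
    """True if the final diagnosis appears within the top k entries of a ranked list."""
    return final_diagnosis in ranked_list[:k]


def compare_raters(bard_lists, physician_lists, final_diagnoses, k=10):
    """Score both raters per case and compare them with a paired McNemar test (assumed)."""
    bard_hits = [topk_correct(b, d, k) for b, d in zip(bard_lists, final_diagnoses)]
    phys_hits = [topk_correct(p, d, k) for p, d in zip(physician_lists, final_diagnoses)]

    # 2x2 table of paired outcomes:
    # rows = Bard correct/incorrect, columns = physician correct/incorrect
    table = [[0, 0], [0, 0]]
    for b, p in zip(bard_hits, phys_hits):
        table[0 if b else 1][0 if p else 1] += 1

    result = mcnemar(table, exact=True)  # exact test for small discordant-pair counts
    return {
        "bard_accuracy": sum(bard_hits) / len(bard_hits),
        "physician_accuracy": sum(phys_hits) / len(phys_hits),
        "p_value": result.pvalue,
    }
```

Running compare_raters with k set to 10, 5, and 1 would reproduce the three accuracy comparisons reported in the Results, under the assumptions stated above.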
