Abstract

Introduction
The introduction of Artificial Intelligence (AI) tools like ChatGPT and Google Bard promises transformative advances in clinical diagnostics. The aim of this study is to examine the ability of these two AI tools to diagnose various medical scenarios.

Methods
Experts from varied medical domains curated 20 case scenarios, each paired with its ideal diagnostic answer. Both AI systems, ChatGPT (knowledge updated to September 2021) and Google Bard (knowledge updated to January 2023), were tasked with diagnosing these cases. Their outputs were recorded and subsequently assessed by human medical professionals.

Results
In the diagnostic evaluations, ChatGPT achieved an accuracy of 90%, correctly diagnosing 18 of 20 cases, while Google Bard displayed an 80% accuracy rate, correctly diagnosing 16 of 20 cases. Notably, both AIs faltered in specific complex scenarios. For instance, both systems misdiagnosed a labor situation; ChatGPT incorrectly identified a case of hypertrophic pyloric stenosis, and Google Bard suggested a less suitable diagnostic procedure (pelvic ultrasound) for a 56-year-old patient.

Conclusion
This study showcases the promising capabilities of ChatGPT and Google Bard in the realm of clinical diagnostics, with both AI tools achieving commendable accuracy rates.
