Abstract

ChatGPT-4 (developed by OpenAI on the GPT-4 architecture), BARD (developed by Google), and YOU.com are AI large language models (LLMs). They were trained using unsupervised learning, which allows them to learn from vast amounts of text data without requiring explicit human labels; ChatGPT-4's training data extends only to September 2021. By presenting identical prompts (queries), including a typical case presentation (vignette) of a new patient with squamous cell tonsillar cancer, to these three LLMs, each available for free trial, and comparing their responses, we uncovered several specific issues that raise concerns about the current application of this early phase of advanced LLM AI technology to clinical medicine. Each AI produced flaws that, if taken as factual, would affect clinical therapeutic suggestions and possibly survival. We observed responses that changed over mere hours and days despite unchanged prompts, both within and between LLMs; critical errors in guideline-recommended drug therapy; and bogus AI-generated references whose DOI and/or PMID identifiers were either nonexistent or led to completely irrelevant manuscripts on other subjects.
