Abstract

You would have been hard-pressed to miss the storm surrounding ChatGPT (Chat Generative Pre-trained Transformer) over the past few months. News outlets and social media have been abuzz with reports on the chatbot developed by OpenAI. In response to a written prompt, ChatGPT can compose emails, write computer code, and even craft movie scripts. Researchers have also demonstrated its ability to pass medical licensing exams. But the excitement has been matched by a swathe of ethical concerns that could, and perhaps should, limit its adoption.

ChatGPT is powered by a refined version of the large language model (LLM) GPT-3.5. Its base model, GPT-3, was trained on articles, websites, books, and written conversations, and a process of fine-tuning (including optimisation for dialogue) enables ChatGPT to respond to prompts in a conversational way.

In the realm of health care, Sajan B Patel and Kyle Lam illustrated ChatGPT's ability to generate a patient discharge summary from a brief prompt (a minimal sketch of this kind of prompt-driven drafting is given below). Automating this process could reduce delays in discharge from secondary care without compromising on detail, freeing up valuable time for doctors to invest in patient care and developmental training. A separate study tested ChatGPT's ability to simplify radiology reports; the generated reports were deemed overall factually correct and complete, with a low perceived risk of harm to patients. In both cases, however, errors were evident. In the discharge summary example provided by Patel and Lam, ChatGPT added information to the summary that was not included in the prompt, and the radiology report study identified potentially harmful mistakes, such as missing key medical findings. Such errors signal that, if these tools were implemented in clinical practice, manual checks of automated outputs would be required.

The limitations of ChatGPT are well known. By OpenAI's own admission, ChatGPT's output can be incorrect or biased, for example citing article references that do not exist or perpetuating sexist stereotypes. It can also respond to harmful instructions, such as a request to generate malware. OpenAI has set up guardrails to minimise these risks, but users have found ways around them, and because ChatGPT's outputs could be used to train future iterations of the model, such errors might be recycled and amplified. OpenAI has asked users to report inappropriate responses to help improve the model, but this approach has been criticised because it is often the people disproportionately affected by algorithmic bias (such as those from marginalised communities) who are expected to help find solutions.

Michael Liebrenz and colleagues opine that although ChatGPT could serve to democratise knowledge sharing, because it can receive and output text in multiple languages (a benefit for non-native speakers publishing in English), inaccuracies in generated text could fuel the spread of misinformation. These concerns have serious implications for the integrity of the scientific record, given the risk of introducing not only errors but also plagiarised content into publications, which could result in future research or health policy decisions being made on the basis of false information. Last month, the World Association of Medical Editors published its recommendations on the use of ChatGPT and other chatbots in scholarly publications, one of which is that journal editors need new tools to detect AI-generated or modified content.
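To make the prompt-driven drafting described above concrete, the following is a minimal sketch using the OpenAI Python SDK (v1 or later) with an API key supplied via the environment. The model name, system instruction, and clinical notes are illustrative assumptions rather than the prompt used by Patel and Lam, and any generated summary would still need a clinician's manual check.

```python
# Illustrative sketch only: drafting a discharge summary from brief notes.
# Assumes the OpenAI Python SDK (pip install openai, v1+) and OPENAI_API_KEY
# set in the environment; the notes and wording below are hypothetical.
from openai import OpenAI

client = OpenAI()

brief_notes = (
    "72-year-old admitted with community-acquired pneumonia; "
    "treated with IV then oral antibiotics; mobile and eating well; "
    "follow-up chest X-ray arranged for 6 weeks."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": (
                "You draft concise hospital discharge summaries. "
                "Use only the details provided; do not add findings."
            ),
        },
        {
            "role": "user",
            "content": f"Draft a discharge summary from these notes:\n{brief_notes}",
        },
    ],
    temperature=0,  # keep the draft as close to the supplied notes as possible
)

print(response.choices[0].message.content)
```

Even with a restrictive system instruction and a temperature of 0, the model can still introduce unprompted details, which is exactly why the manual checks discussed above would remain essential.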
In line with that recommendation, an AI output detector has been shown to distinguish original from ChatGPT-generated research article abstracts better than both a plagiarism detector and human reviewers, although it falsely flagged an original abstract as “fake” (one such detector is sketched below). Technology is evolving, and editorial policies need to evolve with it. Elsevier has introduced a new policy on the use of AI and AI-assisted technologies in scientific writing, stipulating that such use should be limited to improving the readability and language of the work and should be declared in the manuscript; that authors should manually check any AI-generated output; and that these tools should not be listed or cited as an author or co-author, because they cannot take on the responsibilities that authorship entails (such as being accountable for the published work).

Widespread use of ChatGPT seems inevitable, but in its current iteration careless, unchecked use could be a foe to both society and scholarly publishing. More forethought and oversight of model training are needed, as is investment in robust AI output detectors. ChatGPT is a game changer, but we're not quite ready to play.
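As an illustration of what such screening might look like in practice, the sketch below runs a publicly available, RoBERTa-based detector (originally trained to flag GPT-2 output) over a piece of text via the Hugging Face transformers library. It is one example of an AI output detector, not necessarily the tool evaluated in the abstract-screening study, and, as the false “fake” flag above shows, its verdicts would still need human review.

```python
# Illustrative sketch only: scoring text with an off-the-shelf AI output
# detector. Assumes the Hugging Face transformers library (pip install
# transformers torch); the model shown was trained on GPT-2 output and is
# used here purely as an example of this class of tool.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="roberta-base-openai-detector",
)

suspect_abstract = (
    "Background: We evaluated a novel screening pathway... "
    "Methods: ... Results: ... Conclusions: ..."
)

# Prints the predicted label and a confidence score; the label names depend
# on the model's configuration, and long inputs are truncated to the model's
# maximum length.
print(detector(suspect_abstract, truncation=True))
```

A borderline score from a detector like this is precisely the case in which an editor would fall back on the manual checks that current policies require.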
