Abstract

Recent advances in artificial intelligence have led to the development of increasingly sophisticated chatbot technologies, with ChatGPT, developed by OpenAI, gaining significant popularity. As public awareness of environmental issues has grown in recent decades, there is an increasing demand for access to information about the environment. ChatGPT, which offers free, real-time information via a user-friendly interface, has the potential to fill this gap. However, empirical evaluations of the reliability and quality of the information provided by chatbots are needed. In this study, the biological information provided by GPT-3.5 is evaluated. The commonness indices of 199 bird species in Poland, as estimated by ChatGPT, are correlated with real commonness indices obtained during an ornithological survey conducted across 700 1 × 1 km squares. Bird commonness indices provided by ChatGPT were generally positively correlated (r ≤ 0.6) with bird abundance and occurrence; however, this correlation was not particularly strong. The correlation was especially poor for less common species (i.e., those with fewer than 1000 individuals recorded during bird monitoring), and some rare or very rare species were incorrectly identified by ChatGPT as relatively common. Moreover, ChatGPT appeared just as confident, realistic, and persuasive when providing obviously incorrect responses (possibly due to a lack of training data) as when providing correct ones. Generating such false responses due to data deficiency is commonly referred to as chatbot “hallucination”. While chatbots can potentially be powerful tools for delivering environmental information, better control over the training process is necessary.
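The abstract does not detail the study's computational pipeline, but the two core steps it describes (eliciting a per-species commonness estimate from GPT-3.5 and correlating those estimates with survey data) can be sketched as follows. This is a hypothetical illustration only: the prompt wording, the 0–10 scoring scale, the `commonness_score` helper, and the sample `survey_counts` values are assumptions rather than the authors' protocol, and `pearsonr` stands in for whichever correlation statistic the paper actually used.

```python
# Hypothetical sketch: elicit commonness scores from GPT-3.5 and
# correlate them with field-survey counts. Prompt wording, the 0-10
# scale, and the sample data are illustrative assumptions, not the
# study's actual protocol.
from openai import OpenAI
from scipy.stats import pearsonr

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def commonness_score(species: str) -> float:
    """Ask GPT-3.5 to rate how common a bird species is in Poland."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"On a scale from 0 (absent) to 10 (very common), "
                f"how common is {species} in Poland? "
                f"Reply with a number only."
            ),
        }],
        temperature=0,  # favor deterministic, repeatable answers
    )
    return float(response.choices[0].message.content.strip())


# Illustrative survey data: individuals recorded per species
# (made-up numbers, not the paper's results)
survey_counts = {
    "Parus major": 14200,        # great tit
    "Fringilla coelebs": 18900,  # common chaffinch
    "Upupa epops": 310,          # Eurasian hoopoe
}

species = list(survey_counts)
gpt_scores = [commonness_score(s) for s in species]
counts = [survey_counts[s] for s in species]

# Pearson r between chatbot scores and survey counts; the paper may
# have used a different statistic (e.g., Spearman rank correlation).
r, p = pearsonr(gpt_scores, counts)
print(f"r = {r:.2f}, p = {p:.3f}")
```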
