Correctness Comparison of ChatGPT‐4, Gemini, Claude‐3, and Copilot for Spatial Tasks

Hartwig H Hochmair, Takoda Kemp, Levente Juhász

doi:10.1111/tgis.13233

Abstract

Generative AI, including large language models (LLMs), has recently gained significant interest in the geoscience community through its versatile task‐solving capabilities, including programming, arithmetic reasoning, generation of sample data, time‐series forecasting, toponym recognition, and image classification. Existing performance assessments of LLMs for spatial tasks have primarily focused on ChatGPT, whereas other chatbots have received less attention. To narrow this research gap, this study conducts a zero‐shot correctness evaluation of 76 spatial tasks across seven task categories, assigned to four prominent chatbots: ChatGPT‐4, Gemini, Claude‐3, and Copilot. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial reasoning. Furthermore, the correctness of results differed significantly among the four chatbots. Repeated tasks assigned to each chatbot yielded highly consistent responses, with matching rates of over 80% for most task categories in the four chatbots.


Journal: Transactions in GIS
Publication Date: Aug 12, 2024
Citations: 2

