The performance of large language models on quantitative and verbal ability tests: Initial evidence and implications for unproctored high‐stakes testing

Louis Hickman, Jasper Leo Wolf, Patrick D Dunlop

doi:10.1111/ijsa.12479

Open Access

https://doi.org/10.1111/ijsa.12479

Abstract

Unproctored assessments are widely used in pre‐employment assessment. However, widely accessible large language models (LLMs) pose challenges for unproctored personnel assessments, given that applicants may use them to artificially inflate their scores beyond their true abilities. This may be particularly concerning in cognitive ability tests, which are widely used and traditionally considered to be less fakeable by humans than personality tests. Thus, this study compares the performance of LLMs on two common types of cognitive tests: quantitative ability (number series completion) and verbal ability (use a passage of text to determine whether a statement is true). The tests investigated are used in real‐world, high‐stakes selection. We also examine the performance of the LLMs across different test formats (i.e., open‐ended vs. multiple choice). Further, we contrast the performance of two LLMs (Generative Pretrained Transformers, GPT‐3.5 and GPT‐4) across multiple prompt approaches and “temperature” settings (i.e., a parameter that determines the amount of randomness in the model's output). We found that the LLMs performed well on the verbal ability test but extremely poorly on the quantitative ability test, even when accounting for the test format. GPT‐4 outperformed GPT‐3.5 across both types of tests. Notably, although prompt approaches and temperature settings did affect LLM test performance, those effects were mostly minor relative to differences across tests and language models. We provide recommendations for securing pre‐employment testing against LLM influences. Additionally, we call for rigorous research investigating the prevalence of LLM usage in pre‐employment testing as well as how LLM usage affects selection test validity.
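
As an illustration of what a "temperature" setting does, the sketch below applies the standard temperature-scaled softmax to a set of scores. It is not the study's code, and it assumes that the models follow this common formulation; the function name and the example scores are hypothetical. Lower temperatures concentrate probability on the top-scoring option, while higher temperatures spread it out and make sampled output more variable.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate answers (assumed values, not study data).
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # sharply peaked on the first option
high = softmax_with_temperature(logits, 2.0)  # probability spread more evenly

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

In this sketch the same scores yield a near-deterministic choice at temperature 0.2 and a much flatter distribution at temperature 2.0, which is one reason to compare test performance across several temperature settings.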

Journal: International Journal of Selection and Assessment
Publication Date: May 17, 2024
License type: CC BY-NC-ND 4.0