Abstract

Context: Rainer and Wohlin showed that case studies are not well understood by reviewers and authors, who therefore sometimes label research as a case study when it is not. Objective: Rainer and Wohlin proposed a smell indicator (inspired by code smells) that identifies case studies based on the frequency of word occurrences and that performed better than human classifiers. With the emergence of ChatGPT, we evaluate how accurately ChatGPT identifies case studies. We also reflect on the implications of the results for mapping studies, specifically for data extraction. Method: We used ChatGPT with the GPT-4 model to identify case studies and compared the results with the smell indicator in terms of precision, recall, and accuracy. Results: GPT-4 and the smell indicator (SI) perform similarly, with GPT-4 performing slightly better in some instances and the SI in others. The advantage of GPT-4 is that it bases its classification on the definition of case studies and provides traceability of how it reaches its conclusions. Conclusion: As GPT-4 performed well on the task and provides traceability, we should use it for, and thereby evaluate it on, data extraction tasks, supporting us as authors.
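The abstract compares the two classifiers on precision, recall, and accuracy. As a minimal sketch of how such a comparison is computed, the snippet below evaluates two hypothetical binary classifiers ("is this paper a case study?") against gold labels; all labels and predictions here are invented for illustration and do not come from the paper.

```python
# Hedged sketch: computing the three metrics the abstract reports.
# All gold labels and predictions below are invented, not the paper's data.

def precision_recall_accuracy(gold, pred):
    """Compute (precision, recall, accuracy) for binary labels (True = case study)."""
    tp = sum(g and p for g, p in zip(gold, pred))          # true positives
    fp = sum((not g) and p for g, p in zip(gold, pred))    # false positives
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # false negatives
    correct = sum(g == p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(gold)
    return precision, recall, accuracy

# Invented example: a GPT-4-style classifier vs. a smell-indicator-style classifier.
gold      = [True, True, False, True, False, False, True, False]
gpt4_pred = [True, True, False, True, True,  False, True, False]
si_pred   = [True, False, False, True, False, False, True, True]

print("GPT-4:", precision_recall_accuracy(gold, gpt4_pred))
print("SI:   ", precision_recall_accuracy(gold, si_pred))
```

Comparing the two metric tuples side by side mirrors the paper's result pattern: one classifier can win on some metrics while the other wins on others.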
