Evaluating large language models for software testing

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Similar Papers
  • Research Article
  • Cited by: 8
  • 10.1287/ijds.2023.0007
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
  • Apr 1, 2023
  • INFORMS Journal on Data Science
  • Galit Shmueli + 7 more

  • Research Article
  • 10.1093/ndt/gfae069.792
#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure
  • May 23, 2024
  • Nephrology Dialysis Transplantation
  • Suman Lama + 6 more

Background and Aims: Large language models (LLMs) have gained significant attention in the field of natural language processing (NLP), marking a shift from traditional techniques like Term Frequency-Inverse Document Frequency (TF-IDF). We developed a traditional NLP model to predict arteriovenous fistula (AVF) failure within the next 30 days using clinical notes. The goal of this analysis was to investigate whether LLMs would outperform traditional NLP techniques, specifically in the context of predicting AVF failure within the next 30 days using clinical notes.

Method: We defined AVF failure as a change in status from active to permanently or temporarily unusable. We used data from a large kidney care network from January 2021 to December 2021. Two models were created: one using an LLM and one using the traditional TF-IDF technique. We used "distilbert-base-uncased", a distilled version of the BERT base model [1], and compared its performance with traditional TF-IDF-based NLP techniques. The dataset was randomly divided into 60% training, 20% validation and 20% test sets. The test data, comprising unseen patients' data, was used to evaluate the performance of the models. Both models were evaluated using metrics such as area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity.

Results: The incidence of 30-day AVF failure was 2.3% in the population. Both the LLM and the traditional model showed similar overall performance, as summarized in Table 1. Notably, the LLM showed marginally better performance on certain evaluation metrics. Both models had the same AUROC of 0.64 on test data. The accuracy and balanced accuracy for the LLM were 72.9% and 59.7%, respectively, compared to 70.9% and 59.6% for the traditional TF-IDF approach. In terms of specificity, the LLM scored 73.2%, slightly higher than the 71.2% observed for the traditional NLP method. However, the LLM had a lower sensitivity of 46.1% compared to 48% for traditional NLP. It is also worth noting that training the LLM took considerably longer than TF-IDF and required more computational resources, such as graphics processing unit (GPU) instances in cloud-based services, leading to higher cost.

Conclusion: In our study, we found that an advanced LLM performs comparably to traditional TF-IDF modeling techniques in predicting AVF failure. Both models demonstrated identical AUROC. While specificity was higher for the LLM, sensitivity was higher for traditional NLP. The LLM was fine-tuned on a limited dataset, which could have influenced its performance to be similar to that of traditional NLP methods. This finding suggests that while LLMs may excel in certain scenarios, such as in-depth sentiment analysis of patient data for complex tasks, their effectiveness is highly dependent on the specific use case. It is crucial to weigh the benefits against the resources required, as LLMs can be significantly more resource-intensive and costly than traditional TF-IDF methods. This highlights the importance of a use-case-driven approach in selecting the appropriate NLP technique for healthcare applications.
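The TF-IDF baseline described above can be illustrated with a minimal pure-Python sketch; the toy "clinical notes", the whitespace tokenization, and the labels implied by them are invented for illustration and are not from the study's data:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weight vectors (as dicts) for tokenized documents."""
    n = len(docs)
    # document frequency: number of docs containing each term
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            t: (c / total) * math.log(n / df[t])  # tf * idf
            for t, c in tf.items()
        })
    return vectors

# hypothetical one-line access notes
notes = [
    "fistula patent good thrill".split(),
    "fistula occluded no thrill".split(),
    "access patent strong thrill".split(),
]
vecs = tfidf(notes)
# "thrill" appears in every note, so its idf (and weight) is zero;
# rare terms like "occluded" dominate the second note's vector
print(max(vecs[1], key=vecs[1].get))
```

In the real pipeline these sparse weights would feed a downstream classifier; the point of TF-IDF is simply that terms frequent in one note but rare across the corpus carry the most signal.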

  • Research Article
  • 10.55041/ijsrem36608
Exploring Vulnerabilities and Threats in Large Language Models: Safeguarding Against Exploitation and Misuse
  • Aug 10, 2024
  • International Journal of Scientific Research in Engineering and Management
  • Mr Aarush Varma + 1 more

This research paper delves into the inherent vulnerabilities and potential threats posed by large language models (LLMs), focusing on their implications across diverse applications such as natural language processing and data privacy. The study aims to identify and analyze these risks comprehensively, emphasizing the importance of mitigation strategies to prevent exploitation and misuse in LLM deployments. In recent years, LLMs have revolutionized fields like automated content generation, sentiment analysis, and conversational agents, yet their immense capabilities also raise significant security concerns. Vulnerabilities such as bias amplification, adversarial attacks, and unintended data leakage can undermine trust and compromise user privacy. Through a systematic examination of these challenges, this paper proposes safeguarding measures crucial for responsibly harnessing the potential of LLMs while minimizing associated risks. It underscores the necessity of rigorous security protocols, including robust encryption methods, enhanced authentication mechanisms, and continuous monitoring frameworks. Furthermore, the research discusses regulatory implications and ethical considerations surrounding LLM usage, advocating for transparency, accountability, and stakeholder engagement in policy-making and deployment practices. By synthesizing insights from current literature and real-world case studies, this study provides a comprehensive framework for stakeholders (developers, policymakers, and users) to navigate the complex landscape of LLM security effectively. Ultimately, this research aims to inform future advancements in LLM technology, ensuring its safe and beneficial integration into various domains while mitigating potential risks to individuals and society as a whole.
Keywords— Adversarial attacks on LLMs, Bias in LLMs, Data privacy in LLMs, Ethical considerations LLMs, Exploitation of LLMs, Large Language Models (LLMs), Misuse of LLMs, Mitigation strategies for LLMs, Natural Language Processing (NLP), Regulatory frameworks LLMs, Responsible deployment of LLMs, Risks of LLMs, Security implications of LLMs, Threats to LLMs, Vulnerabilities in LLMs.

  • Research Article
  • Cited by: 8
  • 10.1016/j.procs.2023.09.086
A Large and Diverse Arabic Corpus for Language Modeling
  • Jan 1, 2023
  • Procedia Computer Science
  • Abbas Raza Ali + 3 more

  • Research Article
  • Cited by: 30
  • 10.1093/jamia/ocae074
Large language models for biomedicine: foundations, opportunities, challenges, and best practices.
  • Apr 24, 2024
  • Journal of the American Medical Informatics Association : JAMIA
  • Satya S Sahoo + 8 more

Generative large language models (LLMs) are a subset of transformer-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain-of-thought prompting, and the phenomena of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenges in biomedical applications.
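As a small illustration of the few-shot prompting the abstract discusses, the sketch below assembles a prompt from worked examples; the task, labels, and example sentences are hypothetical and not from the article:

```python
def build_few_shot_prompt(examples, query, instruction):
    """Assemble a few-shot prompt: instruction, worked examples, then query."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}\n")
    # the model is expected to complete the final, unlabeled example
    parts.append(f"Text: {query}\nLabel:")
    return "\n".join(parts)

# hypothetical clinical negation-detection task
examples = [
    ("Patient denies chest pain.", "negated"),
    ("Patient reports chest pain.", "affirmed"),
]
prompt = build_few_shot_prompt(
    examples,
    "Patient denies shortness of breath.",
    "Classify whether the clinical finding is affirmed or negated.",
)
print(prompt)
```

Zero-shot prompting is the same construction with an empty `examples` list: the instruction alone must carry the task description.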

  • Research Article
  • 10.1007/s10143-025-03785-7
Current trends and future prospects of language models and processing systems in spine surgery - a scoping review.
  • Sep 5, 2025
  • Neurosurgical review
  • Vivek Sanker + 9 more

Natural language processing (NLP) systems and large language models (LLMs), such as ChatGPT, represent transformative advancements in artificial intelligence (AI). Their implementation in the medical field has broad potential, and this review discusses the current trends and prospects of NLP and LLM tools in spine surgery, assessing their potential benefits, applications, and limitations. The methodology involved a comprehensive narrative review of existing English literature related to the use of NLP and LLM tools in spine surgery. We searched the databases PubMed, EMBASE, Web of Science and Scopus from inception until 16th June 2025 using keywords revolving around LLMs, natural language processing and spine surgery. Original studies, clinical reports, and case series were included, while abstracts or unpublished studies were excluded. From 221 initial records, 37 studies were included: 18 evaluated LLMs and 19 evaluated NLP-based tools. LLMs were commonly used for clinical decision-making (n = 8), patient counseling (n = 7), classification (n = 2), and research (n = 1). NLP tools were applied in classification tasks (n = 12), clinical decision-making (n = 3), patient counseling (n = 1), postoperative opioid monitoring (n = 2), and research registry development (n = 1). ChatGPT-4 achieved up to 92% accuracy in clinical recommendations, outperforming GPT-3.5 in multiple tasks. Comparative analyses have found that newer versions of LLMs, such as ChatGPT-4, outperform previous versions, as evidenced by greater accuracy and a lower rate of artificial hallucination. However, limitations persist, including overconfident outputs, adherence gaps to clinical guidelines, and inconsistent patient readability. While this review suggests that NLP and LLM tools can have a significant impact on spine practice, it is important to keep their limitations in mind and implement them with caution.
To maximize the benefits of these models in spine surgery, future research should focus on improving model sensitivity and specificity, promoting multi-disciplinary collaborations, and addressing ethical considerations regarding the use of language models in medical practice, including the inherent issue of hallucination of these models.

  • Research Article
  • 10.63345/jqst.v2i1.163
Deploying Large Language Models (LLMs) for Automated Test Case Generation and QA Evaluation
  • Jan 1, 2025
  • Journal of Quantum Science and Technology
  • Vybhav Reddy Kammireddy Changalreddy + 1 more

The deployment of Large Language Models (LLMs) for automated test case generation and quality assurance (QA) evaluation represents a significant advancement in software testing. With the increasing complexity of modern applications, traditional methods of test case creation and manual evaluation have proven inefficient and error-prone. LLMs, with their ability to understand natural language inputs and generate contextually relevant outputs, offer a promising solution to this challenge. This paper explores the application of LLMs to automate the generation of test cases, ensuring broader coverage and improved accuracy in detecting potential software defects. By leveraging the vast training data of LLMs, these models can interpret requirements, user stories, or functional specifications and automatically generate a diverse set of test cases that address various use cases and edge cases. Additionally, LLMs can be employed for real-time QA evaluation, analyzing the results of test executions and identifying discrepancies, inconsistencies, or anomalies that may otherwise be overlooked. This paper also highlights the integration of LLMs with existing testing frameworks and CI/CD pipelines, showcasing how they can augment human efforts, reduce time-to-market, and improve the overall reliability of software products. Through case studies and experiments, we demonstrate the effectiveness of LLMs in enhancing test case generation and QA evaluation, paving the way for more efficient, scalable, and robust software testing practices in the era of artificial intelligence.
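One way the generate-and-validate loop described above might look in practice is sketched below; the JSON schema, the prompt wording, and the stubbed model are assumptions for illustration, not the paper's implementation:

```python
import json

def generate_test_cases(requirement, model):
    """Ask an LLM for test cases as JSON, then parse and schema-check them."""
    prompt = (
        "Generate test cases as a JSON list of objects with "
        f"'name', 'input', and 'expected' keys for: {requirement}"
    )
    reply = model(prompt)
    cases = json.loads(reply)
    # basic schema check before handing cases to a test runner
    for case in cases:
        assert {"name", "input", "expected"} <= case.keys()
    return cases

# stub standing in for a real LLM call
def fake_model(prompt):
    return json.dumps([
        {"name": "empty input", "input": "", "expected": "error"},
        {"name": "valid email", "input": "a@b.co", "expected": "ok"},
    ])

cases = generate_test_cases("validate an email field", fake_model)
print(len(cases))
```

The validation step matters in real pipelines: model replies are untrusted text, so parsing failures and schema violations should be caught before any generated case reaches CI.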

  • Preprint Article
  • 10.20944/preprints202504.1933.v1
Evaluating Logical Reasoning Ability of Large Language Models
  • Apr 23, 2025
  • Emunah Chan

Large language models (LLMs) such as ChatGPT and DeepSeek have recently made significant progress in natural language processing, demonstrating reasoning ability close to human intelligence. This has sparked considerable research interest since reasoning is a hallmark of human intelligence that is widely considered missed in artificial intelligence systems. Due to the large size of these models, evaluation of LLMs&amp;rsquo; reasoning ability is largely empirical. Creating datasets to evaluate the reasoning ability of LLMs is an active research area. A key open question is whether LLMs reason or simply recite memorized texts they have encountered during their training phase. This work conducts simple experiments using Cheryl&amp;rsquo;s Birthday Puzzle and Cheryl&amp;rsquo;s Age Puzzle to investigate whether LLMs recite or reason and discovers that LLMs tend to recite memorized answers for well-known questions, which appear frequently on the internet. As a result, to accurately evaluate the reasoning ability of LLMs, it is essential to create new datasets to ensure that LLMs truly use their reasoning ability to generate responses to the presented questions. In view of the finding, this work proposes a new dataset comprising of questions requiring semantic and deductive logical reasoning skills to elicit reasoning ability from LLMs. The proposed evaluation framework has several desirable properties, including resilience to training data contamination, ease of response verification, extensibility, usefulness and automated test case generation. This work applies the proposed dataset to evaluate the reasoning ability of state-of-the-art LLMs, including GPT-3, GPT-4, Llama-3.1, Germini-1.5, Claude-3.5 and DeepSeek-V3. A significant observation is that most LLMs achieve a performance independent of question complexity. This suggests that they reason more like an algorithm than human intelligence. 
In contrast, DeepSeek-V3 resembles human reasoning behaviour most among all the tested LLMs. Finally, an algorithm to automatically generate the dataset of logical reasoning questions is presented.

  • Research Article
  • 10.54254/2755-2721/2025.22701
Systematically Understanding of Code Semantic Interpretation for LLMs
  • May 15, 2025
  • Applied and Computational Engineering
  • Jia Zhao

Understanding and interpreting code is a crucial task in intelligent software engineering, aiding developers and users in adjusting code for correctness and robustness. The emergence of large language models (LLMs) provides new perspectives for code interpretation tasks. However, current LLM-based code interpretation remains restricted to limited dimensions, lacks a unified evaluation standard, and is missing a comprehensive and systematic assessment methodology. To address this issue, this paper proposes an LLM code understanding evaluation method based on a multi-granularity voting mechanism, aiming to systematically investigate and analyze LLMs' performance in code interpretation tasks. First, we carefully select code snippets from open-source GitHub projects and preprocess them for LLM analysis. Second, we use identical prompts and inputs to test three popular LLMs, recording their output. During this process, we apply prompt engineering techniques to specific target code snippets and conduct repeated experiments to explore the impact of prompt engineering on LLM-generated code explanations. Next, we design evaluation metrics to quantify the LLM outputs and assess their effectiveness based on the obtained scores. Experimental results demonstrate significant differences in code analysis and generation capabilities among the evaluated general-purpose LLMs from different vendors when given identical prompts and inputs. When multiple dimensions are considered in evaluating the generated content, different LLMs exhibit varying strengths in different aspects. Additionally, applying specific prompt engineering techniques can moderate the discrepancies in code analysis and generation capabilities among different LLMs.
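A multi-granularity voting mechanism of the kind described could, for instance, aggregate per-dimension scores across raters by majority vote; the dimension names and ratings below are hypothetical, not the paper's data:

```python
from collections import defaultdict
from statistics import mode

def aggregate_scores(ratings):
    """Majority vote per evaluation dimension across several raters."""
    by_dim = defaultdict(list)
    for rater in ratings:
        for dim, score in rater.items():
            by_dim[dim].append(score)
    return {dim: mode(scores) for dim, scores in by_dim.items()}

# hypothetical 1-5 ratings of one LLM's code explanation from three
# raters, at three granularity levels (names are illustrative only)
ratings = [
    {"line_level": 4, "function_level": 5, "project_level": 3},
    {"line_level": 5, "function_level": 4, "project_level": 3},
    {"line_level": 4, "function_level": 4, "project_level": 2},
]
scores = aggregate_scores(ratings)
print(scores)
```

Voting at each granularity separately lets a model score well on line-by-line explanation while still being penalized for weak project-level summaries, which matches the multi-dimensional comparison the paper reports.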

  • Research Article
  • Cited by: 13
  • 10.1108/jebde-08-2023-0015
Unraveling the landscape of large language models: a systematic review and future perspectives
  • Dec 19, 2023
  • Journal of Electronic Business & Digital Economics
  • Qinxu Ding + 4 more

Purpose: The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.

Design/methodology/approach: Drawing from an extensive corpus of 198 records published between 1996 and 2023 from the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves deep into the multifaceted world of LLM research. The authors employed the BERTopic algorithm, a recent advancement in topic modeling, to conduct a comprehensive analysis of the data after it had been meticulously cleaned and preprocessed. BERTopic leverages the power of transformer-based language models like bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.

Findings: The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.

Practical implications: This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.

Originality/value: This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLMs' evolution.

  • Research Article
  • Cited by: 1
  • 10.1145/3749840
AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model
  • Jul 22, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Lipeng Ma + 8 more

Automated log analysis is crucial to ensure the high availability and reliability of complex systems. The advent of large language models (LLMs) in natural language processing (NLP) has ushered in a new era of language model-driven automated log analysis, garnering significant interest. Within this field, two primary paradigms based on language models for log analysis have become prominent. Small Language Models (SLMs) (such as BERT) follow the pre-train and fine-tune paradigm, focusing on a specific log analysis task through fine-tuning on supervised datasets. On the other hand, LLMs (such as ChatGPT), following the in-context learning paradigm, analyze logs by providing a few examples in prompt contexts without updating parameters. Despite their respective strengths, both models exhibit inherent limitations. Comparing SLMs and LLMs, we notice that SLMs are more cost-effective but less powerful, whereas LLMs with large parameter counts are highly powerful but expensive and inefficient. To trade off between the performance and inference costs of both models in automated log analysis, this paper introduces an adaptive log analysis framework known as AdaptiveLog, which effectively reduces the costs associated with the LLM while ensuring superior results. This framework pairs an LLM with a small language model, strategically allocating the LLM to tackle complex logs while delegating simpler logs to the SLM. Specifically, to query the LLM efficiently, we propose an adaptive selection strategy based on the uncertainty estimation of the SLM, where the LLM is invoked only when the SLM is uncertain. In addition, to enhance the reasoning ability of the LLM in log analysis tasks, we propose a novel prompt strategy that retrieves similar error-prone cases as references, enabling the model to leverage past error experiences and learn solutions from these cases.
We evaluate AdaptiveLog on different log analysis tasks; extensive experiments demonstrate that AdaptiveLog achieves state-of-the-art results across different tasks, elevating the overall accuracy of log analysis while maintaining cost efficiency. Our source code and detailed experimental data are available at https://github.com/LeaperOvO/AdaptiveLog-review.
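The uncertainty-based routing AdaptiveLog describes can be sketched as follows; the entropy criterion, the 0.7 threshold, and the stub models are illustrative assumptions, not the paper's actual implementation:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a predictive distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def route(log_line, slm, llm, threshold=0.7):
    """Invoke the cheap SLM first; escalate to the LLM only when the
    SLM's predictive distribution is too uncertain (high entropy)."""
    label, probs = slm(log_line)
    if entropy(probs) <= threshold:
        return label, "slm"
    return llm(log_line), "llm"

# stubs standing in for real models: the SLM is confident on the first
# log line and uncertain on the second
def fake_slm(line):
    if "error" in line:
        return "anomaly", [0.95, 0.05]
    return "normal", [0.55, 0.45]

fake_llm = lambda line: "anomaly"

print(route("disk error on /dev/sda", fake_slm, fake_llm))
print(route("unexpected token in config", fake_slm, fake_llm))
```

Because most production logs are routine, even a crude confidence gate like this sends only a small fraction of lines to the expensive model, which is the cost saving the framework targets.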

  • Research Article
  • Cited by: 3
  • 10.1016/j.artmed.2024.103009
Developing healthcare language model embedding spaces
  • Oct 31, 2024
  • Artificial Intelligence In Medicine
  • Niall Taylor + 4 more

Pre-trained Large Language Models (LLMs) have revolutionised Natural Language Processing (NLP) tasks, but often struggle when applied to specialised domains such as healthcare. The traditional approach of pre-training on large datasets followed by task-specific fine-tuning is resource-intensive and poorly aligned with the constraints of many healthcare settings. This presents a significant challenge for deploying LLM-based NLP solutions in medical contexts, where data privacy, computational resources, and domain-specific language pose unique obstacles. This study aims to develop and evaluate efficient methods for adapting smaller LLMs to healthcare-specific datasets and tasks. We seek to identify pre-training approaches that can effectively instil healthcare competency in compact LLMs under tight computational budgets, a crucial capability for responsible and sustainable deployment in local healthcare settings. We explore three specialised pre-training methods to adapt smaller LLMs to different healthcare datasets: traditional Masked Language modelling (MLM), Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel approach utilising metadata categories from healthcare settings. These methods are assessed across multiple healthcare datasets, with a focus on downstream document classification tasks. We evaluate the performance of the resulting LLMs through classification accuracy and analysis of the derived embedding spaces. Contrastively trained models consistently outperform other approaches on classification tasks, delivering strong performance with limited labelled data and fewer model parameter updates. While our novel metadata-based pre-training does not further improve classifications across datasets, it yields interesting embedding cluster separability. Importantly, all domain-adapted LLMs outperform their publicly available, general-purpose base models, validating the importance of domain specialisation. This research demonstrates the efficacy of specialised pre-training methods in adapting compact LLMs to healthcare tasks, even under resource constraints. We provide guidelines for pre-training specialised healthcare LLMs and motivate continued inquiry into contrastive objectives. Our findings underscore the potential of these approaches for aligning small LLMs with privacy-sensitive medical tasks, offering a path toward more efficient and responsible NLP deployment in healthcare settings. This work contributes to the broader goal of making advanced NLP techniques accessible and effective in specialised domains, particularly where resource limitations and data sensitivity are significant concerns.

  • Research Article
  • Cited by: 77
  • 10.3390/app14052074
A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)
  • Mar 1, 2024
  • Applied Sciences
  • Rajvardhan Patil + 1 more

Natural language processing (NLP) has significantly transformed in the last decade, especially in the field of language modeling. Large language models (LLMs) have achieved SOTA performance on natural language understanding (NLU) and natural language generation (NLG) tasks by learning language representations in self-supervised ways. This paper provides a comprehensive survey to capture the progression of advances in language models. In this paper, we examine the different aspects of language models, which started with a few million parameters but have reached the size of a trillion in a very short time. We also look at how these LLMs transitioned from task-specific to task-independent to task-and-language-independent architectures. This paper extensively discusses different pretraining objectives, benchmarks, and transfer learning methods used in LLMs. It also examines different finetuning and in-context learning techniques used in downstream tasks. Moreover, it explores how LLMs can perform well across many domains and datasets if sufficiently trained on a large and diverse dataset. Next, it discusses how, over time, the availability of cheap computational power and large datasets has improved LLMs' capabilities and raised new challenges. As part of our study, we also inspect LLMs from the perspective of scalability to see how their performance is affected by the model's depth, width, and data size. Lastly, we provide an empirical comparison of existing trends and techniques and a comprehensive analysis of where the field of LLMs currently stands.

  • Research Article
  • 10.1609/aaai.v39i1.32018
Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Haoran Li + 6 more

With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning. Yet, language models' inherent vulnerabilities may be exacerbated due to increased accessibility and unrestricted model training on massive data. A malicious adversary may publish poisoned data online and conduct backdoor attacks on victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts devoted to LLMs' safety issues, LLMs still struggle against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to revoke the backdoors once the LLM is backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase undesired backdoored mappings for generative LLMs. We initially propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle scenarios where trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike other works that assume access to cleanly trained models, our safety-enhanced LLMs are able to revoke backdoors without any reference. Consequently, our safety-enhanced LLMs no longer produce targeted responses when the backdoor triggers are activated. We conduct comprehensive experiments to show that our proposed SANDE is effective against backdoor attacks while bringing minimal harm to LLMs' powerful capabilities.

  • Research Article
  • Cited by: 6
  • 10.1016/j.jpainsymman.2024.11.016
Large language models to identify advance care planning in patients with advanced cancer
  • Mar 1, 2025
  • Journal of Pain and Symptom Management
  • Nicole D Agaronnik + 4 more
