On the Impact of Lower Recall and Precision in Defect Prediction for Guiding Search-based Software Testing

Anjana Perera,Marcel Böhme,Aldeida Aleti,Burak Turhan

doi:10.1145/3655022

Abstract

Defect predictors, static bug detectors, and humans inspecting the code can propose locations in the program that are more likely to be buggy before they are discovered through testing. Automated test generators such as search-based software testing (SBST) techniques can use this information to direct their search for test cases to likely buggy code, thus speeding up the process of detecting existing bugs in those locations. Often the predictions given by these tools or humans are imprecise, which can misguide the SBST technique and may deteriorate its performance. In this article, we study the impact of imprecision in defect prediction on the bug detection effectiveness of SBST. Our study finds that the recall of the defect predictor, i.e., the proportion of correctly identified buggy code, has a significant impact on bug detection effectiveness of SBST with a large effect size. More precisely, the SBST technique detects 7.5 fewer bugs on average (out of 420 bugs) for every 5% decrements of the recall. However, the effect of precision, a measure for false alarms, is not of meaningful practical significance, as indicated by a very small effect size. In the context of combining defect prediction and SBST, our recommendation is to increase the recall of defect predictors as a primary objective and precision as a secondary objective. In our experiments, we find that 75% precision is as good as 100% precision. To account for the imprecision of defect predictors, in particular low recall values, SBST techniques should be designed to search for test cases that also cover the predicted non-buggy parts of the program, while prioritising the parts that have been predicted as buggy.

Full Text