Improving effort-aware defect prediction by directly learning to rank software modules

Guancheng Lin, Wenhua Hu, Lei Liu, Jacky Wai Keung, Jianwen Xiang, Jiqing Rao, Xiao Yu, Junwei Zhou

doi:10.1016/j.infsof.2023.107250

Abstract

Effort-Aware Defect Prediction (EADP) ranks software modules by their defect density, which allows testers to find more bugs while reviewing a given amount of Lines Of Code (LOC). Most existing methods treat the EADP task as a regression or classification problem. However, optimizing the regression loss or classification accuracy might result in poor effort-aware performance. Therefore, we propose a method called EALTR that improves EADP performance by directly maximizing the Proportion of the found Bugs (PofB@20%) value when inspecting the top 20% LOC. EALTR uses a linear regression model to build the EADP model, and then employs the composite differential evolution algorithm to generate a set of coefficient vectors for the linear regression model. Finally, EALTR selects the coefficient vector that achieves the highest PofB@20% value on the training dataset to construct the EADP model. To further reduce the Initial False Alarms (IFA) value of EALTR, we propose a re-ranking strategy in the prediction phase. Our experimental results on eleven project datasets with 41 releases show that EALTR can find 15.97%–54.47% more bugs than the baseline methods whose IFA values are less than 10, and that the re-ranking strategy significantly reduces the IFA value by 21.24%. Our study verifies the effectiveness of directly optimizing the effort-aware metric (i.e., PofB@20%) to build the EADP model. EALTR is recommended as an effective EADP method, since it can help software testers find more bugs.
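
To make the selection step concrete, the following is a minimal Python sketch of how a PofB@20% value might be computed and used to choose among candidate coefficient vectors of a linear model. It is an illustration under stated assumptions, not the authors' implementation: the function names `pofb_at` and `select_coefficients`, the toy data, and the rule that a module counts only if it fits entirely within the LOC budget are all hypothetical. Two hand-picked candidate vectors stand in for the set that the composite differential evolution algorithm would generate, and the re-ranking strategy is not covered.

```python
import numpy as np


def pofb_at(scores, bugs, loc, effort=0.2):
    """Proportion of all bugs found when modules are inspected in
    descending score order until `effort` of the total LOC is used."""
    scores = np.asarray(scores, dtype=float)
    bugs = np.asarray(bugs, dtype=float)
    loc = np.asarray(loc, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    cum_loc = np.cumsum(loc[order])
    # Assumption: a module counts only if it fits entirely in the budget.
    inspected = cum_loc <= effort * loc.sum()
    total = bugs.sum()
    return float(bugs[order][inspected].sum() / total) if total > 0 else 0.0


def select_coefficients(X, bugs, loc, candidates, effort=0.2):
    """Return the candidate coefficient vector (and its value) that gives
    the highest PofB on the training data for the linear scores X @ w."""
    values = [pofb_at(X @ w, bugs, loc, effort) for w in candidates]
    best = int(np.argmax(values))
    return candidates[best], values[best]


# Toy training data (hypothetical): 4 modules, 2 features each.
X = np.array([[0.1, 1.0], [0.9, 0.2], [0.2, 0.5], [0.3, 0.1]])
bugs = np.array([1, 2, 0, 3])
loc = np.array([120, 50, 200, 130])

# Hand-picked candidates standing in for the evolved population.
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
best_w, best_value = select_coefficients(X, bugs, loc, candidates)
print(best_w, best_value)
```

In this toy setting the first candidate ranks the 50-LOC module with two bugs first, so it stays within the 100-LOC budget and finds two of the six bugs, whereas the second candidate ranks a 120-LOC module first and finds none within the budget. Selecting by the effort-aware metric itself, rather than by a regression loss, is the idea the abstract describes.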

Full Text