Software fault prediction (SFP) aims to detect fault-prone software modules, which helps allocate software testing resources and improve software quality. Recently, ensemble learning (EL)-based SFP methods have attracted much attention. Although many EL algorithms have been applied to SFP, they still struggle to generate base learners that are both accurate and diverse. Therefore, this paper presents a multi-modal EL algorithm, called NRSEL, based on neighborhood rough sets. In NRSEL, the neighborhood approximate reduct (NAR) technique perturbs the attribute space, and bootstrap sampling perturbs the sample space. As a novel technique for perturbing the attribute space, NAR stems from the concept of approximate reducts in rough sets. We also apply NRSEL to SFP and employ a hybrid scheme, called SMOTE-NRSEL, to handle the class imbalance common in SFP data. We compare SMOTE-NRSEL with existing EL algorithms on 20 public datasets. Experimental results indicate that SMOTE-NRSEL is effective for SFP: compared with the baseline algorithms, it improves the AUC, F1-score, and MCC by 3.09%, 3.18%, and 7.5% on average, respectively. Moreover, the results of three statistical tests (the paired t-test, the Friedman test, and the Nemenyi test) indicate that SMOTE-NRSEL is significantly better than the baseline algorithms in most cases. This paper shows that NAR is a good choice for perturbing the attribute space. With the help of NAR and the multi-modal perturbation strategy built on it, SMOTE-NRSEL can generate accurate and diverse base learners. The code is available at https://github.com/jiangfeng0278/NRSEL.
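The abstract does not say how NAR is computed, so the following is only a minimal Python sketch of the general multi-modal perturbation idea, under stated assumptions. Each base learner is trained on a bootstrap sample (sample-space perturbation) restricted to an attribute subset chosen by a greedy, neighborhood-dependency-based approximate reduct (attribute-space perturbation). The dependency measure (fraction of samples whose delta-neighborhood is label-pure), the greedy stopping rule `alpha * full dependency`, the parameters `delta` and `alpha`, the 1-nearest-neighbor base learner, and every function name here are hypothetical illustrations. None of them are taken from the NRSEL repository. SMOTE oversampling is omitted.

```python
import math
import random
from collections import Counter


def neighborhood_dependency(X, y, attrs, delta):
    """Assumed NAR ingredient: fraction of samples whose delta-neighborhood
    (Euclidean distance over `attrs`) contains only samples of their own label."""
    if not attrs:
        return 0.0
    pos = 0
    for i in range(len(X)):
        pure = all(
            y[j] == y[i]
            for j in range(len(X))
            if math.sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in attrs)) <= delta
        )
        pos += pure
    return pos / len(X)


def approximate_reduct(X, y, delta, alpha, rng):
    """Hypothetical greedy approximate reduct: add the attribute that most
    raises dependency until it reaches alpha * (dependency of all attributes).
    Candidates are shuffled so ties break randomly, diversifying the reducts."""
    all_attrs = list(range(len(X[0])))
    target = alpha * neighborhood_dependency(X, y, all_attrs, delta)
    reduct, current = [], 0.0
    while current < target and len(reduct) < len(all_attrs):
        candidates = [a for a in all_attrs if a not in reduct]
        rng.shuffle(candidates)
        best = max(candidates, key=lambda a: neighborhood_dependency(X, y, reduct + [a], delta))
        reduct.append(best)
        current = neighborhood_dependency(X, y, reduct, delta)
    return reduct or all_attrs  # fall back to all attributes if nothing was selected


def bootstrap(X, y, rng):
    """Sample-space perturbation: draw len(X) samples with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def predict_1nn(X_train, y_train, attrs, x):
    """Assumed base learner: 1-nearest neighbor on the selected attributes."""
    best = min(range(len(X_train)),
               key=lambda i: sum((X_train[i][a] - x[a]) ** 2 for a in attrs))
    return y_train[best]


def fit_ensemble(X, y, n_learners=10, delta=0.2, alpha=0.95, seed=0):
    """Combine both perturbations: each member gets its own bootstrap sample
    and its own approximate reduct computed on that sample."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_learners):
        Xb, yb = bootstrap(X, y, rng)
        attrs = approximate_reduct(Xb, yb, delta, alpha, rng)
        members.append((Xb, yb, attrs))
    return members


def predict(members, x):
    """Majority vote over the base learners."""
    votes = Counter(predict_1nn(Xb, yb, attrs, x) for Xb, yb, attrs in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: the label depends only on attribute 0; attribute 1 is noise.
    data_rng = random.Random(1)
    X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    model = fit_ensemble(X, y)
    print([attrs for _, _, attrs in model])
    print(predict(model, [0.9, 0.1]), predict(model, [0.1, 0.9]))
```

Computing the reduct on each member's bootstrap sample, rather than once on the full data, is one simple way to let the two perturbations interact. It is a design choice of this sketch only. The quadratic neighborhood computation is also purely illustrative and would not scale to large SFP datasets.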