Abstract
Software fault prediction (SFP) aims to detect fault-prone software modules, which helps allocate software testing resources and improve software quality. Recently, ensemble learning (EL)-based SFP methods have attracted much attention. Although many EL algorithms have been applied to SFP, they still struggle to generate base learners that are both accurate and diverse. Therefore, this paper presents a multi-modal EL algorithm (called NRSEL) based on neighborhood rough sets. In NRSEL, the neighborhood approximate reduct (NAR) technique perturbs the attribute space, and bootstrap sampling perturbs the sample space. NAR is a novel technique for attribute-space perturbation that stems from the concept of approximate reduct in rough sets. We also apply NRSEL to SFP and employ a hybrid scheme (called SMOTE-NRSEL) to handle the problem of imbalanced data in SFP. We compare SMOTE-NRSEL with existing EL algorithms on 20 public datasets. Experimental results indicate that SMOTE-NRSEL is effective for SFP. Compared with the baseline algorithms, SMOTE-NRSEL improves the AUC, F1-score, and MCC by 3.09%, 3.18%, and 7.5% on average, respectively. Moreover, the results of three statistical tests (the paired t-test, Friedman test, and Nemenyi test) indicate that SMOTE-NRSEL is significantly better than the baseline algorithms in most cases. This paper shows that NAR is a good choice for the perturbation of attribute space. With the help of NAR and the multi-modal perturbation strategy based on it, SMOTE-NRSEL can generate accurate and diverse base learners. The code is available at https://github.com/jiangfeng0278/NRSEL.
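The multi-modal perturbation idea described above (perturbing the sample space and the attribute space for each base learner) can be illustrated with a minimal sketch. This is not the authors' implementation: a random attribute subset stands in for NAR, a toy nearest-centroid classifier stands in for the base learner, SMOTE is omitted, and every function name and parameter here is hypothetical.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Sample-space perturbation: draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def random_attribute_subset(n_attrs, k, rng):
    """Attribute-space perturbation. ASSUMPTION: a random subset is used here
    as a stand-in; it is NOT the neighborhood approximate reduct (NAR)."""
    return sorted(rng.sample(range(n_attrs), k))

def fit_centroids(X, y, rows, attrs):
    """Toy base learner: per-class mean over the chosen rows and attributes."""
    sums, counts = {}, Counter()
    for i in rows:
        acc = sums.setdefault(y[i], [0.0] * len(attrs))
        for j, a in enumerate(attrs):
            acc[j] += X[i][a]
        counts[y[i]] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict_one(centroids, attrs, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    def dist(c):
        return sum((x[a] - m) ** 2 for a, m in zip(attrs, centroids[c]))
    return min(sorted(centroids), key=dist)

def fit_ensemble(X, y, n_learners=11, k=2, seed=0):
    """Train each base learner on its own bootstrap sample and attribute subset."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_learners):
        rows = bootstrap_indices(len(X), rng)
        attrs = random_attribute_subset(len(X[0]), k, rng)
        ensemble.append((attrs, fit_centroids(X, y, rows, attrs)))
    return ensemble

def predict(ensemble, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_one(c, attrs, x) for attrs, c in ensemble)
    return votes.most_common(1)[0][0]
```

Because each learner sees different rows and different attributes, the learners disagree on borderline modules, which is the source of diversity an ensemble relies on. In the paper's method, the attribute subsets would come from NAR rather than from random selection.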