Abstract

Context: Cross-project defect prediction (CPDP), which uses datasets from other projects to build predictors, has recently been recommended as an effective approach for building prediction models for projects that lack historical or sufficient local datasets. Class imbalance and distribution mismatch between the source and target datasets, both common in real-world defect datasets, are known to have a negative impact on prediction performance.

Objective: To alleviate the negative effects of class imbalance and distribution mismatch on the performance of CPDP models by using Class Distribution Estimation and the Synthetic Minority Oversampling Technique. A novel approach called Class Distribution Estimation with Synthetic Minority Oversampling Technique (CDE-SMOTE) is proposed to improve CPDP performance while avoiding excessive oversampling.

Method: The proposed CDE-SMOTE employs CDE to estimate the class distribution of the target project. SMOTE is then used to modify the class distribution of the training data until it becomes the reverse of the estimated class distribution of the target project. Four comprehensive experiments are conducted on 14 open source software projects.

Results: The proposed approach improves the overall performance of CPDP models compared with other CPDP approaches. According to Wilcoxon signed-rank tests, significant improvements are observed in 63% of the test cases, with improvements of 16.421%, 29.687% and 20.259% in terms of Balance, G-measure, and F-measure, respectively. Applying CDE-SMOTE to NN-filtered datasets also significantly improved prediction performance.

Conclusions: CDE-SMOTE mitigates the class imbalance and distribution mismatch problems and also helps prevent the excessive oversampling that degrades the performance of prediction models. This approach is thus recommended for CPDP studies in software engineering.
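The oversampling step described under Method can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `synthetic_count` and `smote`, the neighbour count `k`, and the closed-form sample count are assumptions made here for illustration, and the estimated target defect ratio is taken as a given input because the abstract does not specify the CDE estimator. The sketch works out how many synthetic defective instances are needed for the training defect ratio to equal one minus the estimated target ratio, then generates them by SMOTE-style interpolation between minority neighbours.

```python
import math
import random


def synthetic_count(n_defective, n_clean, target_defect_ratio):
    """Synthetic defective samples needed so that the training defect ratio
    becomes 1 - target_defect_ratio (the 'reverse' of the target estimate).

    Assumption: solves (n_defective + s) / (total + s) = 1 - p for s.
    """
    if not 0.0 < target_defect_ratio < 1.0:
        raise ValueError("target_defect_ratio must be strictly between 0 and 1")
    desired = 1.0 - target_defect_ratio
    total = n_defective + n_clean
    needed = (desired * total - n_defective) / target_defect_ratio
    # Never remove samples: if the data is already past the goal, add nothing.
    return max(0, round(needed))


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical example: 20 defective and 80 clean training modules,
# with an estimated 20% defect ratio in the target project.
n_new = synthetic_count(20, 80, 0.2)
defective = [[random.random(), random.random()] for _ in range(20)]
extra = smote(defective, n_new)
print(n_new, (20 + len(extra)) / (100 + len(extra)))
```

In this hypothetical case the training set ends up 80% defective, the reverse of the 20% estimated for the target. Because the sample count is tied to that estimate rather than to a fixed oversampling rate, the amount of synthetic data stays bounded, which is the sense in which the approach is said to avoid excessive oversampling.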
