Abstract

In software defect prediction, class imbalance is a common and significant problem: non-defect-prone modules greatly outnumber defect-prone modules. As a result, most typical classifiers, such as logistic regression (LR), SVM, decision trees, and boosting, are biased toward the majority class of non-defect-prone modules. In most cases, however, we are more interested in the minority class, the defect-prone modules, because we want to detect as many of them as possible. To improve the ability to identify the minority class, we propose an adaptive weight-updating scheme based on AdaBoost. We first employ SMOTE, or any other synthetic-sample generation method, to balance the training datasets. Then, each synthetic sample is adaptively assigned a penalty factor according to its density. The penalty factor is introduced into the cost function to adjust the samples' weights, so that the base classifiers are adaptively guided to learn from reliable synthetic samples rather than from noisy ones. Finally, a more reliable classifier is produced and the accuracy on the minority class is increased. A series of experiments on MDP, the NASA software defect datasets, demonstrates the effectiveness of our method.

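The sketch below illustrates the kind of pipeline the abstract describes: oversample with SMOTE, estimate a density-based penalty factor for each synthetic sample, and fold that factor into an AdaBoost-style weight update. It is a minimal illustration, not the paper's exact algorithm; it assumes scikit-learn and imbalanced-learn, and the k-NN density estimate, the exponential penalty formula, and the helper names (`density_penalty`, `penalized_adaboost`) are illustrative choices of our own.

```python
# Sketch: SMOTE oversampling + AdaBoost with a density-based penalty factor
# applied to synthetic samples (illustrative, not the paper's exact scheme).
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def density_penalty(X_res, is_synthetic, k=5):
    """Penalty factor in (0, 1]: synthetic samples in sparse regions
    (more likely noise) get a smaller factor; originals keep 1."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_res)
    dist, _ = nn.kneighbors(X_res)
    avg_dist = dist[:, 1:].mean(axis=1)        # mean distance to k nearest neighbours
    penalty = np.ones(len(X_res))
    scale = np.median(avg_dist) + 1e-12
    penalty[is_synthetic] = np.exp(-avg_dist[is_synthetic] / scale)
    return penalty

def penalized_adaboost(X, y, T=20):
    """Binary labels assumed in {0, 1}, with 1 = defect-prone (minority)."""
    n_orig = len(y)
    X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
    is_syn = np.arange(len(y_res)) >= n_orig   # imbalanced-learn appends synthetic rows
    delta = density_penalty(X_res, is_syn)
    w = np.full(len(y_res), 1.0 / len(y_res))
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X_res, y_res, sample_weight=w)
        pred = h.predict(X_res)
        err = np.clip(w[pred != y_res].sum(), 1e-10, 1 - 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        sign = np.where(pred == y_res, -1.0, 1.0)
        # Penalty factor damps the weight growth of unreliable synthetic samples,
        # so later base learners focus on reliable samples rather than noise.
        w *= np.exp(alpha * sign * delta)
        w /= w.sum()
        stumps.append(h)
        alphas.append(alpha)

    def predict(X_new):
        votes = np.zeros(len(X_new))
        for a, h in zip(alphas, stumps):
            votes += a * np.where(h.predict(X_new) == 1, 1.0, -1.0)
        return (votes > 0).astype(int)

    return predict
```

Used on a defect dataset such as one of the MDP projects, `predict = penalized_adaboost(X_train, y_train)` returns an ensemble predictor in which synthetic samples far from any neighbours contribute less to the boosting weight updates.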