Abstract
In software defect prediction, a common and significant problem is data imbalance: non-defect-prone modules greatly outnumber defect-prone ones. This imbalance biases most typical classifiers, such as logistic regression, SVM, decision trees, and boosting, toward the majority class of non-defect-prone modules. In most cases, however, we are more interested in the minority class, the defect-prone modules, since the goal is to detect as many of them as possible. To improve the identification of the minority class, we propose an adaptive weight-updating scheme based on AdaBoost. We first employ SMOTE, or any other synthetic sample generation method, to balance the training dataset. Then, each synthetic sample is assigned a penalty factor adaptively according to its local density. This penalty factor is introduced into the cost function to adjust the sample weights, so that the base classifiers are adaptively guided to learn from reliable synthetic samples rather than noisy ones. The result is a more reliable classifier with higher accuracy on the minority class. A series of experiments on MDP, a collection of NASA software defect datasets, demonstrates the effectiveness of our method.
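The abstract does not give the exact formulas, but the core idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simplified SMOTE-style interpolation for generating synthetic minority samples, and a hypothetical density-based penalty factor defined as the fraction of minority-class points among a synthetic sample's k nearest real neighbours, which is then used to down-weight unreliable synthetic samples in the initial AdaBoost weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy imbalanced data: majority cluster near (0,0), minority cluster near (5,5)
X_maj = rng.normal(0.0, 0.5, size=(50, 2))
X_min = rng.normal(5.0, 0.5, size=(10, 2))

def smote_like(X_minority, n_new):
    # simplified SMOTE: interpolate between a random minority sample
    # and another random minority sample
    idx = rng.integers(0, len(X_minority), size=n_new)
    nbr = rng.integers(0, len(X_minority), size=n_new)
    gap = rng.random((n_new, 1))
    return X_minority[idx] + gap * (X_minority[nbr] - X_minority[idx])

def penalty_factor(x, X_majority, X_minority, k=5):
    # hypothetical density-based penalty: fraction of the k nearest real
    # samples that belong to the minority class. A synthetic point deep in
    # majority territory (likely noise) gets a factor near 0; a point inside
    # the minority region (likely reliable) gets a factor near 1.
    X_all = np.vstack([X_majority, X_minority])
    y_all = np.r_[np.zeros(len(X_majority)), np.ones(len(X_minority))]
    dists = np.linalg.norm(X_all - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_all[nearest].mean()

# generate synthetic minority samples and score their reliability
X_syn = smote_like(X_min, n_new=40)
pf = np.array([penalty_factor(x, X_maj, X_min) for x in X_syn])

# initial AdaBoost sample weights: real samples uniform, synthetic samples
# scaled by their penalty factor, then normalised to sum to 1
w = np.r_[np.ones(len(X_maj) + len(X_min)), pf]
w /= w.sum()
```

In a full AdaBoost loop, the same penalty factor would also temper the exponential weight increase of misclassified synthetic samples, so that noisy synthetic points cannot dominate later boosting rounds.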