Abstract

Data imbalance is an increasingly common problem in fields such as defect prediction, change prediction, oil spill detection, and medical diagnosis. Various methods have been developed to handle imbalanced datasets in order to improve the accuracy of prediction models. Many studies have addressed imbalanced datasets in defect prediction, but most of them use the SMOTE oversampling method. Many other oversampling methods that deal with the imbalance problem remain unexplored, particularly in the field of software defect prediction. This study develops a tool that implements three of those unexplored oversampling methods: ADASYN, SPIDER, and Safe-Level-SMOTE. Furthermore, we analyze their performance in comparison to the traditional SMOTE method. The oversampling methods are evaluated by applying three machine learning techniques for defect prediction using object-oriented metrics on two open source defect datasets. The analysis showed that prediction error decreased and the performance of the machine learning techniques improved when the datasets were balanced with any of the three oversampling methods. Further, all three methods outperformed SMOTE, and the SPIDER oversampling method performed best in the majority of cases.
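For readers unfamiliar with the SMOTE baseline mentioned above, the following is a minimal sketch of its core idea: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustration only, not the tool developed in the study; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric tuples with at least two entries.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample.
        others = [p for p in minority if p is not base]
        # Keep the k nearest by Euclidean distance.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (made-up values, not from the study's datasets).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=4)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls within the region those samples span. Methods such as ADASYN and Safe-Level-SMOTE differ mainly in how they choose where and how many synthetic points to generate.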
