Abstract

The quality of defect datasets is a critical issue in software defect prediction (SDP). These datasets are obtained by mining software repositories. Recent studies have raised concerns about their quality, because of inconsistencies between the bug/clean fix keywords in fault reports and the corresponding links in change management logs. The class imbalance (CI) problem is another major challenge for SDP models. Defect prediction models trained on noisy and imbalanced data produce inconsistent and unsatisfactory results, so a combined analysis of noisy instances and the CI problem is required. To the best of our knowledge, too few studies have addressed these aspects together. In this paper, we study the impact of noise and the CI problem on five baseline SDP models: we manually injected noise at various levels (0–80%) and measured its effect on the performance of these models. We further provide guidelines on the range of noise that the baseline models can tolerate. We also suggest the SDP model that has the highest noise tolerance and outperforms the other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models decrease by 20–30% after 10–40% noisy instances are added. Similarly, the ROC (Receiver Operating Characteristics) values of the SDP models drop to 40–50%. Compared with the other traditional models, the suggested model withstands noise levels of 40–60%.
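The abstract does not state how the noise was injected. As a rough illustration only, the sketch below assumes the simplest noise model, random label flipping: a fraction `rate` of binary labels (1 = defective, 0 = clean) is chosen uniformly at random and inverted. It also shows how TPR and FPR are computed from true and predicted labels. The helper names `inject_label_noise` and `tpr_fpr` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple


def inject_label_noise(labels: List[int], rate: float, seed: int = 0) -> List[int]:
    """Return a copy of binary labels (1 = defective, 0 = clean) in which a
    fraction `rate` of the entries, chosen uniformly at random, is flipped.

    Hypothetical helper: random label flipping is an assumed noise model,
    not necessarily the procedure used in the paper.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)           # seeded for reproducible noise
    noisy = list(labels)                # leave the original labels untouched
    n_flip = round(rate * len(labels))  # number of instances to corrupt
    for i in rng.sample(range(len(labels)), n_flip):
        noisy[i] = 1 - noisy[i]         # defective <-> clean
    return noisy


def tpr_fpr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Compute the True Positive Rate and False Positive Rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall on defective class
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false alarms on clean class
    return tpr, fpr


if __name__ == "__main__":
    # Imbalanced toy labels: 10 defective, 90 clean.
    labels = [1] * 10 + [0] * 90
    for level in [0.0, 0.1, 0.2, 0.4, 0.6, 0.8]:
        noisy = inject_label_noise(labels, level, seed=42)
        flipped = sum(a != b for a, b in zip(labels, noisy))
        print(f"noise={level:.0%} flipped={flipped} defective_after={sum(noisy)}")
```

In an experiment like the one described, the noisy labels would be used to train each SDP model, and TPR/FPR would be measured against clean test labels. Note that with heavily imbalanced data, uniform flipping mostly turns clean instances into "defective" ones, which is one way noise and class imbalance interact.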
