Abstract

The class overlap problem refers to instances from different categories heavily overlapping in the feature space. This issue is one of the challenges in improving the performance of software defect prediction (SDP). Currently, the studies on the impact of class overlap on SDP mainly focused on within-project defect prediction and cross-project defect prediction. Moreover, the existing class overlap instances cleaning methods are not suitable for cross-version defect prediction. In this paper, we propose a class overlap instances cleaning method based on the Ratio of K-nearest neighbors with the Same Label (RKSL). This method removes instances with the abnormal neighbor ratio in the training set. Based on the RKSL method, we investigate the impact of class overlap on the performance and interpretability of the cross-version defect prediction model. The experiment results show that class overlap can affect the performance of cross-version defect prediction models significantly. The RKSL method can handle the class overlap problem in defect datasets, but it may impact the interpretability of models. Through the analysis of feature changes, we consider that class overlap instances cleaning can assist models in identifying more important features.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.