Abstract

Context: Cross-project defect prediction (CPDP) is an active research area, and many CPDP methods have been proposed. These methods use cross-project data as is for their inputs, yet useless or noisy information in that data can degrade both predictive and computational performance. Removing such information simplifies the cross-project data and can therefore affect the performance of CPDP methods. Objective: To identify and quantify the effects of data simplification on CPDP methods. Method: We conducted experiments comparing the predictive performance of CPDP with and without data simplification. The data simplification method we adopted is based on an active learning method proposed for software effort estimation. The experiments used 44 versions of OSS projects, four prediction models, and two CPDP methods, namely Burak-filter and cross-project selection. Results: Data simplification significantly improved predictive performance for cross-project selection. It did not improve Burak-filter. Conclusion: Data simplification can be helpful for cross-project selection in terms of both predictive performance and size reduction of the cross-project data.
