Abstract

Cross-project defect prediction aims to predict whether the modules of a target project contain defects by using a prediction model trained on data from other projects. Because data distributions diverge between projects, cross-project defect prediction performs worse than within-project defect prediction. To reduce this divergence as much as possible, researchers have proposed a variety of methods that filter training data from the perspective of transfer learning. In this paper, we introduce a "project-instance-metric" hierarchical filtering strategy to select training data for the defect prediction model. The three-level filtering method selects, in turn, the candidate projects most similar to the target project, the instances most similar to the target instances, and the metrics most highly correlated with the prediction result. We compared three-level filtering with project-level filtering, instance-level filtering, and the combination of project-level and instance-level filtering, using four classification algorithms on NASA open-source data sets. Our experiments show that the three-level filtering method achieves higher F-measure and AUC values than the single-level training data filtering methods.
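To make the three filtering levels concrete, the following is a minimal sketch in Python and not the authors' implementation. The abstract does not state which similarity or correlation measures are used, so the sketch assumes Euclidean distance between per-metric mean vectors at the project level, a nearest-neighbor selection at the instance level, and absolute Pearson correlation with the defect label at the metric level. All function names and parameters (`filter_projects`, `filter_instances`, `filter_metrics`, `three_level_filter`, `top_projects`, `k`, `top_metrics`) are hypothetical.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(rows):
    """Per-metric mean vector of a list of instances."""
    return [mean(col) for col in zip(*rows)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_projects(candidates, target_rows, top_projects):
    """Level 1 (assumed measure): keep the candidate projects whose
    centroid is closest to the target project's centroid."""
    t = centroid(target_rows)
    ranked = sorted(candidates,
                    key=lambda name: euclidean(centroid(candidates[name][0]), t))
    return ranked[:top_projects]


def filter_instances(rows, labels, target_rows, k):
    """Level 2 (assumed measure): for each target instance, keep its
    k nearest training instances; the union forms the training set."""
    keep = set()
    for t in target_rows:
        nearest = sorted(range(len(rows)),
                         key=lambda i: euclidean(rows[i], t))[:k]
        keep.update(nearest)
    idx = sorted(keep)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def filter_metrics(rows, labels, top_metrics):
    """Level 3 (assumed measure): keep the metric columns with the
    highest absolute Pearson correlation to the defect label."""
    scores = [abs(pearson(col, labels)) for col in zip(*rows)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_metrics])


def three_level_filter(candidates, target_rows, top_projects=1, k=2, top_metrics=1):
    """Apply project, instance, and metric filtering in that order.

    candidates maps a project name to (rows, labels), where rows is a
    list of metric vectors and labels is a list of 0/1 defect flags.
    """
    names = filter_projects(candidates, target_rows, top_projects)
    rows = [r for n in names for r in candidates[n][0]]
    labels = [l for n in names for l in candidates[n][1]]
    rows, labels = filter_instances(rows, labels, target_rows, k)
    cols = filter_metrics(rows, labels, top_metrics)
    return names, [[r[j] for j in cols] for r in rows], labels, cols
```

The returned training rows contain only the selected metric columns, so the same column indices would need to be applied to the target project's instances before prediction. Metric normalization, which usually matters for distance-based selection, is omitted here for brevity.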
