Cross-Project Defect Prediction Method based on Feature Distribution Alignment and Neighborhood Instance Selection

Yi Zhu, Qiao Yu, Xiaoying Chen, Yu Zhao

doi:10.53106/160792642022072304011

Abstract

In software development practice, the project under development is often brand new. Because such a project has no defect history of its own, defect prediction for it must rely on data collected from other, similar projects (the source projects) to build a prediction model, which is then applied to the project under development (the target project). However, a model built from source-project data usually falls short of the ideal prediction performance on the target project. The main reason is the large difference in data distribution between the source and target projects, which shows up both in the distribution of features across projects and in differences between individual instances. To address this problem, a cross-project defect prediction method is proposed that works on both features and instances. The method first aligns the feature distributions using the available target-project data and the source-project data. It then selects the labeled source-project instances that are similar to the unlabeled instances in the target project. Finally, it builds a defect prediction model from the selected source-project instances. Cross-project defect prediction experiments were carried out on the ReLink and PROMISE datasets. Compared with the classic instance-based cross-project defect prediction method, the proposed method achieves significant improvements in F-measure and AUC; compared with within-project defect prediction, it achieves comparable performance.
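The three steps described in the abstract (align features, select neighboring source instances, train on the selection) can be illustrated with a minimal sketch. The abstract does not say which alignment technique or similarity measure the authors use, so everything concrete here is an assumption: alignment is approximated by rescaling each source feature to the target's mean and standard deviation, similarity is Euclidean distance, and the function names, the parameter `k`, and the toy data are hypothetical.

```python
import math


def column_stats(rows):
    """Return per-feature means and standard deviations of a list of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def align_to_target(source, target):
    """Assumed alignment: rescale each source feature to the target's mean/std."""
    s_mean, s_std = column_stats(source)
    t_mean, t_std = column_stats(target)
    return [
        [(x - s_mean[j]) / s_std[j] * t_std[j] + t_mean[j] for j, x in enumerate(row)]
        for row in source
    ]


def select_neighbors(aligned_source, target, k):
    """Indices of source rows that are among the k nearest to any target row."""
    chosen = set()
    for t in target:
        ranked = sorted(
            range(len(aligned_source)),
            key=lambda i: math.dist(aligned_source[i], t),
        )
        chosen.update(ranked[:k])
    return sorted(chosen)


# Toy data (hypothetical): two features per module, label 1 = defective.
source = [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0], [100.0, 9.0]]
labels = [0, 0, 1, 1]
target = [[1.0, 0.1], [2.0, 0.2], [4.0, 0.3]]  # unlabeled target instances

aligned = align_to_target(source, target)
selected = select_neighbors(aligned, target, k=1)

# Training set for whichever classifier is chosen in the final step.
train_X = [aligned[i] for i in selected]
train_y = [labels[i] for i in selected]
```

The final step would fit any standard classifier on `train_X` and `train_y` and apply it to `target`; the abstract does not name the classifier, so it is left out of the sketch.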