As the scale and complexity of software increase, software security issues have become the focus of society. Software defect prediction (SDP) is an important means to assist developers in discovering and repairing potential defects that may endanger software security in advance and improving software security and reliability. Currently, cross-project defect prediction (CPDP) and cross-company defect prediction (CCDP) are widely studied to improve the defect prediction performance, but there are still problems such as inconsistent metrics and large differences in data distribution between source and target projects. Therefore, a new CCDP method based on metric matching and sample weight setting is proposed in this study. First, a clustering-based metric matching method is proposed. The multigranularity metric feature vector is extracted to unify the metric dimension while maximally retaining the information contained in the metrics. Then use metric clustering to eliminate metric redundancy and extract representative metrics through principal component analysis (PCA) to support one-to-one metric matching. This strategy not only solves the metric inconsistent and redundancy problem but also transforms the cross-company heterogeneous defect prediction problem into a homogeneous problem. Second, a sample weight setting method is proposed to transform the source data distribution. Wherein the statistical source sample frequency information is set as an impact factor to increase the weight of source samples that are more similar to the target samples, which improves the data distribution similarity between the source and target projects, thereby building a more accurate prediction model. Finally, after the above two-step processing, some classical machine learning methods are applied to build the prediction model, and 12 project datasets in NASA and PROMISE are used for performance comparison. Experimental results prove that the proposed method has superior prediction performance over other mainstream CCDP methods.
Read full abstract