Abstract

As the scale and complexity of software increase, software security has become a widespread concern. Software defect prediction (SDP) helps developers discover and repair potential defects that may endanger software security before they cause harm, improving software security and reliability. Cross-project defect prediction (CPDP) and cross-company defect prediction (CCDP) are widely studied as ways to improve defect prediction performance, but problems remain, such as inconsistent metrics and large differences in data distribution between source and target projects. This study therefore proposes a new CCDP method based on metric matching and sample weight setting. First, a clustering-based metric matching method is proposed. A multigranularity metric feature vector is extracted to unify the metric dimension while retaining as much of the information contained in the metrics as possible. Metric clustering is then used to eliminate metric redundancy, and representative metrics are extracted through principal component analysis (PCA) to support one-to-one metric matching. This strategy not only addresses the metric inconsistency and redundancy problems but also transforms the cross-company heterogeneous defect prediction problem into a homogeneous one. Second, a sample weight setting method is proposed to transform the source data distribution. Statistical frequency information about the source samples is used as an impact factor to increase the weight of source samples that are more similar to the target samples. This improves the similarity of the data distributions of the source and target projects and thereby supports a more accurate prediction model. Finally, after these two processing steps, several classical machine learning methods are applied to build the prediction model, and 12 project datasets from NASA and PROMISE are used for performance comparison. Experimental results show that the proposed method achieves better prediction performance than other mainstream CCDP methods.

Highlights

  • With the increasing scale and complexity of software, software security and quality issues are becoming increasingly important

  • To address these issues, this study proposes a cross-company defect prediction (CCDP) method based on metric matching and sample weight setting to improve the security and reliability of software. The innovations are as follows. On the one hand, a clustering-based metric matching method is proposed to solve the metric inconsistency and redundancy problems. The multigranularity metric feature vector is extracted to unify the metric dimension between the source and target projects

  • Thus, a CCDP method based on metric matching and sample weight setting is proposed in this study to address the above issues. The method overview is shown in Figure 1. The specific process is as follows: (1) Clustering-based metric matching: since the meaning and number of metrics differ considerably between the source and target projects in CCDP, metrics must be matched between projects before a defect prediction model can be built. Therefore, a clustering-based metric matching method is proposed here

Summary

Introduction

With the increasing scale and complexity of software, software security and quality issues are becoming increasingly important. To address these issues, this study proposes a CCDP method based on metric matching and sample weight setting to improve the security and reliability of software. A sample selection-based weight setting method is applied to adjust the source data distribution so that it is as consistent as possible with the target project dataset. It uses sample selection frequency information as the impact factor to increase the weight of source samples that are more similar to the target samples. This improves the similarity of the data distributions between projects and, in turn, the prediction accuracy.
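
The summary does not give the weighting formula, so the following is a hedged sketch of one way selection frequency could act as an impact factor. It assumes that a source sample is "selected" whenever it is among the `k` nearest neighbours of a target sample, and that its weight is `1 + frequency / number_of_target_samples`. Both the selection rule and the formula are assumptions made for illustration, and `selection_frequency_weights` is a hypothetical name.

```python
import math
from collections import Counter


def selection_frequency_weights(source_rows, target_rows, k=2):
    """Weight each source sample by how often target samples select it.

    source_rows, target_rows: lists of equal-length numeric feature lists,
    i.e. the metrics are assumed to be already matched between projects.
    """
    counts = Counter()
    for t in target_rows:
        # Rank source samples by Euclidean distance to this target sample.
        ranked = sorted(
            range(len(source_rows)),
            key=lambda i: math.dist(source_rows[i], t),
        )
        counts.update(ranked[:k])  # the k nearest source samples are "selected"
    n_target = len(target_rows)
    # Assumed impact factor: base weight 1 plus relative selection frequency.
    return [1.0 + counts[i] / n_target for i in range(len(source_rows))]


# Hypothetical data, for illustration only.
source_rows = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
target_rows = [[0.5, 0.5], [1.0, 0.0]]
weights = selection_frequency_weights(source_rows, target_rows, k=2)
```

Source samples that are never selected keep the base weight of 1, while samples close to many target samples are weighted up. Weights of this kind could then be passed to any learner that supports per-sample weights.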

Related Work
CCDP Method Based on Metric Matching and Sample Weight Setting
Experimental Verification and Performance Analysis
Findings
Conclusions and Future Work
