Abstract

Cross-project defect prediction (CPDP) aims to build a prediction model on existing source projects and use it to predict the labels of a target project. The difference in data distribution between projects makes CPDP very challenging. In addition, most existing CPDP methods require sufficient labeled data. However, acquiring a large amount of labeled data for a new project is difficult, whereas obtaining unlabeled data is relatively easy. A desirable approach is therefore to build the prediction model on both labeled and unlabeled data. CPDP in this scenario is called cross-project semi-supervised defect prediction (CSDP). Recently, generative adversarial networks have achieved impressive results owing to their strong ability to learn data distributions and discriminative representations. To effectively learn discriminative features of data from different projects, we propose a Discriminative Adversarial Feature Learning (DAFL) approach for CSDP. DAFL consists of a feature transformer and a project discriminator, which compete with each other. The feature transformer tries to generate a feature representation that captures discriminant information and preserves the intrinsic structure inferred from both labeled and unlabeled data. The project discriminator tries to distinguish source instances from target instances on the generated representation. Experiments on 16 projects show that DAFL performs significantly better than the baselines.
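The competition between a feature transformer and a project discriminator can be illustrated with a minimal sketch. This is not the DAFL objective itself, which the abstract does not specify; it is a generic domain-adversarial loss under stated assumptions: a linear transformer `W`, linear-plus-sigmoid heads `v_defect` and `v_project`, a trade-off weight `lam`, and the function names are all hypothetical. The structure-preserving term mentioned in the abstract is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def transform(x, W):
    """Hypothetical linear feature transformer: raw metrics -> representation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def score(z, v):
    """Linear scorer followed by a sigmoid; used for both heads below."""
    return sigmoid(sum(vi * zi for vi, zi in zip(v, z)))

def adversarial_losses(W, v_defect, v_project, source, target, lam=0.1):
    """Return (defect_loss, project_loss, transformer_objective).

    source: list of (metrics, defect_label) pairs from labeled source data
    target: list of metrics vectors from unlabeled target data
    """
    zs = [transform(x, W) for x, _ in source]
    zt = [transform(x, W) for x in target]

    # Supervised defect loss on the labeled source instances.
    defect_loss = sum(
        bce(score(z, v_defect), y) for z, (_, y) in zip(zs, source)
    ) / len(source)

    # Project discriminator loss: project label 1 for source, 0 for target.
    project_loss = (
        sum(bce(score(z, v_project), 1) for z in zs)
        + sum(bce(score(z, v_project), 0) for z in zt)
    ) / (len(zs) + len(zt))

    # The discriminator minimizes project_loss. The transformer minimizes
    # this objective, so it is rewarded when the discriminator is confused.
    transformer_objective = defect_loss - lam * project_loss
    return defect_loss, project_loss, transformer_objective
```

In a training loop, one would alternate between updating `v_project` to lower `project_loss` and updating `W` and `v_defect` to lower `transformer_objective`; the sketch only computes the quantities that the two players optimize.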

Highlights

  • Software defect prediction (SDP) [1]–[8] is an important software quality assurance step that predicts defect-proneness from a software project's development history

  • When sufficient historical data is not available, cross-project defect prediction (CPDP) [23] is a satisfactory solution: a prediction model is trained on data from source projects and used to predict the labels of a target project

  • To address the distribution difference between projects and the limited amount of labeled data, we propose a new approach, termed Discriminative Adversarial Feature Learning (DAFL), for cross-project semi-supervised defect prediction (CSDP)


Summary

Introduction

Software defect prediction (SDP) [1]–[8] is an important software quality assurance step that predicts defect-proneness from a software project's development history. Many prior SDP studies predict the faults of a new instance within the same project, which is called within-project defect prediction (WPDP) [9]–[13]. Studies have shown that a useful machine learning model needs to be trained on sufficient and complete data. It is therefore challenging to build a well-performing prediction model for a new project with limited historical data. When sufficient historical data is not available, cross-project defect prediction (CPDP) [23] is a satisfactory solution: a prediction model is trained on data from source projects and used to predict the labels of a target project.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojun Li.

