Abstract

Cross-project defect prediction (CPDP) is a promising approach that helps allocate testing effort efficiently and assure software reliability early in the software lifecycle. A CPDP method usually trains a software defect classifier on labeled data sets from source projects. The trained classifier can then predict defects in new projects that have no labeled data. Most previous CPDP techniques focused on manually designed, handcrafted features. However, these handcrafted features ignore the semantic information of programs. Other existing defect prediction approaches learned semantic features from source code and built classifiers on them directly. However, they did not consider the distribution divergence between source and target projects. To address these limitations, we put forward a new method called Adversarial Discriminative Convolutional Neural Network (ADCNN). It extracts transferable semantic features from source code for CPDP tasks. Specifically, we first parse source files into token vectors and then map them to integer vectors via word embedding. Second, we combine adversarial learning with discriminative feature learning to train the ADCNN model. The key to the ADCNN model is to learn a discriminative mapping of the target project into the source feature space by deceiving a domain discriminator, which tries to distinguish target project files from source project files. Finally, we use the extracted transferable semantic features to build a classifier for CPDP tasks. We evaluate our method on ten benchmark projects in terms of F-measure, AUC, and PofB20 (an effort-aware evaluation metric). The experimental results demonstrate that ADCNN outperforms other related CPDP methods.
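As a rough illustration of the first step described in the abstract (turning per-file token sequences into fixed-length integer vectors), a minimal sketch might look like the following. The token names, the helper names `build_vocab` and `encode`, and the padding and unknown-token conventions are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def build_vocab(token_seqs, min_count=1):
    """Assign an integer id to every token seen at least `min_count` times.

    Ids 0 and 1 are reserved for padding and unknown tokens (an assumption
    of this sketch). More frequent tokens get smaller ids; ties are broken
    alphabetically so the mapping is deterministic.
    """
    counts = Counter(tok for seq in token_seqs for tok in seq)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Map a token sequence to a fixed-length list of integer ids."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in seq][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical token sequences, one per source file.
source_files = [
    ["MethodDeclaration", "IfStatement", "MethodInvocation"],
    ["MethodDeclaration", "ForStatement"],
]
vocab = build_vocab(source_files)
encoded = [encode(seq, vocab, max_len=4) for seq in source_files]
```

Integer vectors of this kind would typically be fed to an embedding layer in front of the convolutional network. Building the vocabulary from the source project only and mapping unseen target tokens to `<unk>` is one possible design choice; the paper may handle this differently.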

Highlights

  • As software grows in scale and complexity, software reliability assurance becomes both more difficult and more vital

  • We propose a new model called Adversarial Discriminative Convolutional Neural Network (ADCNN)

  • Regarding the selection of evaluation metrics, we considered two aspects: (1) the non-effort-aware scenario and (2) the effort-aware scenario
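To make the two evaluation scenarios more concrete, the sketch below computes F-measure (non-effort-aware) and a PofB20-style score (effort-aware). It assumes the common reading of PofB20 as the proportion of bugs found when inspecting the top-ranked files up to 20% of the total lines of code; the ranking by predicted score, the stopping rule, and the function names are assumptions of this example and may differ from the paper's exact definition.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pofb20(bugs, scores, loc, budget=0.20):
    """Fraction of all bugs found within a budget of 20% of total LOC.

    Files are inspected in descending order of predicted score, and
    inspection stops at the first file that would exceed the budget
    (both choices are assumptions of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    limit = budget * sum(loc)
    inspected = 0
    found = 0
    for i in order:
        if inspected + loc[i] > limit:
            break
        inspected += loc[i]
        found += bugs[i]
    total = sum(bugs)
    return found / total if total else 0.0


# Hypothetical predictions for five files.
f1 = f_measure([1, 0, 1, 1], [1, 1, 0, 1])
p20 = pofb20(
    bugs=[1, 0, 1, 0, 1],
    scores=[0.9, 0.8, 0.7, 0.2, 0.1],
    loc=[10, 5, 50, 20, 15],
)
```

The effort-aware score rewards a model that ranks small, defective files first, which F-measure alone does not capture.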


Summary

Introduction

As software grows in scale and complexity, software reliability assurance becomes both more difficult and more vital. Software testing is an essential means of reliability assurance, but it is impractical for testers to test all code units. Software defect prediction (SDP) can help find defect-prone modules or files by analyzing the characteristics of static code. With the help of SDP, a software testing team can allocate resources more efficiently [25]. SDP usually uses machine learning to train prediction models [5], [21] based on historical data (e.g., source code edit logs [27]).

