Abstract

Cross-project defect prediction (CPDP) methods can be used when the target project is a new project or lacks enough labeled program modules. In such target projects, program modules can easily be extracted and measured with software measurement tools. However, labeling these modules is time-consuming and error-prone, and it requires professional domain knowledge. Moreover, directly using labeled modules from other projects (i.e., the source projects) cannot achieve satisfactory performance in most cases, because the data distributions of the projects differ greatly. In this article, we propose a novel method, ALTRA, which to the best of our knowledge is the first to combine active learning and TrAdaBoost to alleviate this issue. First, after analyzing the unlabeled modules in the target project, we use the Burak filter to select similar labeled modules from the source project. Then we use active learning to choose representative unlabeled modules from the target project and ask experts to label their type (i.e., defective or non-defective). Next, we use TrAdaBoost to determine the weights of the labeled modules in the source project and the target project, and we construct the model via a weighted support vector machine. After a small number of modules (i.e., only 5% of the modules) in the target project have been selected, ALTRA terminates and returns the final constructed model. To show the effectiveness of ALTRA, we choose 10 large-scale open-source projects from different application domains. In terms of both the F1 and AUC performance indicators, ALTRA performs significantly better than seven state-of-the-art CPDP baselines. Moreover, we show that the Burak filter, the uncertainty active learning strategy, the class imbalanced learning method and TrAdaBoost are each competitive design choices within ALTRA.
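The Burak filter step described above can be sketched as a k-nearest-neighbour relevancy filter. This is a minimal illustration under stated assumptions, not ALTRA's actual implementation. It assumes that each module is a plain list of numeric metric values, that distance is Euclidean, and that k defaults to 10, the value commonly used for the Burak filter. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def burak_filter(source_X, target_X, k=10):
    """Return the sorted indices of source modules kept by a k-NN relevancy filter.

    For every (unlabeled) target module, the k nearest source modules are
    selected; the union of all selections, without duplicates, is kept.
    """
    selected = set()
    for t in target_X:
        # Rank every source module by its distance to this target module.
        ranked = sorted(range(len(source_X)), key=lambda i: euclidean(source_X[i], t))
        selected.update(ranked[:k])
    return sorted(selected)


if __name__ == "__main__":
    source = [[0, 0], [1, 1], [10, 10], [11, 11], [50, 50]]
    target = [[0.5, 0.5], [10.5, 10.5]]
    # The distant source module [50, 50] is not among any target module's neighbours.
    print(burak_filter(source, target, k=2))  # [0, 1, 2, 3]
```

The union removes duplicates. When target modules lie close to each other, the filtered source set can therefore be much smaller than k times the number of target modules.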

Highlights

  • Software defect prediction (SDP) [18], [25], [46] constructs models by mining version control systems and bug tracking systems, and uses the constructed models to predict defective modules in advance

  • We use TrAdaBoost to determine the weights of labeled modules in the source project and the target project respectively

  • Final empirical results show: (1) our proposed method ALTRA performs significantly better than seven state-of-the-art cross-project defect prediction (CPDP) baselines in terms of both the F1 and AUC performance indicators, while requiring labels for only an additional 5% of the modules in the target project
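The TrAdaBoost weighting mentioned in the highlights can be illustrated with a single weight-update round. This sketch follows the update rule of the original TrAdaBoost formulation (Dai et al., 2007). Whether ALTRA uses exactly this variant is an assumption. The weighted support vector machine that produces the predictions is omitted, and misclassifications are passed in as 0/1 flags. `tradaboost_update` and its parameters are hypothetical names. Clamping the target error is also an assumption added for numerical safety.

```python
import math


def tradaboost_update(src_w, tgt_w, src_miss, tgt_miss, n_rounds):
    """One round of TrAdaBoost weight updates.

    src_miss / tgt_miss hold 1 where the current weighted learner
    misclassified a module and 0 otherwise.
    Returns (new_src_w, new_tgt_w, beta_t).
    """
    n = len(src_w)
    # Fixed factor (below 1 whenever n > 1) that shrinks misclassified source weights.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Weighted error rate measured on the labeled target modules only.
    eps = sum(w * m for w, m in zip(tgt_w, tgt_miss)) / sum(tgt_w)
    # Assumption: clamp eps so that beta_t stays strictly between 0 and 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Misclassified source modules lose weight; misclassified target modules gain it.
    new_src = [w * beta ** m for w, m in zip(src_w, src_miss)]
    new_tgt = [w * beta_t ** (-m) for w, m in zip(tgt_w, tgt_miss)]
    return new_src, new_tgt, beta_t


if __name__ == "__main__":
    src, tgt, bt = tradaboost_update([1.0, 1.0], [1.0] * 4, [1, 0], [1, 0, 0, 0], n_rounds=10)
    print(src)  # first source weight shrinks to about 0.73, second stays 1.0
    print(tgt)  # first target weight roughly triples (beta_t = 1/3), the rest stay 1.0
```

The update is deliberately asymmetric. A misclassified source module is down-weighted by a fixed factor because it probably differs from the target distribution. A misclassified target module is up-weighted, as in AdaBoost, so the next learner concentrates on it.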


Summary

INTRODUCTION

Software defect prediction (SDP) [18], [25], [46] constructs models by mining version control systems and bug tracking systems, and uses these models to predict defective modules in advance. In our study, we use active learning to select a small number of representative modules in the target project and resort to experts to label the chosen modules. This setting helps us select valuable modules from the target project to construct high-quality models. To the best of our knowledge, we are the first to propose a CPDP method, ALTRA, that combines active learning and TrAdaBoost. This method first uses the Burak filter to keep relevant modules in the source project.
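The active learning step in this paragraph can be sketched with uncertainty sampling, the strategy named in the abstract, together with the 5% labeling budget. This is a minimal sketch under several assumptions. The model is assumed to output a defect probability for each unlabeled target module. "Most uncertain" is taken to mean closest to 0.5. The helper names `labeling_budget` and `most_uncertain` are hypothetical. With an SVM, the absolute decision value could play the same role as the distance from 0.5.

```python
def labeling_budget(n_target, budget_percent=5):
    """Number of target modules experts may label: ceil(5% of n_target), at least 1."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (n_target * budget_percent + 99) // 100)


def most_uncertain(probas, already_labeled, batch_size=1):
    """Uncertainty sampling: return indices of the unlabeled modules whose
    predicted defect probability is closest to 0.5 (least confident)."""
    candidates = [i for i in range(len(probas)) if i not in already_labeled]
    # Smaller |p - 0.5| means the model is less sure about the module.
    candidates.sort(key=lambda i: abs(probas[i] - 0.5))
    return candidates[:batch_size]


if __name__ == "__main__":
    probs = [0.9, 0.45, 0.2, 0.6]
    print(labeling_budget(200))             # 10 modules may be labeled
    print(most_uncertain(probs, set(), 2))  # [1, 3]
```

In a full loop, one would repeatedly retrain the model, query experts about `most_uncertain(...)`, add those indices to `already_labeled`, and stop once `labeling_budget(...)` modules have been labeled.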

BACKGROUND
PERFORMANCE INDICATORS
RESULT
THREATS TO EXTERNAL VALIDITY
Findings
CONCLUSION AND FUTURE WORK
