ALTRA: Cross-Project Software Defect Prediction via Active Learning and Tradaboost

Zhidan Yuan, Yanzhou Mu, Zhanqi Cui, Xiang Chen

doi:10.1109/access.2020.2972644

Abstract

Cross-project defect prediction (CPDP) methods can be used when the target project is new or lacks enough labeled program modules. In such target projects, program modules can easily be extracted and measured with software measurement tools. However, labeling these modules is time-consuming, error-prone, and requires professional domain knowledge. Moreover, directly using labeled modules from other projects (i.e., the source projects) cannot achieve satisfactory performance in most cases because of the large difference in data distribution. In this article, we propose a novel method, ALTRA, which, to the best of our knowledge, is the first to utilize both active learning and TrAdaBoost to alleviate this issue. In particular, we first use the Burak filter to select similar labeled modules from the source project after analyzing the unlabeled modules in the target project. Then we use active learning to choose representative unlabeled modules from the target project and ask experts to label the type (i.e., defective or non-defective) of these modules. Next, we use TrAdaBoost to determine the weights of the labeled modules in the source project and the target project, and then construct the model via a weighted support vector machine. After selecting a small number of modules (i.e., only 5% of the modules) in the target project, we terminate ALTRA and return the final constructed model. To show the effectiveness of ALTRA, we choose 10 large-scale open-source projects from different application domains. In terms of both the F1 and AUC performance indicators, we find that ALTRA performs significantly better than seven state-of-the-art CPDP baselines. Moreover, we also show that the Burak filter, the uncertainty active learning strategy, the class imbalanced learning method, and TrAdaBoost are all competitive components of ALTRA.
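The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of the Burak filter (for each target module, keep its k nearest source modules by Euclidean distance and take the union); the function name `burak_filter`, the toy metric values, and the choice of k are illustrative assumptions, not details taken from the paper.

```python
import math

def burak_filter(source, target, k=10):
    """Return sorted indices of source rows that are among the k nearest
    neighbours (Euclidean distance) of at least one target row."""
    selected = set()
    for t in target:
        # Rank all source rows by their distance to this target row.
        ranked = sorted(range(len(source)),
                        key=lambda i: math.dist(source[i], t))
        selected.update(ranked[:k])
    return sorted(selected)

# Toy metric vectors (hypothetical values, e.g. [size, complexity]).
source_modules = [[1.0, 1.0], [1.2, 0.9], [9.0, 9.0], [5.0, 5.0]]
target_modules = [[1.1, 1.0], [1.0, 1.1]]

kept = burak_filter(source_modules, target_modules, k=2)
print(kept)  # indices of the source modules that survive the filter
```

In this toy example only the two source modules that lie close to the target modules are kept; the distant ones are discarded before any model is trained. In practice the labels of the kept source modules would be carried along with their metric vectors.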

Highlights

Software defect prediction (SDP) [18], [25], [46] constructs models by mining version control systems and bug tracking systems, and uses the constructed models to predict defective modules in advance
We use TrAdaBoost to determine the weights of labeled modules in the source project and the target project, respectively
Final empirical results show: (1) Our proposed method ALTRA performs significantly better than seven state-of-the-art cross-project defect prediction (CPDP) baselines in terms of both the F1 and AUC performance indicators, while considering only an additional 5% of the unlabeled modules in the target project
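To make the weighting highlight more concrete, here is a hedged sketch of a single weight update in the standard TrAdaBoost formulation (Dai et al.). It is not the authors' implementation: the function name `tradaboost_update`, the 0/1 error encoding, and the toy weights are assumptions for illustration. Misclassified source modules are down-weighted, while misclassified target modules are up-weighted.

```python
import math

def tradaboost_update(src_w, tgt_w, src_wrong, tgt_wrong, n_rounds):
    """One TrAdaBoost-style weight update.

    src_wrong / tgt_wrong hold 1 where the current model misclassified
    the module and 0 where it was correct.
    """
    # Weighted error, measured on the labeled target modules only.
    eps = sum(w * e for w, e in zip(tgt_w, tgt_wrong)) / sum(tgt_w)
    if not 0 < eps < 0.5:
        raise ValueError("expected a target error strictly between 0 and 0.5")
    beta_t = eps / (1 - eps)
    # Fixed source-side factor; depends on source size and round count.
    beta = 1 / (1 + math.sqrt(2 * math.log(len(src_w)) / n_rounds))
    # Shrink the weights of misclassified source modules.
    new_src = [w * beta ** e for w, e in zip(src_w, src_wrong)]
    # Grow the weights of misclassified target modules.
    new_tgt = [w * beta_t ** (-e) for w, e in zip(tgt_w, tgt_wrong)]
    return new_src, new_tgt

# Toy example: 2 source modules, 4 target modules, one error in each set.
new_src, new_tgt = tradaboost_update(
    [1.0, 1.0], [1.0, 1.0, 1.0, 1.0],
    [1, 0], [1, 0, 0, 0], n_rounds=2)
print(new_src, new_tgt)
```

The updated weights would then be passed to a learner that accepts per-sample weights, such as the weighted support vector machine named in the abstract.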

Summary

INTRODUCTION

Software defect prediction (SDP) [18], [25], [46] constructs models by mining version control systems and bug tracking systems, and uses the constructed models to predict defective modules in advance. In our study, we use active learning to select a small number of representative modules in the target project and ask experts to label them. This setting helps us select valuable modules from the target project to construct high-quality models. To the best of our knowledge, we are the first to propose a CPDP method, ALTRA, based on active learning and TrAdaBoost. This method first uses the Burak filter to keep relevant modules in the source project.
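The module selection idea can be sketched as simple uncertainty sampling. This sketch assumes the current model outputs a defect probability for each unlabeled target module and that uncertainty is measured as closeness to 0.5; the summary does not state the exact uncertainty measure used in ALTRA, and the name `select_most_uncertain` and the probabilities below are hypothetical.

```python
def select_most_uncertain(probabilities, n_queries):
    """Return indices of the n_queries modules whose predicted defect
    probability is closest to 0.5 (i.e. the least certain predictions)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:n_queries]

# Hypothetical predicted defect probabilities for five unlabeled modules.
predicted = [0.95, 0.52, 0.10, 0.45, 0.70]

# Ask the expert to label the two modules the model is least sure about.
queried = select_most_uncertain(predicted, n_queries=2)
print(queried)
```

The labeling budget (`n_queries`) would be capped so that only a small share of the target project is sent to experts; the abstract reports a budget of 5% of the target modules.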

BACKGROUND

PERFORMANCE INDICATORS

RESULT

THREATS TO EXTERNAL VALIDITY

Findings

CONCLUSION AND FUTURE WORK

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 74
License type: CC BY 4.0

Similar Papers

Correlation Metric Selection based Correlation Alignment for Cross-project Defect Prediction
Jingwen Niu ... Zhiqiang Li
01 Dec 2021

Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction
Zhiqiang Li ... Xiao-Yuan Jing
Automated Software Engineering | VOL. 25
16 Aug 2017

DeepCPDP: Deep Learning Based Cross-Project Defect Prediction
Deyu Chen ... Junfeng Xie
IEEE Access | VOL. 7
01 Jan 2019

Cross-Project Defect Prediction Based on Two-Phase Feature Importance Amplification.
Ying Xing ... Xueyan Lin
Computational Intelligence and Neuroscience | VOL. 2022
18 Apr 2022
