Local versus Global Models for Just-In-Time Software Defect Prediction

Xingguang Yang, Kai Shi, Huiqun Yu, Liqiong Chen, Guisheng Fan

doi:10.1155/2019/2384706

Abstract

Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction that aims to identify defect-inducing changes. Recent studies have found that the variability of defect data sets can affect the performance of defect predictors, and that using local models can help improve prediction performance. However, those studies focused on module-level defect prediction, so whether local models remain effective in the context of JIT-SDP is an important open question. To this end, we compare the performance of local and global models through a large-scale empirical study of six open-source projects comprising 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project-validation, and timewise-cross-validation. To build local models, the experiment uses k-medoids to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models achieve significantly better effort-aware prediction performance than global models in the cross-validation and cross-project-validation scenarios. In particular, when the number of clusters k is set to 2, local models obtain the best effort-aware prediction performance. Therefore, local models are promising for effort-aware JIT-SDP.
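The local-model idea in the abstract can be sketched as follows: partition the training changes with k-medoids, fit one model per cluster, and route each new change to the model of its nearest medoid. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses a simple alternating (Voronoi-iteration) variant of k-medoids with Euclidean distance and an assumed deterministic initialization (the most central point, then farthest-first picks); the names `k_medoids`, `LocalModels`, and `mean_label_trainer` are hypothetical, and the trainer is a toy stand-in for the logistic regression or EALR models used in the study.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]


def nearest(x: Vector, centers: Sequence[Vector]) -> int:
    """Index of the center closest to x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))


def k_medoids(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids; returns (medoid indices, cluster label per point)."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # Assumed initialization: the most central point, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(math.dist(points[i], p) for p in points))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(math.dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        labels = [nearest(p, [points[m] for m in medoids]) for p in points]
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if its cluster emptied out
                updated.append(old)
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda i: sum(math.dist(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    # Final assignment against the final medoids.
    labels = [nearest(p, [points[m] for m in medoids]) for p in points]
    return medoids, labels


class LocalModels:
    """One model per k-medoids cluster; a prediction uses the nearest medoid's model."""

    def __init__(self, X, y, k: int, train: Callable[[list, list], Model]):
        medoids, labels = k_medoids(X, k)
        self.centers = [X[m] for m in medoids]
        self.models = [
            train([x for x, l in zip(X, labels) if l == c],
                  [t for t, l in zip(y, labels) if l == c])
            for c in range(k)
        ]

    def predict(self, x: Vector) -> float:
        return self.models[nearest(x, self.centers)](x)


def mean_label_trainer(X, y) -> Model:
    """Hypothetical toy trainer: always predicts the training subset's defect ratio."""
    ratio = sum(y) / len(y)
    return lambda x: ratio


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of changes.
    X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    y = [0, 0, 0, 1, 1, 1]
    local = LocalModels(X, y, k=2, train=mean_label_trainer)
    global_model = mean_label_trainer(X, y)
    print(local.predict((0.5, 0.5)), local.predict((10.5, 10.5)))  # 0.0 1.0
    print(global_model((0.5, 0.5)))  # 0.5
```

The toy run shows the intended contrast: the global model averages over both regions, while each local model reflects only the homogeneous region it was trained on. Swapping in a real classifier only changes the `train` callable.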

Highlights

To describe the effectiveness of local models on different data sets, we count, for each data set, the number of times that local models perform significantly better than global models across 9 values of k and two evaluation indicators (i.e., ACC and Popt). The statistical results are shown in Table 5, where the second row gives the number of times that local models perform better than global models for each project.
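Each project's count above is taken over up to 18 comparisons (9 values of k × 2 indicators), and each comparison rests on indicator values such as Popt. The sketch below computes Popt with the definition commonly used in effort-aware JIT-SDP: Popt = 1 − (Area(optimal) − Area(model)) / (Area(optimal) − Area(worst)), where each area lies under the curve of the cumulative fraction of defect-inducing changes found against the cumulative fraction of inspection effort spent. It is a hedged sketch, not the paper's code: the effort measure per change (e.g., lines of code modified), trapezoidal areas, and tie handling by input order are assumptions, the function name `popt` is hypothetical, and ACC is not shown.

```python
from typing import Sequence


def _area(order: Sequence[int], labels: Sequence[int], efforts: Sequence[float]) -> float:
    """Area under the effort-based cumulative curve for a given inspection order."""
    total_effort, total_defects = sum(efforts), sum(labels)
    area = x_prev = y_prev = 0.0
    cum_effort = cum_defects = 0.0
    for i in order:
        cum_effort += efforts[i]
        cum_defects += labels[i]
        x, y = cum_effort / total_effort, cum_defects / total_defects
        area += (x - x_prev) * (y + y_prev) / 2.0  # trapezoid between consecutive points
        x_prev, y_prev = x, y
    return area


def popt(scores: Sequence[float], labels: Sequence[int], efforts: Sequence[float]) -> float:
    """Popt = 1 - (A_opt - A_model) / (A_opt - A_worst)."""
    if any(e <= 0 for e in efforts) or sum(labels) == 0:
        raise ValueError("efforts must be positive and at least one change must be defect-inducing")
    idx = range(len(labels))
    density = [labels[i] / efforts[i] for i in idx]  # actual defects per unit of effort
    model = sorted(idx, key=lambda i: scores[i], reverse=True)  # inspect highest-scored first
    optimal = sorted(idx, key=lambda i: density[i], reverse=True)
    worst = sorted(idx, key=lambda i: density[i])
    a_model, a_opt, a_worst = (_area(o, labels, efforts) for o in (model, optimal, worst))
    if a_opt == a_worst:
        raise ValueError("Popt is undefined when the optimal and worst orders give equal areas")
    return 1.0 - (a_opt - a_model) / (a_opt - a_worst)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]          # hypothetical: 1 = defect-inducing change
    efforts = [10, 50, 20, 20]     # hypothetical lines of code modified per change
    print(popt([0.2, 0.9, 0.1, 0.3], labels, efforts))  # about 0.067
```

A model that ranks changes exactly by actual defect density scores 1, and one that ranks them in the reverse order scores 0; in an effort-aware setting such as EALR, the scores would be the model's predicted defect-proneness per unit of effort.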

Summary

Introduction

Software plays an important role in people’s daily life, and defective software can cause great economic losses to users and enterprises. Software defect prediction technology plays an important role in software quality assurance, and it is an active research topic in the field of software engineering data mining [2, 3]. In 1971, Akiyama et al. [16] proposed a formula relating the number of software defects (D) to the lines of code (LOC) in a project: D = 4.86 + 0.018 × LOC. McCabe et al. [17] argued that code complexity correlates better with software defects than the number of lines of code does, and proposed the cyclomatic complexity metric to measure code complexity. Nagappan and Ball [19] proposed relative code churn metrics that predict defect density at the file level by measuring code churn during software development.
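Both classic metrics mentioned above reduce to simple arithmetic. The sketch below evaluates Akiyama's formula D = 4.86 + 0.018 × LOC and McCabe's cyclomatic complexity in its standard graph form V(G) = E − N + 2P (E edges, N nodes, P connected components of the control-flow graph). The function names are illustrative, and the graph form of V(G) is taken from McCabe's standard definition rather than from this paper.

```python
def akiyama_defects(loc: float) -> float:
    """Akiyama's estimate of the number of defects D for a project with `loc` lines of code."""
    return 4.86 + 0.018 * loc


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's V(G) = E - N + 2P for a control-flow graph."""
    return edges - nodes + 2 * components


if __name__ == "__main__":
    # 1,000 LOC -> 4.86 + 18 = 22.86 estimated defects.
    print(round(akiyama_defects(1000), 2))  # 22.86
    # A single if/else: nodes {condition, then, else, exit}; edges condition->then,
    # condition->else, then->exit, else->exit; so V(G) = 4 - 4 + 2 = 2.
    print(cyclomatic_complexity(edges=4, nodes=4))  # 2
```

The contrast motivates McCabe's argument: Akiyama's estimate grows only with size, whereas V(G) counts independent paths, so two functions of equal length can differ sharply in complexity.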

Methods

Results

Conclusion