Abstract

Cross-project defect prediction (CPDP) is a mainstream method for estimating the most defect-prone components of software projects that have limited historical data. Several studies have investigated how software metrics are used and how modeling techniques influence prediction performance. However, the impact of metric-set diversity on the predictor remains unclear. This paper therefore assesses the impact of various metric sets on CPDP and investigates the feasibility of CPDP with hybrid metrics. Based on four types of software metrics, we first evaluate the impact of each metric set on CPDP in terms of F-measure and statistical tests. We then validate the superior performance of CPDP with hybrid metrics. Finally, we verify the feasibility of CPDP-OSS, a model built with three types of metrics (object-oriented, semantic, and structural), and compare it against two current models. The experimental results suggest that different metric sets affect CPDP performance in significantly distinct ways, with semantic and structural metrics performing better. The trials also indicate that appropriately increasing the diversity of a project's metrics is helpful for CPDP, as the improvement achieved by CPDP-OSS is up to 53.8%. Finally, compared with the two baseline methods, TCA+ and TDSelector, the optimized CPDP model is viable in practice, with improvement rates of up to 50.6% and 25.7%, respectively.
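For reference, the F-measure reported in these results is the harmonic mean of precision and recall over predicted defective modules. The balanced (F1) weighting shown below is an assumption, since the abstract does not restate the paper's exact formulation:

    F\text{-}measure = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}

where TP, FP, and FN count modules correctly flagged as defective, falsely flagged, and missed, respectively.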

Highlights

  • In software engineering, the conventional defect prediction approach trains a predictor using historical data of the target project and uses it to predict defects in the subsequent version or release. This process is known as within-project defect prediction (WPDP)

  • To conduct an impact analysis among all four metric sets, the cross-project defect prediction (CPDP) experiments are conducted in the first scenario. This trial analyzes the differences among the software metric sets under a specific classifier. Then, we expand the experiments in the following scenario and compare the average prediction results of six cases involving different combination patterns

  • RQ1: Our experimental results suggest that the impact of various metric sets on the performance of CPDP is distinct in terms of F-measure


Summary

Introduction

The conventional defect prediction approach trains a predictor using historical data of the target project and uses it to predict defects in the subsequent version or release. This process is known as within-project defect prediction (WPDP). In contrast, CPDP predicts defects in a project using a predictor trained on historical data of other projects [1, 2, 3]. Various software metrics, such as static code, process, object-oriented, and network metrics, have been employed for defect prediction. Several studies have confirmed the discrepancy in the performance of WPDP with different metric sets [4, 5]. Radjenović et al. [4] highlight that object-oriented and process metrics perform better among six categories of software metrics
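To make the CPDP setting concrete, the sketch below trains a classifier on one project's labeled modules and evaluates it on a different project, scoring with the F-measure used throughout the paper. The logistic-regression classifier, the synthetic data, and all variable names are illustrative assumptions, not the paper's actual models or datasets:

    # Minimal cross-project defect prediction (CPDP) sketch, assuming scikit-learn.
    # The random feature matrices stand in for real software metrics
    # (e.g., object-oriented, semantic, structural); logistic regression stands
    # in for whatever classifier a given study actually uses.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)

    # Source project: historical modules with known defect labels (1 = defective).
    X_source = rng.normal(size=(200, 10))          # 200 modules x 10 metrics
    y_source = (X_source[:, 0] + rng.normal(size=200) > 0).astype(int)

    # Target project: a different project, so its metric distribution shifts.
    X_target = rng.normal(loc=0.3, size=(80, 10))
    y_target = (X_target[:, 0] + rng.normal(size=80) > 0).astype(int)

    # CPDP: train on the source project, predict on the target project.
    clf = LogisticRegression(max_iter=1000).fit(X_source, y_source)
    y_pred = clf.predict(X_target)

    print(f"F-measure on target project: {f1_score(y_target, y_pred):.3f}")

The distribution shift between the two projects is exactly what makes CPDP harder than WPDP, and it is why transfer approaches such as TCA+ (one of the paper's baselines) exist.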

