Defect prediction by using cluster ensembles

Yanhong Yang, Hongbing Qian, Jun Yang

doi:10.1109/icaci.2018.8377533

Abstract

Software defect prediction has become an active research topic in recent years and has received considerable attention. Much of this research focuses on within-project defect prediction, which requires historical data from the project. In practice, however, a new project often has insufficient training data. Cross-project defect prediction (CPDP) and unsupervised defect prediction were proposed to address this problem. Generally, CPDP models are trained on data from other projects and predict the defect proneness of modules in a particular project of interest. However, because data distributions differ between projects, the performance of CPDP is highly volatile. To find a better way to handle unlabeled datasets, this paper focuses on unsupervised learning and proposes a new approach, Cluster Ensembles and Labeling (CEL), to predict defect proneness for unlabeled datasets. Experimental results on 15 open source projects show that the CEL model achieves predictive power comparable to that of supervised learning models.

Full Text