Towards Balanced Defect Prediction with Better Information Propagation

Xianda Zheng,Huan Gao,Yuncheng Hua,Yuan-Fang Li,Guilin Qi

doi:10.1609/aaai.v35i1.16157

Abstract

Defect prediction, the task of predicting the presence of defects in source code artifacts, has broad application in software development. Defect prediction faces two major challenges, label scarcity, where only a small percentage of code artifacts are labeled, and data imbalance, where the majority of labeled artifacts are non-defective. Moreover, current defect prediction methods ignore the impact of information propagation among code artifacts and this negligence leads to performance degradation. In this paper, we propose DPCAG, a novel model to address the above three issues. We treat code artifacts as nodes in a graph, and learn to propagate influence among neighboring nodes iteratively in an EM framework. DPCAG dynamically adjusts the contributions of each node and selects high-confidence nodes for data augmentation. Experimental results on real-world benchmark datasets show that DPCAG improves performance compare to the state-of-the-art models. In particular, DPCAG achieves substantial performance superiority when measured by Matthews Correlation Coefficient (MCC), a metric that is widely acknowledged to be the most suitable for imbalanced data.

Full Text