Dictionary learning based software defect prediction

Jin Liu,Shi Ying,Shan-Shan Wu,Zhi-Wu Zhang,Xiao-Yuan Jing

doi:10.1145/2568225.2568320

Abstract

In order to improve the quality of a software system, software defect prediction aims to automatically identify defective software modules for efficient software test. To predict software defect, those classification methods with static code attributes have attracted a great deal of attention. In recent years, machine learning techniques have been applied to defect prediction. Due to the fact that there exists the similarity among different software modules, one software module can be approximately represented by a small proportion of other modules. And the representation coefficients over the pre-defined dictionary, which consists of historical software module data, are generally sparse. In this paper, we propose to use the dictionary learning technique to predict software defect. By using the characteristics of the metrics mined from the open source software, we learn multiple dictionaries (including defective module and defective-free module sub-dictionaries and the total dictionary) and sparse representation coefficients. Moreover, we take the misclassification cost issue into account because the misclassification of defective modules generally incurs much higher risk cost than that of defective-free ones. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods.

Full Text