Abstract

Software defect prediction (SDP), which classifies software modules into defect-prone and not-defect-prone categories, provides an effective way to maintain high-quality software systems. Most existing SDP models aim for lower classification error rates rather than lower misclassification costs. However, in many real-world applications, misclassifying defect-prone modules as not-defect-prone usually incurs higher costs than misclassifying not-defect-prone modules as defect-prone. In this paper, we first propose a new two-stage cost-sensitive learning (TSCS) method for SDP, which uses cost information not only in the classification stage but also in the feature selection stage. Then, specifically for the feature selection stage, we develop three novel cost-sensitive feature selection algorithms, namely Cost-Sensitive Variance Score (CSVS), Cost-Sensitive Laplacian Score (CSLS), and Cost-Sensitive Constraint Score (CSCS), by incorporating cost information into traditional feature selection algorithms. The proposed methods are evaluated on seven real data sets from NASA projects. Experimental results suggest that our TSCS method achieves better performance in software defect prediction than existing single-stage cost-sensitive classifiers. Our experiments also show that the proposed cost-sensitive feature selection methods outperform traditional cost-blind feature selection methods, validating the efficacy of using cost information in the feature selection stage.
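
The distinction between error rate and misclassification cost can be illustrated with a minimal sketch. It is not the TSCS method itself: the function names, the toy labels, and the cost values (a missed defect-prone module costing five times a false alarm) are assumptions chosen for illustration only.

```python
# Illustrative sketch only: the cost values below are hypothetical,
# not taken from the paper. Label 1 = defect-prone, 0 = not-defect-prone.

def error_rate(y_true, y_pred):
    """Fraction of modules that are misclassified, ignoring error type."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def total_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Sum of misclassification costs.

    cost_fn: assumed cost of predicting a defect-prone module as not-defect-prone.
    cost_fp: assumed cost of predicting a not-defect-prone module as defect-prone.
    """
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn
        elif t == 0 and p == 1:
            cost += cost_fp
    return cost


# Ten toy modules: three defect-prone, seven not-defect-prone.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # misses two defect-prone modules
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # raises three false alarms

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, error_rate(y_true, pred), total_cost(y_true, pred))
```

Under these assumed costs, classifier A has the lower error rate but the higher total cost, which is the situation that motivates optimizing cost rather than error rate.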
