METRIC SELECTION FOR SOFTWARE DEFECT PREDICTION

HUANJING WANG, TAGHI M. KHOSHGOFTAAR, KEHAN GAO, JASON VAN HULSE

doi:10.1142/s0218194011005256

Abstract

Real-world software systems are becoming larger, more complex, and much more unpredictable, and they face many risks throughout their life cycles. Software practitioners strive to improve software quality by constructing defect prediction models using metric (feature) selection techniques. Finding faulty components in a software system can lead to a more reliable final system and reduce development and maintenance costs. This paper presents an empirical study of six commonly used filter-based software metric rankers and our proposed ensemble technique, which combines the rank orderings of the features produced by the individual rankers (using the mean or the median rank). The techniques were applied to three large software projects using five commonly used learners. Classification performance was evaluated using AUC, the area under the receiver operating characteristic (ROC) curve. Results demonstrate that the ensemble technique performed better overall than any individual ranker and was also more robust. The empirical study also shows that variations among rankers, learners, and software projects significantly impacted the classification outcomes, and that the ensemble method can smooth out these variations in performance.
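
The rank-ordering ensemble described in the abstract can be illustrated with a minimal sketch. It assumes that each ranker returns a complete ordering of the same features, best first, and that the ensemble scores each feature by the mean or median of its positions across rankers. The function name `ensemble_rank`, the ranker names, the metric names, and the alphabetical tie-breaking rule are all hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def ensemble_rank(rankings, combine="mean"):
    """Combine several feature rankings into a single ordering.

    rankings: dict mapping a ranker name to a list of feature names,
              ordered from most to least relevant. Every list is assumed
              to contain the same set of features.
    combine:  "mean" or "median", the statistic applied to each feature's
              positions across the rankers.
    """
    if combine not in ("mean", "median"):
        raise ValueError("combine must be 'mean' or 'median'")
    aggregate = mean if combine == "mean" else median

    orderings = list(rankings.values())
    features = sorted(orderings[0])

    # Position 1 is the best rank a feature can receive from a ranker.
    scores = {
        f: aggregate([order.index(f) + 1 for order in orderings])
        for f in features
    }

    # Lower aggregated rank is better; ties are broken alphabetically
    # (an assumption made only to keep the output deterministic).
    return sorted(features, key=lambda f: (scores[f], f))


# Hypothetical rankers and software metrics, for illustration only.
example_rankings = {
    "ranker_a": ["loc", "complexity", "churn", "fan_in"],
    "ranker_b": ["complexity", "loc", "fan_in", "churn"],
    "ranker_c": ["loc", "churn", "complexity", "fan_in"],
}

print(ensemble_rank(example_rankings, combine="mean"))
print(ensemble_rank(example_rankings, combine="median"))
```

In a feature selection setting, the top-ranked features from the combined ordering would then be passed to a learner; how many features to keep is a separate choice that this sketch does not address.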
