Abstract

Developing good-quality software requires rigorous testing of its modules. The effort and resources required for testing can be reduced by predicting early which modules are likely to contain defects. This paper presents a comparative study of classification algorithms that takes into account the class imbalance and high dimensionality of defect datasets. Artificial Neural Networks (ANN), Decision Trees, K-Nearest Neighbours, Support Vector Machines (SVM) and Ensemble Learning are among the machine learning algorithms used to classify software modules as defect-prone or not defect-prone. For datasets from the PROMISE repository, multiple software metrics were evaluated with feature selection (FS) techniques such as Recursive Feature Elimination (RFE) and correlation-based FS, combined with the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance. Among the algorithms used in this experiment, the Stacking Ensemble technique gave the best results on all datasets, with a defect prediction accuracy above 0.9.
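The oversampling step named above can be illustrated with a minimal sketch of the core idea behind the Synthetic Minority Oversampling Technique: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. The abstract does not describe the authors' implementation, so the function name `smote`, its parameters, and the toy metric values below are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (illustrative helper).

    Each sample is a random point on the line segment between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two software metrics per defective module.
# These values are invented for the example and are not from the paper.
defective = [[10.0, 3.0], [12.0, 4.0], [11.0, 3.5], [13.0, 5.0]]
clean_count = 10  # assumed size of the majority (not defect-prone) class

new_rows = smote(defective, n_new=clean_count - len(defective), k=2)
balanced_defective = defective + new_rows
print(len(balanced_defective))
```

In a real experiment, oversampling of this kind would normally be applied only to the training split, so that synthetic samples never leak into the data used to measure accuracy.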
