Selection of best feature reduction method for module-based software defect prediction

Shiva Singh, Tanwir Uddin Haider

doi:10.1088/1742-6596/2273/1/012002

Abstract

Predicting software defects is a crucial part of the software development life cycle (SDLC). Recent years have seen many studies on software defect prediction, most of them based on machine learning techniques. Before defects can be predicted, pre-processing steps such as feature selection, outlier removal, and feature scaling are required, as they help improve accuracy and reduce execution time (ET). The present investigation focuses on feature selection, which is a dimensionality reduction technique. We also propose a framework for module-based software defect prediction using feature selection techniques. These techniques are divided into three categories: filter methods, wrapper methods, and a hybrid method that we have developed, which combines two wrapper methods, Sequential Forward Selection and Sequential Backward Selection. Finally, classification is performed with KNN, Logistic Regression, Decision Tree, and SVM using the above-mentioned feature selection techniques on eight publicly available PROMISE datasets, and the results are compared with existing state-of-the-art (SOTA) methods. The results show that, when applied together with machine learning classifiers, the hybrid method improves accuracy over filter and wrapper methods by 4.2%, 3.9%, and 3.8% on the datasets pc4, jm1, and kc2, respectively.
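The abstract does not state how Sequential Forward Selection and Sequential Backward Selection are combined, so the following is only an illustrative sketch under one assumed reading: a greedy forward pass adds features while the score improves, then a backward pass removes any feature whose removal does not lower the score. The function names (`loo_knn_accuracy`, `forward_select`, `backward_prune`, `hybrid_select`), the leave-one-out k-NN scoring, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Scorer = Callable[[Matrix, Sequence[int], List[int]], float]


def loo_knn_accuracy(X: Matrix, y: Sequence[int], features: List[int], k: int = 3) -> float:
    """Leave-one-out accuracy of a k-NN classifier restricted to `features`."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(len(X)) if j != i
        )
        votes = [y[j] for _, j in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X: Matrix, y: Sequence[int], score: Scorer = loo_knn_accuracy) -> Tuple[List[int], float]:
    """Sequential forward selection: add the best feature while the score improves."""
    selected: List[int] = []
    best = 0.0
    while len(selected) < len(X[0]):
        candidates = [f for f in range(len(X[0])) if f not in selected]
        # Ties are broken in favour of the lowest feature index.
        top_score, top_f = max(
            ((score(X, y, selected + [f]), f) for f in candidates),
            key=lambda t: (t[0], -t[1]),
        )
        if top_score <= best:
            break
        selected.append(top_f)
        best = top_score
    return selected, best


def backward_prune(X: Matrix, y: Sequence[int], selected: List[int], best: float,
                   score: Scorer = loo_knn_accuracy) -> Tuple[List[int], float]:
    """Sequential backward selection: drop features whose removal does not hurt."""
    selected = list(selected)
    while len(selected) > 1:
        # Evaluate every single-feature removal and keep the best one.
        top_score, drop = max(
            ((score(X, y, [g for g in selected if g != f]), f) for f in selected),
            key=lambda t: (t[0], -t[1]),
        )
        if top_score < best:
            break
        selected.remove(drop)
        best = top_score
    return selected, best


def hybrid_select(X: Matrix, y: Sequence[int], score: Scorer = loo_knn_accuracy) -> Tuple[List[int], float]:
    """Assumed hybrid: forward pass followed by a backward pruning pass."""
    selected, best = forward_select(X, y, score)
    return backward_prune(X, y, selected, best, score)


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise,
# column 2 is constant.
X_demo = [
    [0.0, 5.0, 1.0], [0.2, -4.0, 1.0], [0.1, 9.0, 1.0], [0.3, -8.0, 1.0],
    [1.0, 4.0, 1.0], [1.2, -5.0, 1.0], [1.1, 8.0, 1.0], [1.3, -9.0, 1.0],
]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
features, accuracy = hybrid_select(X_demo, y_demo)
```

In practice the scoring function would be replaced by cross-validated accuracy of the classifier under study (KNN, Logistic Regression, Decision Tree, or SVM), and the data would come from a defect dataset rather than the toy matrix used here.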

Full Text