Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques

Kalai Magal. R, Shomona Gracia Jacob

doi:10.5120/20693-3582

Abstract

In this paper, we apply classification algorithms to publicly available datasets from the NASA PROMISE repository in order to classify software modules as defective or non-defective. The datasets employed for this research were PC1, PC2, PC3 and PC4 [3]. This paper proposes a computational framework that uses data mining techniques to detect the existence of defects in software components. The framework comprises feature selection, data classification and classifier evaluation. Correlation-based feature subset selection [4] is used to determine the significant features that most strongly affect defect prediction in software modules. The reduced feature set obtained after feature selection could enhance the efficiency of the predictive model, which is then used to identify defective modules in a given set of inputs. This paper evaluates the performance of the proposed model. The experimental results indicate the effectiveness of the proposed feature-selection-based predictive model on standard performance evaluation parameters.
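As a rough illustration of the feature selection step only, the sketch below scores feature subsets with the commonly cited correlation-based merit, k * r_cf / sqrt(k + k(k-1) * r_ff), and grows the subset greedily. It makes several assumptions that do not come from the paper: Pearson correlation stands in for whatever correlation measure the authors used, the search is a plain forward selection, and the metric names and values (`loc`, `branch_count`, `comment_ratio`) are a hypothetical toy set, not data from PC1 to PC4. The classification and evaluation stages are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(subset, columns, labels):
    """Merit of a subset: high feature-class correlation (r_cf),
    low feature-feature correlation (r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(columns, labels):
    """Greedy forward search: keep adding the feature that raises the
    merit most, and stop when no candidate improves it."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        score, feat = max(
            (cfs_merit(selected + [f], columns, labels), f) for f in remaining
        )
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Hypothetical toy module metrics (assumed for illustration only).
columns = {
    "loc": [10, 12, 11, 9, 50, 55, 60, 58],
    "branch_count": [3, 4, 3, 5, 9, 12, 10, 6],  # largely redundant with loc
    "comment_ratio": [0.3, 0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1],  # unrelated to label
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defective module

selected, merit = cfs_forward_select(columns, labels)
print(selected, round(merit, 3))
```

In this toy set the redundant and uninformative metrics are left out because adding either one lowers the merit, which is the behaviour a reduced feature set relies on before a classifier is trained.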
