Abstract

Software Defect Prediction (SDP) is critical to ensuring product reliability and customer satisfaction. Many studies that predict defective modules in the software development process rely on historical software defect data mined from online repositories. However, data obtained from online repositories suffer from high dimensionality and noise. High dimensionality arises when the data contain a very large number of independent attributes for building classification models. Noisy instances are those that lie deep inside the region of a class other than their own. Both problems degrade the prediction performance of most statistical learning methods. To address them, we propose a hybrid preprocessing approach in which iterative partitioning filtering is conducted prior to feature selection (FS). The technique is validated using selected FS methods: chi-square (CS), information gain (IG), gain ratio (GR), Relief (RF), symmetric uncertainty (SU), and maximum likelihood logistic regression (MLLR). Notably, the use of MLLR for FS in the context of software defect prediction has not been fully investigated despite its potential usefulness. For the combined use of variable selection and noise filtering (NF), two cases are considered: (1) the variable selection technique is applied to the original dataset and the relevant variables are selected from the original data; (2) the variable selection technique is applied to the cleaned dataset and the variables are selected from the cleaned data. We apply this method to six software defect datasets, in both their cleaned and noisy forms, to select useful features for classification modeling in SDP. The Random Forest (RaF), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP) classifiers are tested on the selected features, and the performance of the resulting models is compared using three indicators.
The results confirm that MLLR can be useful in selecting the optimal feature subset for more accurate prediction of defective modules in the software development process.
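
The two cases described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy dataset, the function names, the 1-NN voters that stand in for the filter's base learner, the simplified stopping rule (stop when no instance is removed), and information gain as a stand-in for any of the listed FS methods.

```python
import math
from collections import Counter

def nearest_label(train, x):
    """1-NN prediction (squared Euclidean distance); ties go to the first row."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]

def partition_filter(data, n_parts=3, max_iter=5):
    """Simplified partitioning filter (illustrative, hypothetical helper).

    Each partition acts as one 1-NN voter. A row is treated as noisy when a
    majority of voters misclassify it. Repeats until nothing is removed.
    """
    current = list(data)
    for _ in range(max_iter):
        idx = range(len(current))
        parts = [list(idx[p::n_parts]) for p in range(n_parts)]
        kept = []
        for i in idx:
            x, y = current[i]
            wrong = 0
            for part in parts:
                # A row never votes on itself.
                train = [current[j] for j in part if j != i]
                if train and nearest_label(train, x) != y:
                    wrong += 1
            if 2 * wrong <= n_parts:
                kept.append(current[i])
        if len(kept) == len(current):
            break
        current = kept
    return current

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(data, j):
    """Information gain of discrete feature j with respect to the class label."""
    gain = entropy([y for _, y in data])
    for v in set(x[j] for x, _ in data):
        subset = [y for x, y in data if x[j] == v]
        gain -= len(subset) / len(data) * entropy(subset)
    return gain

def rank_features(data):
    n_features = len(data[0][0])
    return sorted(range(n_features), key=lambda j: information_gain(data, j), reverse=True)

# Toy data (assumed): feature 0 determines the label, feature 1 is irrelevant.
patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
original = [patterns[i % 4] for i in range(24)]
original[16] = ((0, 0), 1)  # one deliberately mislabeled module

# Case 1: feature selection on the original (noisy) data.
ranking_case1 = rank_features(original)

# Case 2: noise filtering first, then feature selection on the cleaned data.
cleaned = partition_filter(original)
ranking_case2 = rank_features(cleaned)

print("case 1 ranking:", ranking_case1, "IG(f0) =", information_gain(original, 0))
print("case 2 ranking:", ranking_case2, "IG(f0) =", information_gain(cleaned, 0))
```

On this toy data the filter removes only the mislabeled row, and the information gain of the informative feature is higher on the cleaned data than on the original data. Real defect datasets have many more attributes and overlapping classes, so the effect of filtering on the selected subset would need to be measured rather than assumed.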
