A Hybrid Data Preprocessing Technique based on Maximum Likelihood Logistic Regression with Filtering for Enhancing Software Defect Prediction

Kamal Bashir, Ahmed Saad Hussein, Mahama Yahaya, Tayseer Ali

doi:10.1109/iske47853.2019.9170328

Abstract

Software Defect Prediction (SDP) is critical to ensuring product reliability and customer satisfaction. Many studies that predict defective modules in the software development process rely on historical software defect data mined from online repositories. However, data obtained from such repositories suffer from high dimensionality and noise. High dimensionality arises when the data contain a large number of independent attributes for building classification models. Noisy instances are those located deep inside the region of a class other than their own. Both problems adversely affect the prediction performance of most statistical learning methods. To this end, we propose a hybrid preprocessing approach in which iterative partitioning filtering is applied before feature selection (FS). The technique is validated using six FS methods: chi-square (CS), information gain (IG), gain ratio (GR), Relief (RF), symmetric uncertainty (SU), and maximum likelihood logistic regression (MLLR). Notably, the use of MLLR for FS in the context of software defect prediction has not been fully investigated despite its potential usefulness. For the combined use of variable selection and noise filtering (NF), two cases are considered: (1) variable selection applied to the original dataset, with the relevant variables selected from the original data; and (2) variable selection applied to the cleaned dataset, with the variables selected from the cleaned data. We apply this method to six software defect datasets in their clean and noisy forms to select useful features for SDP classification modeling. The Random Forest (RaF), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP) classifiers are tested on the selected features, and the performance of the resulting models is compared using three indicators. The results confirm that MLLR can be useful in selecting the optimal feature subset for more accurate prediction of defective modules in the software development process.
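The two cases described in the abstract (feature selection on the original data versus on the noise-filtered data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the nearest-centroid base learner, the majority-vote removal rule, the strided (non-random) partitions, the gradient-ascent fit, the choice to rank features by the log-likelihood of a one-feature logistic model, and the synthetic data. The function names `partition_filter`, `fitted_log_likelihood`, and `rank_features` are hypothetical.

```python
import math
import statistics


def nearest_centroid(train):
    """Build a nearest-centroid classifier from (features, label) rows.

    Stand-in base learner (an assumption); any classifier could be used.
    """
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def partition_filter(rows, n_parts=3, max_rounds=10):
    """Simplified iterative partitioning filter with majority voting."""
    rows = list(rows)
    for _ in range(max_rounds):
        # Strided split keeps the sketch deterministic; random splits are usual.
        parts = [rows[i::n_parts] for i in range(n_parts)]
        models = [nearest_centroid(p) for p in parts if p]
        # Keep a row unless more than half of the models misclassify it.
        kept = [
            (x, y) for x, y in rows
            if 2 * sum(m(x) != y for m in models) <= len(models)
        ]
        if len(kept) == len(rows):  # nothing flagged as noisy: stop
            break
        rows = kept
    return rows


def fitted_log_likelihood(xs, ys, steps=300, lr=0.5):
    """Fit p(y=1|x) = sigmoid(w*z + b) on one standardized feature by
    gradient ascent and return the log-likelihood of the fitted model."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs) or 1.0
    zs = [(x - mu) / sd for x in xs]
    w = b = 0.0
    for _ in range(steps):
        # tanh form of the sigmoid avoids overflow for large |w*z + b|.
        probs = [0.5 * (1.0 + math.tanh(0.5 * (w * z + b))) for z in zs]
        w += lr * sum((y - p) * z for y, p, z in zip(ys, probs, zs)) / len(zs)
        b += lr * sum(y - p for y, p in zip(ys, probs)) / len(zs)
    total = 0.0
    for z, y in zip(zs, ys):
        t = w * z + b
        total += y * t - (max(t, 0.0) + math.log1p(math.exp(-abs(t))))
    return total


def rank_features(rows):
    """Rank feature indices by univariate fitted log-likelihood, best first."""
    ys = [y for _, y in rows]
    n_features = len(rows[0][0])
    scores = [
        fitted_log_likelihood([x[j] for x, _ in rows], ys)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Synthetic data (hypothetical): feature 0 separates the classes,
# feature 1 is uninformative, and two rows sit deep inside the wrong class.
clean = [([0.1 * (i % 5), float(i % 2)], 0) for i in range(20)]
clean += [([5.0 + 0.1 * (i % 5), float(i % 2)], 1) for i in range(20)]
noisy = [([0.2, 0.0], 1), ([5.2, 1.0], 0)]
data = clean + noisy

case1 = rank_features(data)                    # case (1): FS on original data
case2 = rank_features(partition_filter(data))  # case (2): FS on filtered data
```

On this toy data the filter drops only the two mislabeled rows, and both cases rank feature 0 ahead of feature 1. On real defect datasets the two rankings may differ, which is the comparison the two cases are meant to expose.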

Similar Papers

A Novel Feature Selection Method Based on Maximum Likelihood Logistic Regression for Imbalanced Learning in Software Defect Prediction
Kamal Bashir ... Tianrui Li
The International Arab Journal of Information Technology | VOL. 17
01 Sep 2020

Software Defect Prediction Using Wrapper Feature Selection Based on Dynamic Re-Ranking Strategy
Abdullateef Oluwagbemiga Balogun ... Luiz Fernando Capretz
Symmetry | VOL. 13
12 Nov 2021

Software defect prediction based on nested-stacking and heterogeneous feature selection
Li-Qiong Chen ... Can Wang
Complex & Intelligent Systems | VOL. 8
20 Feb 2022

Isolation Forest Wrapper Approach for Feature Selection in Software Defect Prediction
Zhiguo Ding
IOP Conference Series: Materials Science and Engineering | VOL. 1043
01 Jan 2020
