Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Abdullateef O Balogun, Shuib Basri, Victor E Adeyemo, Amos O Bajeh, Abdullahi A Imam, Saipunidzam Mahamad, Hammed A Mojeed, Said J Abdulkadir, Qasem Al-Tashi, Malek A Almomani

doi:10.3390/sym12071147

Abstract

Feature selection (FS) is a feasible solution for mitigating the high dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often report contradictory experimental results and inconsistent findings. These contradictions can be attributed to limitations of the respective studies, such as small datasets, limited FS search methods, and unsuitable prediction models. It is hence critical to conduct an extensive empirical study that addresses these contradictions, guides researchers, and buttresses the scientific rigor of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The ensuing prediction models were evaluated based on accuracy and AUC values. Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used for statistical ranking of the studied FS methods. The experimental results showed that there is no single best FS method, as their respective performances depend on the choice of classifier, performance evaluation metric, and dataset. However, we recommend the use of statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, respectively, in SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended, as it outperforms the conventional SFS and LHS-based WFS methods.
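To make the idea of filter feature ranking (FFR) concrete, the following is a minimal sketch that scores each feature independently of any classifier and keeps the top-ranked ones. It assumes information gain as the scoring criterion, which is one common FFR choice and not necessarily one of the 46 configurations studied here. The feature names, the toy data, and the helper functions are all hypothetical and are not taken from the NASA, PROMISE, ReLink, or AEEEM datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, information_gain(column, labels)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each row is a software module, each column a
# binarised metric (1 = above some threshold, 0 = below).
FEATURES = ["high_complexity", "large_size", "even_module_id"]
ROWS = [
    (1, 1, 0),
    (1, 1, 1),
    (1, 1, 0),
    (0, 1, 1),
    (0, 0, 0),
    (0, 0, 1),
    (0, 0, 0),
    (1, 0, 1),
]
LABELS = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = defective, 0 = clean

ranking = rank_features(ROWS, LABELS, FEATURES)
top_k = [name for name, _ in ranking[:2]]  # keep the two best-ranked features

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the irrelevant `even_module_id` column receives a score of zero and is dropped, which is the kind of dimensionality reduction that FFR aims for. In a full experiment, the reduced feature set would then be passed to a classifier such as Naïve Bayes or a Decision Tree and evaluated with accuracy and AUC.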

Highlights

Software defect prediction (SDP) is an essential procedure in software engineering.
High dimensionality is one of the primary issues that undermine the quality of a given dataset, which leads to poor predictive models.
An extended benchmark study was conducted to investigate the impact of 46 feature selection (FS) methods over 25 defect datasets from four major repositories on the predictive performance of SDP models.

Summary

Introduction

Software defect prediction (SDP) is an essential procedure in software engineering. It involves the deployment of machine learning (ML) methods on software features or metrics derived from software systems repositories to predict the quality and reliability of a software system [1,2]. Software engineers are expected to develop high-quality and reliable software systems with limited resources [5,6,7]. Modern software systems are fundamentally massive and convoluted, with multiple inter-related modules or components. These software systems are often periodically updated and upgraded with new features or functionalities based on new system requirements or user demands. Some existing studies pointed out that the poor predictive performance of SDP models is often caused by the high dimensionality of software features. The existence of irrelevant and redundant software metrics has negative effects on SDP model performance [12,13,14,15,16,17].

Journal: Symmetry	Publication Date: Jul 9, 2020
Citations: 46	License type: CC BY 4.0

Similar Papers

A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction.
Abdullateef O Balogun ... Victor E Adeyemo
Computational Intelligence and Neuroscience | VOL. 2021
01 Jan 2020

Search-Based Wrapper Feature Selection Methods in Software Defect Prediction: An Empirical Analysis
Abdullateef O Balogun ... Said A Jadid
01 Jan 2020

Impact of feature selection on classification via clustering techniques in software defect prediction
F.E Usman-Hamza ... A.O Bajeh
Journal of Computer Science and Its Application | VOL. 26
09 Feb 2020

Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach
Abdullateef Oluwagbemiga Balogun ... Ahmad Sobri Hashim
Applied Sciences | VOL. 9
09 Jul 2019
