An Empirical Investigation on Wrapper-Based Feature Selection for Predicting Software Quality

Huanjing Wang, Taghi M Khoshgoftaar, Amri Napolitano

doi:10.1142/s0218194015400057

Abstract

The basic measurements for software quality control and management are the various project and software metrics collected at various stages of the software development life cycle. Not all of these metrics are necessarily relevant for predicting the fault proneness of software components, modules, or releases, which creates the need for feature (software metric) selection. The goal of feature selection is to find a minimum subset of attributes that characterizes the underlying data and yields results as good as, or even better than, those obtained when all available features are considered. As an example of inter-disciplinary research between data science and software engineering, this study is unique in presenting a large comparative study of wrapper-based feature (or attribute) selection techniques for building defect predictors. In this paper, we investigated thirty wrapper-based feature selection methods for removing irrelevant and redundant software metrics used to build defect predictors. These thirty wrappers vary in the choice of search method (Best First or Greedy Stepwise), learner (Naïve Bayes, Support Vector Machine, and Logistic Regression), and performance metric (Overall Accuracy, Area Under the ROC (Receiver Operating Characteristic) Curve, Area Under the Precision-Recall Curve, Best Geometric Mean, and Best Arithmetic Mean) used to evaluate candidate models within the wrapper, giving 2 × 3 × 5 = 30 combinations. The defect prediction models are trained using the three learners and evaluated using the five performance metrics. The case study is based on software metrics and defect data collected from a real-world software project. The results demonstrate that Best Arithmetic Mean is the best performance metric to use within the wrapper. Naïve Bayes performed significantly better than Logistic Regression and Support Vector Machine as a wrapper learner on the slightly imbalanced and less imbalanced datasets. We also recommend Greedy Stepwise as the search method for wrappers.
Moreover, compared with models built on the full datasets, the performance of defect prediction models can be improved when metric subsets are selected through a wrapper subset selector.
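The wrapper procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: Greedy Stepwise is taken to be a forward search that stops as soon as no single added metric improves the score, and Best Arithmetic Mean (Best Geometric Mean) is taken to be the maximum, over all decision thresholds, of the arithmetic (geometric) mean of the true positive rate and true negative rate. The names `greedy_stepwise`, `best_arithmetic_mean`, `best_geometric_mean`, and `toy_evaluate` are hypothetical, and `toy_evaluate` is only a stand-in for training one of the study's learners on a candidate metric subset.

```python
import math
from typing import Callable, FrozenSet, Iterator, Sequence, Tuple


def _threshold_rates(scores: Sequence[float], labels: Sequence[int]) -> Iterator[Tuple[float, float]]:
    """Yield (TPR, TNR) for every candidate threshold; label 1 = fault-prone."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    # Each distinct score is a threshold; +inf means "predict everything not fault-prone".
    for t in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        yield tp / pos, tn / neg


def best_arithmetic_mean(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed definition: max over thresholds of (TPR + TNR) / 2."""
    return max((tpr + tnr) / 2 for tpr, tnr in _threshold_rates(scores, labels))


def best_geometric_mean(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed definition: max over thresholds of sqrt(TPR * TNR)."""
    return max(math.sqrt(tpr * tnr) for tpr, tnr in _threshold_rates(scores, labels))


def greedy_stepwise(
    features: Sequence[str], evaluate: Callable[[FrozenSet[str]], float]
) -> Tuple[FrozenSet[str], float]:
    """Forward greedy search: add the best single feature until the score stops improving."""
    selected: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    while True:
        candidates = [(evaluate(selected | {f}), f) for f in features if f not in selected]
        if not candidates:
            break
        # max() with a key returns the first maximum, so ties go to the earlier feature.
        score, feat = max(candidates, key=lambda c: c[0])
        if score <= best_score:
            break
        selected, best_score = selected | {feat}, score
    return selected, best_score


# Hypothetical toy data: three metrics for six modules, label 1 = fault-prone.
data = {
    "loc": [1, 2, 3, 4, 5, 6],
    "cc": [3, 1, 2, 6, 4, 5],
    "noise": [5, 1, 4, 2, 6, 3],
}
labels = [0, 0, 0, 1, 1, 1]


def toy_evaluate(subset: FrozenSet[str]) -> float:
    # Stand-in for "train NB/SVM/LR on these metrics": each module's score is the
    # sum of its selected metric values, scored on the same data (no cross-validation).
    scores = [sum(data[f][i] for f in subset) for i in range(len(labels))]
    return best_arithmetic_mean(scores, labels)


selected, score = greedy_stepwise(list(data), toy_evaluate)
# selected == frozenset({'loc'}), score == 1.0
```

In a real wrapper, the `evaluate` callback would train the chosen learner on the candidate metric subset (usually with internal cross-validation) and score it with the chosen performance metric. Changing the learner or metric changes only that callback, which is how the thirty wrapper configurations differ from one another. Best First search differs from this sketch in that it keeps previously expanded subsets and can backtrack instead of stopping at the first non-improving step.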

Journal: International Journal of Software Engineering and Knowledge Engineering
Publication Date: Feb 1, 2015
Citations: 13

Similar Papers

A Study on Software Metric Selection for Software Fault Prediction
Huanjing Wang ... Taghi M Khoshgoftaar
01 Dec 2019

Stability of filter- and wrapper-based software metric selection techniques
Huanjing Wang ... Taghi M Khoshgoftaar
01 Aug 2014

An Empirical Study on Wrapper-Based Feature Selection for Software Engineering Data
Huanjing Wang ... Amri Napolitano
01 Dec 2013

The importance of performance metrics within wrapper feature selection
Randall Wald ... Taghi Khoshgoftaar
01 Aug 2013
