Abstract

Software defect prediction is a classification technique that uses software metrics and fault data collected during the software development process to identify fault-prone modules before the testing phase. It aims to optimize the allocation of project resources and ultimately improve the quality of software products. However, two factors, high dimensionality and class imbalance, may result in low-quality training data and subsequently degrade classification models. Feature (software metric) selection and data sampling are frequently used to overcome these problems. Feature selection (FS) is the process of choosing a subset of relevant features so that the quality of prediction models can be maintained or improved. Data sampling alters the dataset to change its balance level, thereby alleviating the tendency of traditional classification models to be biased toward the overrepresented (majority) class. A recent study shows that another method, boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), is also effective for addressing the class imbalance problem. In this paper, we present a technique that applies FS followed by a boosting algorithm in the context of software quality estimation. We investigate four FS approaches: individual FS, repetitive sampled FS, sampled ensemble FS, and repetitive sampled ensemble FS, and study their impact on the quality of the prediction models. Ten base feature ranking techniques are examined in the case study. We also employ the boosting algorithm to construct classification models without FS and use the results as the baseline for comparison. The empirical results demonstrate that (1) FS is important and necessary prior to the learning process; (2) the repetitive sampled FS method generally performs similarly to the individual FS technique; and (3) the ensemble FS approaches (sampled ensemble FS and repetitive sampled ensemble FS) perform better than or similarly to the average of their corresponding individual base rankers.

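As a rough illustration of the FS-followed-by-boosting pipeline described above, the sketch below ranks features with a filter-style scorer and then trains a boosted ensemble on the reduced feature set of an imbalanced, synthetic dataset. The ANOVA-based ranker, the number of retained metrics (k = 10), and the AdaBoost learner are illustrative assumptions, not the specific base rankers or boosting variant evaluated in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a defect dataset: many software metrics, few defective modules.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Filter-based feature ranking (keep the top-10 metrics), followed by boosting.
model = Pipeline([
    ("fs", SelectKBest(score_func=f_classif, k=10)),
    ("boost", AdaBoostClassifier(n_estimators=50, random_state=42)),
])
model.fit(X_train, y_train)

# Evaluate with AUC, a common choice for imbalanced defect data.
scores = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))
```

A no-FS baseline, analogous to the one used for comparison in the paper, would simply drop the "fs" step and fit the boosted learner on all metrics.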