Abstract

The effectiveness (prediction accuracy) of a classification model is affected by the quality of the training data. High dimensionality and class imbalance are two main problems that can degrade training-data quality, making data preprocessing a very important step in a classification problem. Feature (software metric) selection and data sampling are frequently used to overcome these problems. Feature selection (FS) is the process of selecting the most important attributes from the original dataset. Data sampling copes with class imbalance by adding instances to, or removing instances from, the training dataset. Another method, boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), has also been found effective for addressing the class imbalance problem. In this study, we investigate two types of FS approaches: individual FS and repetitive sampled FS. Following feature selection, models are built either with a plain learner or with a boosting algorithm in which random undersampling is integrated into AdaBoost. We focus on the impact of the two FS methods (individual FS vs. repetitive sampled FS) and the two model-building processes (boosting vs. plain learner) on software quality prediction. Six feature ranking techniques are examined in the experiment. The results demonstrate that repetitive sampled FS generally outperforms individual FS when a plain learner is used for the subsequent learning process, and that boosting is more effective at improving classification performance than not using boosting.
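The sketch below illustrates the general idea of repetitive sampled feature selection followed by a boosted learner, as described above: a feature ranking technique is applied to several randomly undersampled (class-balanced) copies of the training data, the per-run ranks are aggregated, and a boosting algorithm is trained on the top-ranked metrics. The concrete choices here (chi-square as the ranker, mean-rank aggregation, scikit-learn's AdaBoost, and the toy data) are illustrative assumptions, not the paper's exact setup; in particular, the undersampling-within-boosting variant studied in the paper is not reproduced in this sketch.

```python
# Hypothetical sketch of repetitive sampled feature selection + boosting.
# Assumptions: chi-square ranking, mean-rank aggregation, plain AdaBoost.
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.ensemble import AdaBoostClassifier


def balanced_undersample(X, y, rng):
    """Randomly undersample each class down to the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def repetitive_sampled_ranking(X, y, n_runs=10, seed=0):
    """Average chi-square feature ranks over several undersampled copies."""
    rng = np.random.default_rng(seed)
    rank_sum = np.zeros(X.shape[1])
    for _ in range(n_runs):
        Xs, ys = balanced_undersample(X, y, rng)
        scores, _ = chi2(Xs, ys)              # chi2 needs non-negative features
        rank_sum += np.argsort(np.argsort(-scores))  # rank 0 = best this run
    return np.argsort(rank_sum)               # feature indices, best first


# Toy imbalanced data standing in for software metrics and fault labels.
rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(200, 20)))
y = (rng.random(200) < 0.15).astype(int)

top_k = repetitive_sampled_ranking(X, y)[:5]   # keep the 5 best-ranked metrics
model = AdaBoostClassifier().fit(X[:, top_k], y)
```

Individual FS would correspond to a single call of the ranker on the full training set; the repetitive sampled variant trades extra computation for rankings that are less sensitive to the class imbalance in any one sample.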
