Abstract

One of the major challenges when working with software metrics datasets is that some metrics may be redundant or irrelevant to software defect prediction. This may be addressed using feature (metric) selection, which chooses an appropriate subset of features for use in downstream computation. There are three major forms of feature selection: filter-based feature ranking, which uses statistical measures to assign a score to each feature and presents the user with a ranked list; filter-based subset evaluation, which uses statistical measures on feature subsets to find the best choice; and wrapper-based subset selection, which builds classification models using different subsets to find the one that maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of these methods. We evaluate stability on pairs of subsamples generated by a fixed-overlap partitioning algorithm, considering four different levels of overlap. Four software metric datasets from a real-world software project are used. Results demonstrate that ReliefF (RF) is the most stable feature selection method, while wrapper-based feature subset selection shows the least stability. In addition, as the overlap of the partitions increased, the stability of the feature selection strategies increased.
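The Tanimoto index between two feature subsets is their set similarity (size of the intersection divided by size of the union), and the APTI averages this over all pairs of subsets. A minimal sketch, assuming this standard set-based formulation (the paper's exact definition may differ, and the metric names below are illustrative only):

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def apti(subsets):
    """Average pairwise Tanimoto index over a list of feature subsets;
    values near 1 indicate a stable feature selection method."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical example: subsets selected on three overlapping subsamples
subsets = [{"loc", "cbo", "wmc"}, {"loc", "cbo", "rfc"}, {"loc", "wmc", "rfc"}]
print(apti(subsets))  # prints 0.5
```

A higher APTI means the method keeps choosing largely the same features as the underlying data changes, which is the stability property the study measures.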
