Abstract

The stability of a feature selection algorithm refers to its robustness to perturbations of the training set, parameter settings, or initialization. A stable feature selection algorithm is crucial for identifying a relevant subset of meaningful and interpretable features, which is extremely important in knowledge discovery tasks. Although many stability measures have been reported in the literature for evaluating the stability of feature selection, none of them satisfies all the requisite properties of a stability measure. Among them, the Kuncheva index and its modifications are widely used in practical problems. In this work, the merits and limitations of the Kuncheva index and its existing modifications (Lustgarten, Wald, nPOG/nPOGR, Nogueira) are studied and analysed with respect to the requisite properties of a stability measure. A further limitation of the most recent modified similarity measure, Nogueira's measure, is pointed out. Finally, corrections to Lustgarten's measure are proposed to define a new modified stability measure that satisfies the desired properties and overcomes the limitations of the existing popular similarity-based stability measures. The effectiveness of the corrected Lustgarten's measure is evaluated with simple toy experiments.

Highlights

  • Feature selection is one of the most fundamental issues in developing efficient models for classification, prediction, or regression in pattern analysis, machine learning, and data mining

  • Among the feature subsets selected over multiple runs of the algorithm, 20 different pairs of subsets are considered for stability measurement, where each pair contains one feature subset that is a proper subset of the other (see the sketch after this list)

  • Stability of any feature selection algorithm refers to its robustness with respect to training set perturbations
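For concreteness, the following is a minimal sketch of how nested pairs of the kind described in the second highlight could be constructed. The pair count (20) is taken from the highlight; the universe size, subset sizes, and function name are hypothetical placeholders, not values from the paper.

```python
import random

def make_nested_pair(n_features, k_small, k_large, rng):
    """Return (A, B) with A a proper subset of B, drawn from {0, ..., n_features - 1}."""
    big = set(rng.sample(range(n_features), k_large))
    small = set(rng.sample(sorted(big), k_small))
    return small, big

# Hypothetical setup: 20 nested pairs over a 50-feature universe.
rng = random.Random(0)
pairs = [make_nested_pair(50, 5, 10, rng) for _ in range(20)]
assert all(a < b for a, b in pairs)  # '<' on sets tests for a proper subset
```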


Summary

Introduction

Feature selection is one of the most fundamental issues in developing efficient models for classification, prediction, or regression in pattern analysis, machine learning, and data mining. Because of a high level of redundancy in the initial feature set, different training sample sets may produce different feature subsets that nevertheless lead to the same classification concept. This work studies stability measures for feature-subset-based feature selection algorithms. Among the various stability measures for feature subset selection, similarity-based measures, especially Kuncheva's consistency index [11], are quite popular and widely used. To overcome the main limitation of the Kuncheva index, namely its inability to cope with feature subsets of different cardinalities, a few modified similarity measures related to it are available in the literature. Corrections to Lustgarten's measure are proposed here to define a new modified stability measure that satisfies the desired properties and overcomes the limitations of the existing popular similarity-based stability measures.
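To make the cardinality limitation concrete, below is a minimal sketch of the two measures, assuming their standard formulations from the literature: Kuncheva's consistency index I_C(A, B) = (rn - k^2) / (k(n - k)) for two subsets of equal size k sharing r features out of n total, and Lustgarten's generalization, which replaces the equal-size assumption by normalizing the chance-corrected overlap with its attainable bounds. Function names are illustrative, and degenerate cases (e.g., k = 0 or k = n) are ignored.

```python
def kuncheva_index(a, b, n):
    """Kuncheva's consistency index; defined only for |a| == |b| == k, 0 < k < n."""
    k = len(a)
    if len(b) != k:
        raise ValueError("Kuncheva's index requires subsets of equal cardinality")
    r = len(a & b)
    return (r * n - k * k) / (k * (n - k))

def lustgarten_measure(a, b, n):
    """Lustgarten et al.'s adjustment: handles |a| != |b| by normalizing the
    chance-corrected overlap with its attainable minimum and maximum."""
    k1, k2 = len(a), len(b)
    r = len(a & b)
    expected = k1 * k2 / n            # overlap expected by chance
    max_r = min(k1, k2)               # largest possible overlap
    min_r = max(0, k1 + k2 - n)       # smallest possible overlap
    return (r - expected) / (max_r - min_r)

a, b = {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
print(lustgarten_measure(a, b, n=50))  # 0.8: defined for nested subsets...
# print(kuncheva_index(a, b, n=50))    # ...while Kuncheva's index raises an error
```

Note how the nested pair (a proper subset of b) is scored by Lustgarten's measure but is outside the domain of Kuncheva's index; it is this generalization that the paper corrects so that the resulting measure satisfies all the desired properties.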

Stability Measures for Feature Selection Algorithms
Analysis of Kuncheva Index and Its Extensions
Kuncheva Index
Lustgarten’s Measure
Wald’s Measure
Average nPOG and Average nPOGR
Nogueira and Brown’s Measure
Desired Properties of Stability Measure
Experiments for Illustration of the Drawbacks
Proposed Correction of Lustgarten’s Measure
Proposed Correction Value for Different Conditions
Proposed Corrected Lustgarten’s Measure
Experiments for Illustration
Conclusions

