Abstract

Feature selection is one of the most critical methods for choosing appropriate features from a larger candidate set. The task involves two basic steps: ranking and filtering. The former ranks all features, while the latter filters out irrelevant features based on a threshold value. Several feature selection methods with well-documented capabilities and limitations have already been proposed. Feature ranking is likewise nontrivial, as it requires designating an optimal cutoff value to properly select important features from a list of candidates. However, a comprehensive feature-ranking and filtering approach that alleviates these limitations and provides an efficient mechanism for achieving optimal results is still lacking. With these facts in view, we present an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from an input dataset. For the uEFS methodology, we first propose a unified features scoring (UFS) algorithm to generate a final ranked list of features following a comprehensive evaluation of the feature set. To define cutoff points for removing irrelevant features, we then present a threshold value selection (TVS) algorithm to select a subset of features deemed important for classifier construction. The uEFS methodology is evaluated on standard benchmark datasets. Extensive experimental results show that it provides competitive accuracy, achieving on average (1) around a 7% increase in F-measure and (2) around a 5% increase in predictive accuracy compared with state-of-the-art methods.

Highlights

  • In the domain of data mining and machine learning, one of the most critical problems is the task of feature selection (FS), which concerns choosing appropriate features from a larger set of candidates [1]

  • We introduce an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from a given dataset

  • For the uEFS methodology, we first propose an innovative unified features scoring (UFS) algorithm to generate a final ranked list of features without using any learning algorithm, incurring high computational cost, or inheriting the individual statistical biases of state-of-the-art feature-ranking methods


Summary

Introduction

In the domain of data mining and machine learning, one of the most critical problems is the task of feature selection (FS), which concerns choosing appropriate features from a larger set of candidates [1]. Designing an empirical method that specifies a minimum threshold value for retaining important features, and thereby overcomes the aforementioned limitations, is our second target. Keeping these two facts in view, we propose an efficient and comprehensive FS methodology, called univariate ensemble-based FS (uEFS), which comprises two innovative algorithms, unified features scoring (UFS) and threshold value selection (TVS), and which allows us to select informative features from a given dataset. This study was undertaken with the following objectives: (1) to design a comprehensive and flexible feature-ranking algorithm that computes ranks without (a) using any learning algorithm, (b) incurring high computational cost, or (c) inheriting the individual statistical biases of state-of-the-art feature-ranking methods; and (2) to identify an appropriate cutoff value for the threshold so as to select a subset of features, irrespective of the characteristics of the dataset, with reasonable predictive accuracy. A proof-of-concept for these techniques was demonstrated through extensive experimentation, which achieved on average (1) a 7% increase in F-measure compared with the baseline approach and (2) a 5% increase in predictive accuracy compared with state-of-the-art methods.
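The two-step pipeline described above (rank all features, then filter by a cutoff) can be sketched minimally as follows. The scoring values and the cutoff here are hypothetical placeholders for illustration; they are not the paper's UFS or TVS algorithms.

```python
# Generic rank-then-filter feature selection sketch.
# Scores and cutoff are illustrative, not the uEFS methodology itself.

def rank_features(scores):
    """Return feature indices sorted by score, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def filter_by_threshold(scores, ranked, cutoff):
    """Keep only the ranked features whose score meets the cutoff."""
    return [i for i in ranked if scores[i] >= cutoff]

scores = [0.91, 0.12, 0.55, 0.78, 0.05]   # hypothetical univariate scores
ranked = rank_features(scores)             # indices ordered best to worst
selected = filter_by_threshold(scores, ranked, cutoff=0.5)
print(selected)                            # features scoring >= 0.5 survive
```

The point of separating the two functions is that any univariate scorer can feed the same filtering step, which mirrors the ranking/filtering split the paper describes.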

Related works
Materials and methods
F-measure = (2 × Recall × Precision) / (Recall + Precision)  (15)
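Equation (15) is the standard harmonic mean of recall and precision; a direct computation looks like this (the zero guard is an assumption for the degenerate case where both terms are zero):

```python
def f_measure(recall, precision):
    """F-measure per Eq. (15): harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * recall * precision / (recall + precision)

print(f_measure(0.8, 0.6))  # harmonic mean sits below the arithmetic mean 0.7
```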
Experimental setup
Eliminate short terms whose length is less than or equal to 2
Proposed Methodology uEFS
Compute the sum of all the positional scores from all the lists
Compute the feature occurrence rate among the filter measures
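The two outlined steps (summing positional scores across ranked lists, and counting how often a feature appears among the filter measures) suggest a Borda-style rank aggregation. The sketch below is one plausible reading of those steps with made-up feature names; the paper's exact UFS scoring may differ.

```python
# Borda-style aggregation: each ranked list assigns a positional score
# (best position earns the most points), and scores are summed per feature.
# Illustrative sketch only, not the paper's exact UFS algorithm.

def aggregate_rankings(ranked_lists):
    """ranked_lists: lists of feature names, best first.
    Returns features ordered by total positional score, best first."""
    totals = {}
    for lst in ranked_lists:
        n = len(lst)
        for pos, feat in enumerate(lst):
            totals[feat] = totals.get(feat, 0) + (n - pos)  # top spot gets n
    return sorted(totals, key=totals.get, reverse=True)

# Three hypothetical filter measures ranking the same four features:
lists = [["f1", "f3", "f2", "f4"],
         ["f3", "f1", "f2", "f4"],
         ["f1", "f2", "f3", "f4"]]
print(aggregate_rankings(lists))  # consensus ranking across the measures
```

Summing positional scores rewards features that rank consistently high across all filter measures, which is the intuition behind using an ensemble of rankers rather than any single one.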
Findings
Conclusions and future directions

