Abstract

Feature selection is one of the most critical methods for choosing appropriate features from a larger candidate set. The task involves two basic steps: ranking and filtering. The former ranks all features, while the latter filters out irrelevant features based on a threshold value. Several feature selection methods with well-documented capabilities and limitations have already been proposed. Feature ranking is likewise nontrivial, as it requires designating an optimal cutoff value to properly select important features from a list of candidates. However, a comprehensive feature-ranking and filtering approach that alleviates these limitations and provides an efficient mechanism for achieving optimal results is still lacking. With these facts in view, we present an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from an input dataset. For the uEFS methodology, we first propose a unified features scoring (UFS) algorithm to generate a final ranked list of features following a comprehensive evaluation of the feature set. To define cutoff points for removing irrelevant features, we then present a threshold value selection (TVS) algorithm to select a subset of features deemed important for classifier construction. The uEFS methodology is evaluated on standard benchmark datasets. Extensive experimental results show that it provides competitive accuracy, achieving on average (1) around a 7% increase in F-measure and (2) around a 5% increase in predictive accuracy compared with state-of-the-art methods.

Highlights

  • In the domain of data mining and machine learning, one of the most critical problems is the task of feature selection (FS), which concerns choosing appropriate features from a larger set of candidates [1]

  • We introduce an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from a given dataset

  • For the uEFS methodology, we first propose an innovative unified features scoring (UFS) algorithm to generate a final ranked list of features without using any learning algorithm, incurring high computational cost, or inheriting the individual statistical biases of state-of-the-art feature-ranking methods


Summary

Introduction

In the domain of data mining and machine learning, one of the most critical problems is the task of feature selection (FS), which concerns choosing appropriate features from a larger set of candidates [1]. Designing an empirical method that specifies a minimum threshold value for retaining important features, and thereby overcomes the aforementioned limitations, is our second target. Keeping these two facts in view, we propose an efficient and comprehensive FS methodology, called univariate ensemble-based FS (uEFS), which comprises two innovative algorithms, unified features scoring (UFS) and threshold value selection (TVS), and which allows us to select informative features from a given dataset. This study was undertaken with the following objectives: (1) to design a comprehensive and flexible feature-ranking algorithm that computes ranks without (a) using any learning algorithm, (b) incurring high computational cost, or (c) inheriting the individual statistical biases of state-of-the-art feature-ranking methods; and (2) to identify an appropriate cutoff value for the threshold so as to select a subset of features, irrespective of the characteristics of the dataset, with reasonable predictive accuracy. A proof-of-concept for these techniques was demonstrated through extensive experimentation, which achieved on average (1) a 7% increase in F-measure compared with the baseline approach and (2) a 5% increase in predictive accuracy compared with state-of-the-art methods.
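The two-step pipeline described above (rank all features, then filter by a cutoff) can be sketched minimally as follows. The scoring values and the cutoff here are hypothetical placeholders for illustration; they are not the paper's UFS or TVS algorithms.

```python
# Generic rank-then-filter feature selection sketch.
# Scores and cutoff are illustrative, not the uEFS methodology itself.

def rank_features(scores):
    """Return feature indices sorted by score, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def filter_by_threshold(scores, ranked, cutoff):
    """Keep only the ranked features whose score meets the cutoff."""
    return [i for i in ranked if scores[i] >= cutoff]

scores = [0.91, 0.12, 0.55, 0.78, 0.05]   # hypothetical univariate scores
ranked = rank_features(scores)             # indices ordered best to worst
selected = filter_by_threshold(scores, ranked, cutoff=0.5)
print(selected)                            # features scoring >= 0.5 survive
```

The point of separating the two functions is that any univariate scorer can feed the same filtering step, which mirrors the ranking/filtering split the paper describes.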

Related works
Materials and methods
F-measure = (2 × Recall × Precision) / (Recall + Precision)  (15)
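Equation (15) is the standard harmonic mean of recall and precision; a direct computation looks like this (the zero guard is an assumption for the degenerate case where both terms are zero):

```python
def f_measure(recall, precision):
    """F-measure per Eq. (15): harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * recall * precision / (recall + precision)

print(f_measure(0.8, 0.6))  # harmonic mean sits below the arithmetic mean 0.7
```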
Experimental setup
Eliminate short terms whose length is less than or equal to 2
Proposed Methodology uEFS
Compute the sum of all the positional scores from all the lists
Compute the feature occurrence rate among the filter measures
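The two outlined steps (summing positional scores across ranked lists, and counting how often a feature appears among the filter measures) suggest a Borda-style rank aggregation. The sketch below is one plausible reading of those steps with made-up feature names; the paper's exact UFS scoring may differ.

```python
# Borda-style aggregation: each ranked list assigns a positional score
# (best position earns the most points), and scores are summed per feature.
# Illustrative sketch only, not the paper's exact UFS algorithm.

def aggregate_rankings(ranked_lists):
    """ranked_lists: lists of feature names, best first.
    Returns features ordered by total positional score, best first."""
    totals = {}
    for lst in ranked_lists:
        n = len(lst)
        for pos, feat in enumerate(lst):
            totals[feat] = totals.get(feat, 0) + (n - pos)  # top spot gets n
    return sorted(totals, key=totals.get, reverse=True)

# Three hypothetical filter measures ranking the same four features:
lists = [["f1", "f3", "f2", "f4"],
         ["f3", "f1", "f2", "f4"],
         ["f1", "f2", "f3", "f4"]]
print(aggregate_rankings(lists))  # consensus ranking across the measures
```

Summing positional scores rewards features that rank consistently high across all filter measures, which is the intuition behind using an ensemble of rankers rather than any single one.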
Findings
Conclusions and future directions

