GARS: Genetic Algorithm for the identification of a Robust Subset of features in high-dimensional datasets

Mattia Chiesa,Gualtiero I Colombo,Giada Maioli,Luca Piacentini

doi:10.1186/s12859-020-3400-6

Open Access

https://doi.org/10.1186/s12859-020-3400-6

Abstract

Background: Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfactory results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets.

Results: Here, we propose an innovative implementation of a genetic algorithm, called GARS, for fast and accurate identification of informative features in multi-class and high-dimensional datasets. In all simulations, GARS outperformed two standard filter-based, two ‘wrapper’, and one ‘embedded’ selection methods, showing high classification accuracies in a reasonable computational time.

Conclusions: GARS proved to be a suitable tool for performing feature selection on high-dimensional data. Therefore, GARS could be adopted when standard feature selection approaches do not provide satisfactory results or when there is a huge amount of data to be analyzed.

Highlights

Feature selection is a crucial step in machine learning analysis
We found that the features selected by GARS were robust: the error rate on the validation test sets was consistently low for GARS, and it was obtained with fewer selected features than with the other methods
While we do not presume to have covered here the full range of options for performing feature selection on high-dimensional data, we believe that our tests suggest GARS is a powerful and convenient resource for the timely, effective, and robust collection of informative features in high dimensions

Summary

Introduction

Feature selection is a crucial step in machine learning analysis. Currently, many feature selection approaches do not ensure satisfactory results, in terms of accuracy and computational time, when the amount of data is huge, such as in ‘Omics’ datasets. The feature selection (FS) step seeks to pinpoint the most informative variables from data to build robust classification models. This becomes crucial in the Omics data era, as the combination of high-dimensional data with information from various sources (clinical and environmental) enables researchers to study complex diseases such as cancer or cardiovascular disease in depth [1,2,3,4].

[…] optimize a problem by iteratively improving the solution based on a given heuristic function, whereas hybrid methods are a sequential combination of different FS approaches, for example those based on filter and wrapper methods [9].

[…] To find the optimal solution, this scheme is repeated several times until the population has converged, i.e., new offspring are not significantly different from the previous generation.
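The repeat-until-convergence scheme mentioned above can be illustrated with a small genetic algorithm for feature selection. This is a minimal sketch under stated assumptions, not the GARS implementation: the synthetic data, the per-feature separation score used as fitness, the size penalty, the single-point crossover, the elitism step, and the stall-based stopping rule (used here as a simple proxy for "offspring no longer differ from the previous generation") are all illustrative choices, and every function name is hypothetical.

```python
import random
import statistics


def make_data(n_samples=40, n_features=30, n_informative=3, seed=0):
    """Synthetic two-class data: only the first n_informative features differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def feature_scores(X, y):
    """Per-feature separation: |mean(class 1) - mean(class 0)| / pooled std."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        pooled = (statistics.pvariance(a) + statistics.pvariance(b)) / 2
        scores.append(abs(statistics.mean(b) - statistics.mean(a)) / (pooled ** 0.5 + 1e-9))
    return scores


def fitness(mask, scores, penalty=1.0):
    """Assumed stand-in fitness: summed separation of selected features minus a size penalty."""
    return sum(s - penalty for s, bit in zip(scores, mask) if bit)


def run_ga(scores, pop_size=30, max_generations=200, mutation_rate=0.02,
           tol=1e-6, patience=10, seed=1):
    """Evolve bit masks (1 = feature selected) until the best fitness stops improving."""
    rng = random.Random(seed)
    n = len(scores)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history, stale = [], 0
    for _ in range(max_generations):
        ranked = sorted(population, key=lambda m: fitness(m, scores), reverse=True)
        best = fitness(ranked[0], scores)
        # Count generations in which the best solution did not improve.
        stale = stale + 1 if history and best - history[-1] <= tol else 0
        history.append(best)
        if stale >= patience:
            break  # treated as convergence in this sketch
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the current best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return ranked[0], history


X, y = make_data()
scores = feature_scores(X, y)
best_mask, history = run_ga(scores)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("generations run:", len(history))
```

Because the best chromosome is copied unchanged into each new generation, the best fitness never decreases, which makes the stall counter a reasonable (if crude) convergence check for a toy example. A real application would replace the additive score with a fitness that evaluates the selected features jointly.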

Journal: BMC Bioinformatics (2020) 21:54
Publication Date: Feb 11, 2020
Citations: 33
License type: open-access


