A robust and accurate method for feature selection and prioritization from multi-class OMICs data.

Vittorio Fortino, Harri Alenius, Nanna Fyhrquist, Pia Kinaret, Dario Greco

doi:10.1371/journal.pone.0107801


Open Access

https://doi.org/10.1371/journal.pone.0107801


Journal: PLoS ONE	Publication Date: Sep 23, 2014
Citations: 57	License type: CC BY 4.0

Affiliation: Finnish Institute of Occupational Health

Abstract

Selecting relevant features is a common task in most OMICs data analyses, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid approaches are mainly available: the univariate (filter) and the multivariate (wrapper) approach. The stability of the selected feature lists is an often neglected but very important requirement: if the same features are selected in multiple independent iterations, they are more likely to be reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability compared with other Random Forest-based feature selection methods.
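The abstract only names fuzzy logic as the first, unbiased selection step, without giving its details. As a rough illustration of the general idea (not the FPRF implementation), the sketch below assumes each feature is discretized into low/medium/high labels with triangular membership functions spanning the feature's observed range. A class receives a "fuzzy pattern" label when at least a fraction `ppf` of its samples share it, and a feature is kept when at least two classes have differing patterns. The membership shape, the `threshold` and `ppf` values, and all function names are hypothetical choices made for this example.

```python
from collections import Counter

LABELS = ("low", "medium", "high")


def memberships(x, lo, hi):
    """Triangular low/medium/high memberships of x on the range [lo, hi] (assumed shape)."""
    if hi == lo:  # constant feature: everything is "medium"
        return {"low": 0.0, "medium": 1.0, "high": 0.0}
    mid = (lo + hi) / 2.0
    half = mid - lo
    low = max(0.0, (mid - x) / half)             # 1 at lo, 0 at mid and above
    high = max(0.0, (x - mid) / half)            # 0 at mid and below, 1 at hi
    medium = max(0.0, 1.0 - abs(x - mid) / half)  # 1 at mid, 0 at lo and hi
    return {"low": min(low, 1.0), "medium": medium, "high": min(high, 1.0)}


def fuzzy_label(x, lo, hi, threshold):
    """Dominant fuzzy label, or None if its membership is below the threshold."""
    m = memberships(x, lo, hi)
    best = max(LABELS, key=lambda k: m[k])
    return best if m[best] >= threshold else None


def fuzzy_pattern_filter(X, y, threshold=0.8, ppf=0.9):
    """Return indices of features whose class-wise fuzzy patterns differ (hypothetical filter)."""
    classes = sorted(set(y))
    selected = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        patterns = {}
        for c in classes:
            labels = [fuzzy_label(row[j], lo, hi, threshold)
                      for row, yc in zip(X, y) if yc == c]
            counts = Counter(lab for lab in labels if lab is not None)
            if counts:
                label, count = counts.most_common(1)[0]
                if count / len(labels) >= ppf:  # label shared by enough class samples
                    patterns[c] = label
        if len(set(patterns.values())) >= 2:  # at least two classes disagree
            selected.append(j)
    return selected


if __name__ == "__main__":
    # Toy data: feature 0 separates classes A/B/C, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 5.1], [5.0, 5.0], [5.1, 4.9], [10.0, 5.05], [9.9, 5.0]]
    y = ["A", "A", "B", "B", "C", "C"]
    print(fuzzy_pattern_filter(X, y))  # → [0]
```

In a full pipeline a filter like this would only shrink the candidate set; the abstract assigns the ranking of the surviving features to a Random Forest of conditional inference trees.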

Highlights

Identifying discriminant features, for instance from transcriptomics experiments, and modelling classifiers based on them are fundamental tasks when the aim is to highlight biomarkers.
Multivariate techniques assess the relevance of groups of features simultaneously, by using selection methods coupled with machine learning techniques such as logistic regression, support vector machines (SVM) or random forests (RF) [7,8,9].
We compared the classification performance of the fuzzy pattern – random forest (FPRF) classifiers (FPRF.2–250) with that obtained using the features selected by varSelRF and Boruta.

Summary

Introduction

Identifying discriminant features, for instance from transcriptomics experiments, and modelling classifiers based on them are fundamental tasks when the aim is to highlight biomarkers (e.g. genes or transcripts discriminating healthy from diseased samples). Clinical classification based on high-throughput molecular profiling has already been explored for a number of complex diseases, such as cancer [1,2]. These studies become crucial for public health when such approaches are considered for clinical practice [3]. Multivariate methods tend to identify different subsets of candidate biomarkers with equal accuracy, even when feature selection algorithms are applied to the same data [5,6]. This is true for feature selection problems in OMICs data analysis, where the number of investigated features is much larger than the number of samples. Such data sets can suffer from multiple stability issues and can contain a large number of redundant features [10].
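The stability problem described above can be quantified in several ways, and this summary does not say which measure the authors used. One common and simple choice, assumed here, is the mean pairwise Jaccard index between the feature lists selected in independent runs (1.0 means every run picked the same features), together with the fraction of runs in which each feature was selected. The function names and the toy gene lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature lists (1.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(feature_lists):
    """Average Jaccard index over all pairs of runs: a simple stability score."""
    pairs = list(combinations(feature_lists, 2))
    if not pairs:
        raise ValueError("need at least two feature lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selection_frequency(feature_lists):
    """Fraction of runs selecting each feature, most frequently selected first."""
    counts = Counter(f for fl in feature_lists for f in set(fl))
    n = len(feature_lists)
    return {f: c / n for f, c in counts.most_common()}


if __name__ == "__main__":
    # Three hypothetical runs of a selector on resampled data.
    runs = [["g1", "g2", "g3"], ["g1", "g2", "g4"], ["g1", "g2", "g3"]]
    print(round(mean_pairwise_jaccard(runs), 3))  # → 0.667
    print(selection_frequency(runs)["g4"])        # g4 appears in 1 of 3 runs
```

Features with a selection frequency close to 1.0 are the ones the abstract would call more likely to be reliable biomarkers; the Jaccard score summarizes how consistent a selector is as a whole.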


Similar Papers

An improved electromagnetism-like mechanism algorithm and its application to the prediction of diabetes mellitus
Kung-Jeng Wang ... Kung-Min Wang
Journal of Biomedical Informatics | VOL. 54
10 Feb 2015

Strategy for genotoxic impurities: Hazard identification, risk assessment and management in Angelini
C Landolfi ... L Durando
Toxicology Letters | VOL. 196
07 May 2010

An improved artificial immune recognition system with the opposite sign test for feature selection
Kung-Jeng Wang ... Melani-Adrian Angelia
Knowledge-Based Systems | VOL. 71
12 Aug 2014

A Novel Method of Feature Selection based on SVM
Quanjin Liu ... Zhimin Zhao
Journal of Computers | VOL. 8
08 Jan 2013
