A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

Christin Christin, Huub C.J Hoefsloot, Age K Smilde, B Hoekman, Frank Suits, Rainer Bischoff, Peter Horvatovich

doi:10.1074/mcp.m112.022566

Abstract

In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The methods were evaluated using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
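
The two scores named in the abstract can be illustrated with a minimal sketch. It assumes that a selection result is a set of feature identifiers and that the spiked-peptide features are known; the function name `selection_scores` and the toy feature indices are hypothetical and are not taken from the paper.

```python
from math import sqrt


def selection_scores(selected, spiked, all_features):
    """Return (f_score, g_score) for one feature selection result.

    selected     -- features chosen by the method (hypothetical input)
    spiked       -- features truly related to the spiked peptides
    all_features -- every feature in the data set
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)
    tp = len(selected & spiked)                 # spiked features that were selected
    fp = len(selected - spiked)                 # unrelated features that were selected
    fn = len(spiked - selected)                 # spiked features that were missed
    tn = len(all_features - selected - spiked)  # unrelated features left out

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    true_negative_rate = tn / (tn + fp) if tn + fp else 0.0

    # f-score: harmonic mean of recall and precision
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # g-score: geometric mean of recall and true negative rate
    g_score = sqrt(recall * true_negative_rate)
    return f_score, g_score


# Toy example: 100 features, 10 of them spiked, 8 found plus 2 false positives.
f, g = selection_scores(
    selected={0, 1, 2, 3, 4, 5, 6, 7, 50, 51},
    spiked=set(range(10)),
    all_features=set(range(100)),
)
print(f"f-score = {f:.3f}, g-score = {g:.3f}")
```

Because unrelated features usually far outnumber spiked ones, the true negative rate tends to stay close to 1, so the g-score penalizes a few false positives much less than the f-score does.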

Highlights

We compare the performance of six different feature selection methods for liquid chromatography-mass spectrometry (LC-MS)-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test, nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The methods were evaluated using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels.
The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features.
Comparison between methods: all methods benefit from a larger sample size, but only some of them are affected by the between- and within-class variability of the spiked peptides.
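
The univariate tests in this comparison are applied with multiple testing corrections. The text here does not say which correction was used, so the sketch below assumes the Benjamini-Hochberg step-up procedure purely for illustration; the function name, the `alpha` level, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of features kept at false discovery rate alpha.

    Assumption: Benjamini-Hochberg is used as an example correction only;
    it is not confirmed to be the correction applied in the paper.
    """
    m = len(p_values)
    # Feature indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value
        # falls under its rank-dependent threshold.
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical per-feature p-values from a t test or mww test.
p = [0.001, 0.8, 0.012, 0.04, 0.3, 0.02]
print(benjamini_hochberg(p))
```

With only a handful of samples per class, rank-based tests cannot produce very small p-values, so few or no features may survive such a correction. This is in line with the abstract's remark that the univariate tests are not applicable at n = 6.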

Summary

Introduction

The success of biomarker discovery depends on several factors: consistent and reproducible phenotyping of the individuals from whom biological samples are obtained; the quality of the analytical methodology, which in turn determines the quality of the collected data; the accuracy of the computational methods used to extract quantitative and molecular identity information to define the biomarker candidates from raw analytical data; and the performance of the applied statistical methods in the selection of a limited list of compounds with the potential to discriminate between predefined classes of samples. The goal of subsequent data preprocessing and statistical analysis is to select a limited number of candidates, which are then subjected to targeted analyses in a large number of samples for validation.

Journal: Molecular & Cellular Proteomics
Publication Date: Jan 1, 2013
Citations: 132
License type: cc-by
