Meta-Statistics for Variable Selection: The R Package BioMark

Ron Wehrens, Pietro Franceschi

doi:10.18637/jss.v051.i10

Abstract

Biomarker identification is an increasingly important topic in the life sciences. With the advent of measurement technologies based on microarrays and mass spectrometry, thousands of variables are routinely measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for it often leads to a lack of power. In addition, the choice of the α cutoff level remains somewhat arbitrary. In a regression context, too, a model that depends on a few relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, provides a data-dependent selection threshold for significance instead of a cookbook value of α = 0.05. It is applicable whenever two groups are compared. The second, stability selection, is more general and can also be applied in a regression context. This approach uses repeated subsampling of the data to assess the variability of the model coefficients, and selects the variables that remain consistently important. Using experimental spike-in data from the field of metabolomics, we show that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
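Higher criticism replaces a fixed α with a cutoff estimated from the p-values themselves. A minimal sketch of the thresholding rule follows, assuming per-variable two-group p-values are already available. For sorted p-values p_(1) ≤ … ≤ p_(N) it computes HC_i = √N (i/N − p_(i)) / √(p_(i)(1 − p_(i))), the standardised gap between the empirical and the uniform distribution of p-values, and uses the p-value at which HC_i peaks as the cutoff. The function name `hc_threshold` and the search fraction `alpha0 = 0.1` are illustrative assumptions; this is not BioMark's own implementation.

```python
import math
from typing import List, Tuple


def hc_threshold(pvalues: List[float], alpha0: float = 0.1) -> Tuple[float, List[int]]:
    """Higher-criticism cutoff: a sketch, not BioMark's implementation.

    alpha0 (an assumed default) limits the search to the smallest
    alpha0 * N p-values. Returns (cutoff, indices of selected variables).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    order = sorted(range(n), key=lambda k: pvalues[k])
    sorted_p = [pvalues[k] for k in order]
    imax = min(n, max(1, int(alpha0 * n)))
    best_i, best_hc = 0, -math.inf
    for i in range(imax):
        p = sorted_p[i]
        if p <= 0.0 or p >= 1.0:
            continue  # HC statistic is undefined at p = 0 or p = 1
        rank = i + 1
        # Gap between empirical CDF (rank / n) and uniform CDF (p), in binomial SD units.
        hc = math.sqrt(n) * (rank / n - p) / math.sqrt(p * (1.0 - p))
        if hc > best_hc:
            best_i, best_hc = i, hc
    cutoff = sorted_p[best_i]
    # Every variable whose p-value does not exceed the cutoff is selected.
    selected = [k for k in range(n) if pvalues[k] <= cutoff]
    return cutoff, selected


# Toy example: 10 strong signals among 990 evenly spread null p-values.
pvals = [1e-5] * 10 + [(j + 0.5) / 990 for j in range(990)]
cutoff, selected = hc_threshold(pvals)
print(cutoff, len(selected))  # prints: 1e-05 10
```

Here the cutoff lands exactly at the signal p-values rather than at a conventional 0.05, which would also admit roughly 50 of the null variables in this toy setting.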

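The stability-selection idea can likewise be sketched: draw many random subsamples, rank the variables in each, and keep those that appear among the top `top_k` in at least a fraction `threshold` of the subsamples. In this sketch the base ranking uses the absolute Welch t statistic, and `top_k`, `threshold`, the stratified half-sampling, and the toy spike-in generator `simulate_two_groups` are all hypothetical choices for illustration; BioMark's own functions, arguments and supported models are not reproduced here.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t statistic, used here only to rank variables."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def stability_selection(X: List[List[float]], y: List[int], top_k: int = 5,
                        n_sub: int = 100, frac: float = 0.5,
                        threshold: float = 0.8, seed: int = 0) -> List[int]:
    """Indices of variables ranked in the top_k in >= threshold of subsamples."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    groups = [[i for i, lab in enumerate(y) if lab == g] for g in (0, 1)]
    counts = [0] * n_vars
    for _ in range(n_sub):
        # Stratified subsample without replacement (>= 2 per group for a variance).
        sub = [rng.sample(g, max(2, int(frac * len(g)))) for g in groups]
        scores = [abs(welch_t([X[i][j] for i in sub[0]], [X[i][j] for i in sub[1]]))
                  for j in range(n_vars)]
        # Base selector (an assumption): count the top_k ranked variables.
        for j in sorted(range(n_vars), key=lambda v: scores[v], reverse=True)[:top_k]:
            counts[j] += 1
    return [j for j in range(n_vars) if counts[j] / n_sub >= threshold]


def simulate_two_groups(n_per_group: int = 20, n_vars: int = 50,
                        spiked: Tuple[int, ...] = (0, 1, 2), shift: float = 5.0,
                        seed: int = 42) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical toy generator: Gaussian noise plus a mean shift in group 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            if label == 1:
                for j in spiked:
                    row[j] += shift  # "spike-in" effect on selected variables
            X.append(row)
            y.append(label)
    return X, y


X, y = simulate_two_groups()
print(stability_selection(X, y, top_k=3))
```

With three strongly spiked variables, these are ranked at the top in essentially every subsample, while any individual null variable only occasionally enters the top three. In practice `top_k` and `threshold` trade sensitivity against false positives and would need tuning for the data at hand.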
Journal: Journal of Statistical Software
Publication Date: Jan 1, 2012
Citations: 24
License type: cc-by

Similar Papers

SIMAC (Sequential Elution from IMAC), a Phosphoproteomics Strategy for the Rapid Separation of Monophosphorylated from Multiply Phosphorylated Peptides
Tine E Thingholm ... Martin R Larsen
Molecular & Cellular Proteomics | VOL. 7 | 01 Apr 2008

Normalization and Statistical Analysis of Quantitative Proteomics Data Generated by Metabolic Labeling
Lily Ting ... Ricardo Cavicchioli
Molecular & Cellular Proteomics | VOL. 8 | 01 Oct 2009

Profiling the ‘deamidome’ of complex biosamples using mixed-mode chromatography-coupled tandem mass spectrometry
Siu Kwan Sze ... Sofong Cam Ngan
Methods | VOL. 200 | 08 May 2020

Capillary Electrophoresis in Metabolomics.
Tanja Verena Maier ... Philippe Schmitt-Kopplin
Methods in molecular biology (Clifton, N.J.) | VOL. 1483 | 01 Jan 2015
