Evaluation of multiple variate selection methods from a biological perspective: a nutrigenomics case study

Henri S Tapp, Uwe Thissen, E Kate Kemsley, Marijana Radonjic

doi:10.1007/s12263-012-0288-4

Open Access

Abstract

Genomics-based technologies produce large amounts of data. To interpret the results and identify the most important variates related to phenotypes of interest, various multivariate regression and variate selection methods are used. Although inspected for statistical performance, the relevance of multivariate models in interpreting biological data sets often remains elusive. We compare various multivariate regression and variate selection methods applied to a nutrigenomics data set in terms of performance, utility and biological interpretability. The studied data set comprised hepatic transcriptome (10,072 predictor variates) and plasma protein concentrations [2 dependent variates: Leptin (LEP) and Tissue inhibitor of metalloproteinase 1 (TIMP-1)] collected during a high-fat diet study in ApoE3Leiden mice. The multivariate regression methods used were: partial least squares “PLS”; a genetic algorithm-based multiple linear regression, “GA-MLR”; two least-angle shrinkage methods, “LASSO” and “ELASTIC NET”; and a variant of PLS that uses covariance-based variate selection, “CovProc.” Two methods of ranking the genes for Gene Set Enrichment Analysis (GSEA) were also investigated: either by their correlation with the protein data or by the stability of the PLS regression coefficients. The regression methods performed similarly, with CovProc and GA performing the best and worst, respectively (R-squared values based on “double cross-validation” predictions of 0.762 and 0.451 for LEP; and 0.701 and 0.482 for TIMP-1). CovProc, LASSO and ELASTIC NET all produced parsimonious regression models and consistently identified small subsets of variates, with high commonality between the methods. Comparison of the gene ranking approaches found a high degree of agreement, with PLS-based ranking finding fewer significant gene sets. 
We recommend the use of CovProc for variate selection, in tandem with univariate methods, and the use of correlation-based ranking for GSEA-like pathway analysis methods.

Electronic supplementary material: The online version of this article (doi:10.1007/s12263-012-0288-4) contains supplementary material, which is available to authorized users.
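The "double cross-validation" R-squared values quoted in the abstract come from predictions made on samples that played no part in either model fitting or parameter tuning. The sketch below illustrates that general idea only; it is not the authors' protocol. The fold counts, the penalty grid, the synthetic data and the small pure-Python coordinate-descent LASSO are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

def soft_threshold(z, t):
    """Shrink z towards zero by t (the LASSO proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual when all coefficients are 0
    col_ss = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = sum(r[j] * e for r, e in zip(Xc, resid)) + col_ss[j] * beta[j]
            new = soft_threshold(rho / n, lam) * n / col_ss[j]
            delta = new - beta[j]
            if delta:
                resid = [e - delta * r[j] for r, e in zip(Xc, resid)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, row))

def folds(indices, k):
    """Split a list of sample indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]

def cv_sse(X, y, samples, lam, k):
    """Sum of squared cross-validation errors, using `samples` only."""
    sse = 0.0
    for held_out in folds(samples, k):
        held = set(held_out)
        train = [i for i in samples if i not in held]
        model = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        sse += sum((y[i] - predict(model, X[i])) ** 2 for i in held_out)
    return sse

def double_cv_r2(X, y, lambdas, outer_k=5, inner_k=4, seed=0):
    """Outer loop estimates performance; inner loop tunes the penalty."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for test in folds(idx, outer_k):
        held = set(test)
        train = [i for i in idx if i not in held]
        # The penalty is chosen without ever looking at the outer test fold.
        best = min(lambdas, key=lambda lam: cv_sse(X, y, train, lam, inner_k))
        model = fit_lasso([X[i] for i in train], [y[i] for i in train], best)
        for i in test:
            preds[i] = predict(model, X[i])
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, preds))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, preds

# Synthetic stand-in data (assumption): 40 samples, 15 variates, 2 informative.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(15)] for _ in range(40)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.3) for row in X]
r2, preds = double_cv_r2(X, y, lambdas=[0.01, 0.1, 0.5])
print(f"double cross-validated R^2: {r2:.3f}")
```

Because the penalty is tuned inside each outer training set, the resulting R-squared is less optimistic than one computed from a single cross-validation loop that both tunes and scores the model.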

Highlights

In many life science studies, large data sets are generated from metabolomics, proteomics and transcriptomics experiments
An ideal variate selection method has principles and parameters that are well-suited to the particular study goal and/or to the data characteristics; however, it is not always straightforward to make these choices in advance
This study has compared five methods currently used for variate selection or ranking: Partial least squares (PLS), Genetic algorithm (GA), Least absolute shrinkage and selection operator (LASSO)/ELASTIC NET and CovProc

Summary

Introduction

In many life science studies, large data sets are generated from metabolomics, proteomics and transcriptomics experiments. […] biomarkers or crucial pathways associated with the original study goal. Statistical models are generated that describe the relationship between the genomics data and some feature of interest (e.g., a phenotype). Many variate selection methods are described in the literature. These can differ in their implementation details or in their fundamental statistical principles (Guyon and Elisseeff 2003; Guyon et al. 2006). An ideal variate selection method has principles and parameters that are well-suited to the particular study goal and/or to the data characteristics; however, it is not always straightforward to make these choices in advance. Even though the statistical principles of a method may be understood, its utility from a biological perspective is often less obvious.
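One of the simplest selection principles is a univariate filter that ranks each variate by its correlation with the feature of interest; the abstract mentions a ranking of this kind as input to GSEA. The following is a minimal sketch of that idea. The gene names and expression values are hypothetical toy data, and the function names are illustrative rather than taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(X, y, names):
    """Return (name, r) pairs sorted from most positive to most negative r."""
    scores = [(name, pearson([row[j] for row in X], y))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): rows are samples, columns are genes.
names = ["gene_a", "gene_b", "gene_c"]
X = [
    [2.0, 4.0, 1.0],
    [4.0, 3.0, 3.0],
    [6.0, 2.0, 2.0],
    [8.0, 1.0, 2.0],
]
y = [1.0, 2.0, 3.0, 4.0]  # e.g. a plasma protein concentration per sample

ranked = rank_by_correlation(X, y, names)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

A signed ranking like this keeps positively and negatively associated genes at opposite ends of the list, which is what rank-based enrichment methods expect; for plain variate selection one would typically sort by the absolute value of `r` instead.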

Journal: Genes & Nutrition
Publication Date: Mar 2, 2012
Reference: Genes Nutr (2012) 7:387–397
Citations: 5
License type: cc-by
