Abstract

Background: Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.

Results: For continuous or ordinal phenotypic outcome, we propose to use as the local statistic the coefficient of multiple determination (i.e., the square of multiple correlation coefficient) R² from fitting natural cubic spline models to the phenotype-expression relationship. Next, we incorporate this association measure into the GSEA/SAFE framework to identify significant gene sets. Unsigned local statistics, signed global statistics and one-sided p-values are used to reflect our inferential interest. Furthermore, we describe a procedure for inference across multiple GSEA/SAFE analyses. We illustrate our approach using gene expression and liver injury data from liver and blood samples from rats treated with eight hepatotoxicants under multiple time and dose combinations. We set out to identify biological pathways/processes associated with liver injury as manifested by increased blood levels of alanine transaminase in common for most of the eight compounds. Potential statistical dependency resulting from the experimental design is addressed in permutation based hypothesis testing.

Conclusion: The proposed framework captures both linear and non-linear association between gene expression level and a phenotypic endpoint and thus can be viewed as extending the current GSEA/SAFE methodology. The framework for combining results from multiple GSEA/SAFE analyses is flexible to address practical inference interests. Our methods can be applied to microarray data with continuous phenotypes with multi-level design or the meta-analysis of multiple microarray data sets.
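
To make the proposed local statistic concrete, the following is a minimal sketch, not the authors' implementation, of computing R² from a natural cubic spline fit and attaching a one-sided permutation p-value. Several choices here are assumptions made for illustration: the function names (`natural_spline_basis`, `spline_r2`, `permutation_pvalue`), the truncated-power construction of the basis, knots placed at equally spaced quantiles, the direction of the regression, and a simple label permutation that ignores the design-induced dependency mentioned in the abstract.

```python
import numpy as np


def natural_spline_basis(x, knots):
    """Natural cubic spline basis (truncated-power form) with K >= 3 distinct knots.

    Returns an (n, K) design matrix: intercept, linear term, and K - 2
    cubic terms constrained to be linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        # Scaled difference of truncated cubics at knot k and the last knot.
        num = np.clip(x - knots[k], 0, None) ** 3 - np.clip(x - knots[K - 1], 0, None) ** 3
        return num / (knots[K - 1] - knots[k])

    cols = [np.ones_like(x), x]
    for k in range(K - 2):
        cols.append(d(k) - d(K - 2))
    return np.column_stack(cols)


def spline_r2(x, y, n_knots=4):
    """R^2 from least-squares regression of y on a natural cubic spline in x.

    Assumption: knots sit at equally spaced quantiles of x, and y is the
    response; the paper may use a different knot rule or direction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots))
    basis = natural_spline_basis(x, knots)
    coef = np.linalg.lstsq(basis, y, rcond=None)[0]
    ss_res = np.sum((y - basis @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_pvalue(x, y, n_perm=199, seed=0):
    """One-sided p-value: fraction of permuted R^2 values >= the observed R^2.

    Assumption: samples are exchangeable, so y is shuffled freely. A design
    with dose/time structure would need a restricted permutation scheme.
    """
    rng = np.random.default_rng(seed)
    observed = spline_r2(x, y)
    exceed = sum(spline_r2(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)


# Toy example: a U-shaped (non-monotone) relationship.
phenotype = np.linspace(-2.0, 2.0, 51)
expression = phenotype ** 2
print("Pearson r:", np.corrcoef(phenotype, expression)[0, 1])
print("Spline R^2:", spline_r2(phenotype, expression))
```

In the toy example the relationship is symmetric, so a linear correlation coefficient carries almost no signal, while the spline-based R² remains large. This is the kind of non-monotone association the abstract says correlation-based local statistics cannot capture.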

Highlights

  • Microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint

  • In all methods for the evaluation of gene sets discussed in the introduction, the association measure between a gene set and an endpoint is built from the association measures between individual genes and the endpoint

  • We propose to use the coefficient of multiple determination R² in natural cubic spline regression models to measure the gene specific


Summary

Methodology article

Gene set enrichment analysis for non-monotone association and multiple experimental categories

Rongheng Lin*1, Shuangshuang Dai, Richard D Irwin, Alexandra N Heinloth, Gary A Boorman and Leping Li*1

Address: 1 Biostatistics Branch, National Institute of Environmental Health Science, Research Triangle Park, NC 27713, USA; 2 Alpha-Gamma Technologies, Inc., Raleigh NC 27609, USA; 3 Environmental Toxicology Program, National Institute of Environmental Health Science, Research Triangle Park, NC 27713, USA; 4 Laboratory of Molecular Toxicology, National Institute of Environmental Health Science, Research Triangle Park, NC 27713, USA; and 5 Covance Inc., Vienna, VA 22066, USA

Published: 14 November 2008. BMC Bioinformatics 2008, 9:481. doi:10.1186/1471-2105-9-481

