Abstract

Background

Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint alteration of numerous genes. Genes often act together as a functional unit, e.g., a biological pathway or network, and their expression values are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to exploit this feature of a gene set to improve statistical power in gene set analyses.

Results

We propose to model the effects of an independent variable, e.g., exposure or biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects that assumes a common distribution for the regression coefficients in the multivariate linear regression model, and we calculate p-values using permutation and a scaled chi-square approximation. We show using simulations that the type I error rate is controlled under different choices of working covariance matrix and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS in which the correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms two commonly used approaches, the global test and gene set enrichment analysis (GSEA).

Conclusion

We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.
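The permutation step described in the abstract can be illustrated with a small sketch. The statistic below is an assumption made for illustration only and is not the published TEGS statistic: it takes the per-gene score between the centered independent variable and the centered expression values, weights it by the inverse of a working covariance `W`, and sums the squares. The function names `gene_set_stat` and `permutation_pvalue` are hypothetical. When `W` is the identity matrix, the correlation among genes is ignored and the statistic is simply the sum of squared per-gene scores.

```python
import numpy as np

def gene_set_stat(x, Y, W):
    """Illustrative score-type statistic (assumed form, not the published TEGS one).

    x : (n,) independent variable, e.g. 0/1 exposure status
    Y : (n, p) expression values of the p genes in the set
    W : (p, p) working covariance matrix among the genes
    """
    x_c = x - x.mean()                      # center the independent variable
    Y_c = Y - Y.mean(axis=0)                # center each gene
    score = Y_c.T @ x_c                     # one score per gene
    weighted = np.linalg.solve(W, score)    # W^{-1} score, without forming the inverse
    return float(weighted @ weighted)

def permutation_pvalue(x, Y, W, n_perm=999, seed=0):
    """Permutation p-value: shuffle x across samples and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = gene_set_stat(x, Y, W)
    exceed = 0
    for _ in range(n_perm):
        if gene_set_stat(rng.permutation(x), Y, W) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Permuting `x` while leaving the rows of `Y` intact preserves the correlation among genes under the null, which is why a permutation reference distribution remains usable whatever working covariance is chosen.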

Highlights

  • Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint alteration of numerous genes

  • We propose in this paper to test for the effect of a gene set using a variance component test in a multivariate regression model, where the correlation among genes in a gene set is explicitly taken into account

  • In the simulation study for a single gene set, four true covariance structures were considered: compound symmetry, AR(1), two-factor, and unstructured
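Two of the covariance structures named in the highlights have standard closed forms, sketched below. The helper names, the gene count, and the correlation value 0.5 are assumptions chosen for illustration; the source does not give the parameters used in the simulations. The two-factor and unstructured cases are omitted because their exact specification is not given here.

```python
import numpy as np

def compound_symmetry(p, rho):
    """p x p matrix with 1 on the diagonal and rho in every off-diagonal entry."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def ar1(p, rho):
    """p x p matrix whose (i, j) entry is rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_expression(n, cov, seed=0):
    """Draw n samples of zero-mean multivariate normal expression values."""
    rng = np.random.default_rng(seed)
    p = cov.shape[0]
    return rng.multivariate_normal(mean=np.zeros(p), cov=cov, size=n)

# Hypothetical settings: 4 genes, correlation parameter 0.5, 100 samples.
cs_cov = compound_symmetry(4, 0.5)
ar1_cov = ar1(4, 0.5)
Y_sim = simulate_expression(100, ar1_cov)
```

Under compound symmetry every pair of genes is equally correlated, whereas under AR(1) the correlation decays with the distance between gene indices.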


Summary

Introduction

Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint alteration of numerous genes. Genes often act together as a functional unit, e.g., a biological pathway or network, and their expression values are highly correlated. Most existing gene set analysis methods do not fully account for the correlation among the genes. Genome-wide analysis using microarray data, including RNA expression, DNA copy number and epigenetic DNA methylation, has become a popular tool in genomic research. Microarray gene expression values or genetic markers usually have natural groupings based on biological knowledge. The grouping need not come from biology, however; it can be a cluster of genes identified using clustering methods. In this paper, these natural or statistical groupings are loosely called a gene set, which refers to a set of genes, a set of markers or a set of probes.

