Abstract
BackgroundGenome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows.ResultsWe used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations.ConclusionsBayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.
Highlights
Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits
In this study we present a gene set approach based on the Bayesian variable selection method, known as Stochastic Search Variable Selection (SSVS) [11]
Analysis of Variance (ANOVA)-based testing of one gene set at a time was used as the reference method in comparison to the Bayesian variable selection method described in detail below
Summary
Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. GSEA can be implemented in a manner similar to a linear regression modeling approach that consists of three components: the incidence matrix linking genes to the gene set; the per-gene statistic vector, e.g., the t-statistic, and a per-set summing function. In this way, a large number of gene sets and overlapping gene sets can be viewed as a linear regression with a large number of highly collinear regression variables. A large number of gene sets and overlapping gene sets can be viewed as a linear regression with a large number of highly collinear regression variables This is a typical combinatorial and model selection problem. This becomes computationally demanding as the number of gene sets increases
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.