Abstract
BackgroundHigh-throughput sequencing experiments, which can determine allele origins, have been used to assess genome-wide allele-specific expression. Despite the amount of data generated from high-throughput experiments, statistical methods are often too simplistic to understand the complexity of gene expression. Specifically, existing methods do not test allele-specific expression (ASE) of a gene as a whole and variation in ASE within a gene across exons separately and simultaneously.ResultsWe propose a generalized linear mixed model to close these gaps, incorporating variations due to genes, single nucleotide polymorphisms (SNPs), and biological replicates. To improve reliability of statistical inferences, we assign priors on each effect in the model so that information is shared across genes in the entire genome. We utilize Bayesian model selection to test the hypothesis of ASE for each gene and variations across SNPs within a gene. We apply our method to four tissue types in a bovine study to de novo detect ASE genes in the bovine genome, and uncover intriguing predictions of regulatory ASEs across gene exons and across tissue types. We compared our method to competing approaches through simulation studies that mimicked the real datasets. The R package, BLMRM, that implements our proposed algorithm, is publicly available for download at https://github.com/JingXieMIZZOU/BLMRM.ConclusionsWe will show that the proposed method exhibits improved control of the false discovery rate and improved power over existing methods when SNP variation and biological variation are present. Besides, our method also maintains low computational requirements that allows for whole genome analysis.
Highlights
High-throughput sequencing experiments, which can determine allele origins, have been used to assess genome-wide allele-specific expression
Several common congenital human disorders are caused by mutations or deletions within these allele-specific expression (ASE) regions, such as Beckwith-Wiedemann syndrome (BWS) [8, 9], which characterizes an array of congenital overgrowth phenotypes; Angelman syndrome [10], which characterizes nervous system disorders; and Prader-Willi syndrome, in which infants suffer from hyperphagia and obesity
We propose a Bayesian logistic mixed regression model that accounts for variations from genes, single nucleotide polymorphism (SNP), and biological replicates
Summary
High-throughput sequencing experiments, which can determine allele origins, have been used to assess genome-wide allele-specific expression. Research has uncovered a group of genes in the genome where two copies of a gene express substantially differently, a phenomenon known as allelic imbalance. One such example involves imprinted genes whose allele expression is based on the parent of origin [1, 2]; that is, imprinted genes are mainly or completely. [13] fits a mixture of folded Skellam distributions to the absolute values of read differences between two alleles These abovementioned statistical methods draw conclusions based on observations produced from one gene; due to the expensive cost of acquiring tissue samples and sequencing experiments, most laboratories can only afford three or four biological replicates. Genes may have low read counts, limiting the power of the aforementioned methods
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.