Abstract
Diploid organisms have two copies of each gene, called alleles, that can be separately transcribed. The RNA abundance associated with any particular allele is known as allele-specific expression (ASE). When two alleles have polymorphisms in transcribed regions, ASE can be studied using RNA-seq read count data. ASE differs from regular RNA-seq expression in three ways: ASE cannot be assessed for every gene; measures of ASE can be biased towards one of the alleles (the reference allele); and ASE provides two measures of expression for a single gene for each biological sample, which leads to additional complications for single-gene models. We present statistical methods for modeling ASE and detecting genes with differential allelic expression. We propose a hierarchical, overdispersed, count regression model to deal with ASE counts. The model accommodates gene-specific overdispersion, has an internal measure of the reference allele bias, and uses random effects to model the gene-specific regression parameters. Fully Bayesian inference is obtained using the fbseq package, which implements a parallel strategy to keep computation times reasonable. Simulation and real data analysis suggest the proposed model is a practical and powerful tool for the study of differential ASE.
Highlights
Over the past decade, RNA-sequencing (RNA-seq) has been replacing microarray technology as the primary high-throughput method used to measure gene expression [1].
Allele-specific expression (ASE) differs from regular RNA-seq expression: ASE cannot be assessed for every gene; measures of ASE can be biased towards one of the alleles; and ASE provides two measures of expression for a single gene for each biological sample, which leads to additional complications for single-gene models.
We propose a hierarchical, overdispersed, count regression model to deal with ASE counts.
Summary
RNA-sequencing (RNA-seq) has been replacing microarray technology as the primary high-throughput method used to measure gene expression [1]. Given the total ASE, i.e., the sum of counts in both alleles, the so-called reference allele count can be modeled as binomially distributed [5] or with a Beta-binomial distribution, which includes gene-specific overdispersion [6,7,8,9]. Instead of modeling ASE counts based on a binomial distribution, it is possible to adapt models originally designed for total RNA-seq transcript abundance counts: Poisson [4], generalized Poisson [10, 11], and negative binomial [12] distributions have been proposed. A hierarchical overdispersed count regression model is proposed to study allele-specific expression. This modeling framework allows easy generalization to include additional genotypes, tissue types, and additional alleles. Section 6 presents a summary of the main findings and comments on the steps in this line of research.
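To make the binomial and Beta-binomial alternatives mentioned above concrete, the following is a minimal sketch of the two log-likelihoods for a reference allele count given the total ASE count. It is not the hierarchical model proposed in the paper. The function names, the mean/overdispersion parametrization (`p`, `rho`), and the counts are all assumptions chosen for illustration.

```python
from math import lgamma, log


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_logpmf(k, n, p):
    """Log-probability of k reference-allele reads out of n total reads."""
    return log_choose(n, k) + k * log(p) + (n - k) * log(1 - p)


def betabinomial_logpmf(k, n, p, rho):
    """Beta-binomial log-probability with mean proportion p.

    rho in (0, 1) is an assumed overdispersion parameter: as rho
    approaches 0 the distribution approaches the binomial.
    """
    scale = (1 - rho) / rho
    a, b = p * scale, (1 - p) * scale
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def gene_loglik(ref_counts, totals, p, rho=None):
    """Sum the log-likelihood over biological samples for one gene."""
    if rho is None:
        return sum(binomial_logpmf(k, n, p) for k, n in zip(ref_counts, totals))
    return sum(betabinomial_logpmf(k, n, p, rho) for k, n in zip(ref_counts, totals))


# Hypothetical reference-allele and total counts for one gene in four samples.
ref_counts = [30, 12, 45, 8]
totals = [40, 40, 50, 40]

print("binomial:     ", gene_loglik(ref_counts, totals, p=0.5))
print("beta-binomial:", gene_loglik(ref_counts, totals, p=0.5, rho=0.2))
```

With counts as variable across samples as these made-up ones, the Beta-binomial assigns more probability to the extreme samples than the binomial does, which is the motivation for including gene-specific overdispersion.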