Abstract

Diploid organisms have two copies of each gene, called alleles, that can be separately transcribed. The RNA abundance associated with any particular allele is known as allele-specific expression (ASE). When two alleles have polymorphisms in transcribed regions, ASE can be studied using RNA-seq read count data. ASE has characteristics that differ from regular RNA-seq expression: ASE cannot be assessed for every gene, measures of ASE can be biased towards one of the alleles (the reference allele), and ASE provides two measures of expression for a single gene in each biological sample, which leads to additional complications for single-gene models. We present statistical methods for modeling ASE and detecting genes with differential allelic expression. We propose a hierarchical, overdispersed, count regression model to deal with ASE counts. The model accommodates gene-specific overdispersion, has an internal measure of the reference allele bias, and uses random effects to model the gene-specific regression parameters. Fully Bayesian inference is obtained using the fbseq package, which implements a parallel strategy to keep computation times reasonable. Simulation and real data analyses suggest the proposed model is a practical and powerful tool for the study of differential ASE.

Highlights

  • Over the past decade, RNA-sequencing (RNA-seq) has been replacing microarray technology as the primary high-throughput method used to measure gene expression [1]

  • Allele-specific expression (ASE) has characteristics that differ from regular RNA-seq expression: ASE cannot be assessed for every gene, measures of ASE can be biased towards one of the alleles, and ASE provides two measures of expression for a single gene in each biological sample, which leads to additional complications for single-gene models

  • We propose a hierarchical, overdispersed, count regression model to deal with ASE counts


Summary

Introduction

RNA-sequencing (RNA-seq) has been replacing microarray technology as the primary high-throughput method used to measure gene expression [1]. Given the total ASE, i.e., the sum of counts for both alleles, the so-called reference allele count can be modeled with a binomial distribution [5] or with a Beta-binomial distribution, which includes gene-specific overdispersion [6,7,8,9]. Instead of modeling ASE counts with a binomial distribution, it is possible to adapt models originally designed for total RNA-seq transcript abundance counts: Poisson [4], generalized Poisson [10, 11], and negative binomial [12] distributions have been proposed. A hierarchical overdispersed count regression model is proposed to study allele-specific expression. This modeling framework allows easy generalization to include additional genotypes, tissue types, and additional alleles. Section 6 presents a summary of the main findings and comments on the steps in this line of research.
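To make the binomial versus Beta-binomial contrast mentioned above concrete, the following is a minimal sketch, not the model proposed in the paper or the method of any cited reference. It computes the probability of a reference allele count given the total count under both distributions. The function names and the (mu, rho) parameterization of the Beta-binomial (mean proportion and an overdispersion parameter between 0 and 1) are assumptions chosen for illustration.

```python
import math


def log_binom_coef(k, n):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_binom_pmf(k, n, p):
    """Log-probability of k reference reads out of n, with 0 < p < 1."""
    return log_binom_coef(k, n) + k * math.log(p) + (n - k) * math.log(1 - p)


def log_betabinom_pmf(k, n, mu, rho):
    """Log-probability under a Beta-binomial with mean proportion mu.

    rho in (0, 1) is an illustrative overdispersion parameter: the variance
    is n * mu * (1 - mu) * (1 + (n - 1) * rho), so rho -> 0 approaches the
    binomial case.
    """
    s = (1 - rho) / rho          # a + b of the underlying Beta distribution
    a, b = mu * s, (1 - mu) * s
    return log_binom_coef(k, n) + log_beta(k + a, n - k + b) - log_beta(a, b)


def upper_tail(log_pmf, k, n, *params):
    """P(X >= k) obtained by summing the pmf over k, k + 1, ..., n."""
    return sum(math.exp(log_pmf(j, n, *params)) for j in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical gene: 16 of 20 reads map to the reference allele.
    print("binomial tail:     ", upper_tail(log_binom_pmf, 16, 20, 0.5))
    print("Beta-binomial tail:", upper_tail(log_betabinom_pmf, 16, 20, 0.5, 0.1))
```

With overdispersion, the same imbalance between alleles is less surprising than under the plain binomial, which is one reason a purely binomial analysis can overstate evidence for allelic imbalance.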

Allele-specific expression
Hierarchical overdispersed count regression model
Data model
Gene-specific hierarchical structure
Prior distributions
GPU-accelerated MCMC
Detecting differential allelic expression
Simulation study
Model to simulate data
Simulation scenarios
Statistical analysis of simulated data
ASE in maize experiment
Findings
Discussion
