MGMR: leveraging RNA-Seq population data to optimize expression estimation

Roye Rozov, Ron Shamir, Eran Halperin

doi:10.1186/1471-2105-13-s6-s2

Open Access

https://doi.org/10.1186/1471-2105-13-s6-s2

Abstract

Background: RNA-Seq is a technique that uses Next Generation Sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take such population information into account, and it is an open question whether this information can improve quantification of the individual samples.

Results: To explore the contribution of population-level information to RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that the expression levels of different individuals are sampled from a Dirichlet distribution with parameters specific to the population, and that reads are sampled from the distribution of expression levels. We introduce an optimization procedure for the estimation of the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression levels of paralogous genes.

Conclusions: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate that this approach can be beneficial, primarily for estimation at the gene level.

Highlights

With the rapid decline in the cost of sequencing, RNA-Seq has emerged as a legitimate competitor to microarrays as a means of assessing global gene expression.
We demonstrate that by analyzing expression profiles of a population together, one gets expression estimates more accurate than those obtained by estimating individual sample expression levels independently.
The model describes an RNA sequencing experiment where regions in G are randomly chosen according to the distribution P, start positions in these regions are chosen uniformly, and reads of length l are generated by copying l consecutive bases from each chosen region to produce a set of reads R = (r1,..., rr).
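The generative process in the last highlight, combined with the population-level Dirichlet prior from the abstract, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the region sequences, the Dirichlet parameters `alpha`, and the function names are all hypothetical, and sequencing errors and region-length effects are ignored.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_reads(regions, alpha, n_reads, read_len, rng):
    """Simulate one individual's reads under the hierarchical model."""
    # Individual-level expression distribution P, drawn from the
    # population-specific Dirichlet distribution.
    p = sample_dirichlet(alpha, rng)
    reads = []
    for _ in range(n_reads):
        # Choose a region in G according to P.
        idx = rng.choices(range(len(regions)), weights=p, k=1)[0]
        seq = regions[idx]
        # Choose a start position uniformly among those that fit a full read.
        start = rng.randrange(len(seq) - read_len + 1)
        # Copy read_len consecutive bases from the chosen region.
        reads.append((idx, seq[start:start + read_len]))
    return p, reads

rng = random.Random(0)
# Hypothetical regions: the first two differ by one base, mimicking paralogs.
regions = [
    "ACGTACGTAGCTAGCTAGGA",
    "ACGTACGTAGCTTGCTAGGA",
    "TTGACCGGTAACGTTAGCAA",
]
alpha = [5.0, 1.0, 2.0]  # hypothetical population-level parameters
population = [
    simulate_reads(regions, alpha, n_reads=50, read_len=8, rng=rng)
    for _ in range(4)
]
```

Each individual receives its own expression vector, but all vectors come from the same Dirichlet distribution, which is the commonality a population-aware estimator can exploit. Reads copied from the shared prefix of the first two regions are identical in sequence, so they would be multireads in a real alignment.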

Summary

Introduction

With the rapid decline in the cost of sequencing, RNA-Seq has emerged as a legitimate competitor to microarrays as a means of assessing global gene expression. Although arrays currently enjoy a cost advantage, many new applications of information accessible only through sequencing strengthen the case that sequencing may soon supplant arrays as the technology of choice for transcription analysis. One such application is fine-grained assessment of variation in expression and of the sources of that variation, as exemplified by recent large-scale RNA-Seq studies [1,2]. Perhaps the most widely discussed hurdle to accurate estimation with RNA-Seq is the problem of reads that map to multiple locations in the target genome or in the target transcript sequences (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved by probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take such population information into account, and it is an open question whether this information can improve quantification of the individual samples.
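To make "probabilistic allocation of reads to genes" concrete, the following is a generic expectation-maximization sketch of the per-sample, likelihood-based idea. It is not MGMR and not any specific published tool: the function name `em_allocate` and the toy data are hypothetical, gene lengths are ignored, and every alignment is assumed equally good.

```python
def em_allocate(compat, n_genes, n_iter=100):
    """Estimate relative expression from reads with ambiguous alignments.

    compat: one entry per read, listing the gene indices it aligns to.
    Returns a list of expression proportions that sums to 1.
    """
    theta = [1.0 / n_genes] * n_genes  # start from uniform expression
    for _ in range(n_iter):
        counts = [0.0] * n_genes
        for genes in compat:
            # E-step: split the read among its compatible genes in
            # proportion to the current expression estimates.
            z = sum(theta[g] for g in genes)
            for g in genes:
                counts[g] += theta[g] / z
        # M-step: re-estimate expression from the fractional counts.
        total = sum(counts)
        theta = [c / total for c in counts]
    return theta

# Toy data: 6 reads unique to gene 0, 2 unique to gene 1, and
# 4 multireads that align to both genes.
compat = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta = em_allocate(compat, n_genes=2)
```

On this toy input the estimates converge to 0.75 and 0.25, so the four multireads are split three to one in favor of the gene with more unique reads, rather than evenly or by discarding them. Each sample is treated independently here; no information is shared across individuals.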

Journal: BMC Bioinformatics
Publication Date: Apr 19, 2012
Citations: 14
License type: CC BY 2.0
