Abstract
mRNA-Seq is a precise and highly reproducible technique for measuring transcript levels. It yields sequence information for a transcriptome at single-nucleotide resolution, enabling us to determine splice junctions and alternative splicing events with high confidence. Analyses of mRNA-Seq data often do not attempt to quantify expression at the isoform level. In this paper our objective is to use mRNA-Seq data to infer expression at the isoform level, where the splicing patterns of a gene are assumed to be known. We propose a Bayesian latent-variable modeling framework whose parameterization enables inference at various levels. For example, it allows us to assess the expression variability of an isoform across different conditions and to carry out two-sample comparisons, e.g., using a Bayesian t-test; in addition, the simple presence or absence of an isoform can be estimated using the latent variables in the model. We carry out inference on isoform expression under different normalization techniques, since it has recently been shown that one of the most prominent sources of variation in differential calls from mRNA-Seq data is the normalization method used. The statistical framework is developed for multiple isoforms and extends easily to reads mapping to multiple genes; this can be achieved by slight conceptual modifications to the definitions of what we consider a gene and what we consider an exon. Additionally, the proposed framework can be extended, by appropriate modeling of the design matrix, to infer as yet unknown novel transcripts. However, such attempts should be made judiciously, since the input data used in the proposed model do not include reads from splice junctions.
Highlights
Sequencing technology has advanced at a rapid rate in the past decade
The results presented in the subsequent sections are based on data from Arabidopsis chromosome 1 only, owing to space limitations
We have presented a Bayesian framework based on mRNA-Seq data to infer expression at isoform level
Summary
BACKGROUND Sequencing technology has advanced at a rapid rate in the past decade. The advent of massively parallel sequencing technologies, such as the Illumina Genome Analyzer/Solexa, has revolutionized genome-wide transcriptome studies, leading to multiple applications. In this paper our primary objective is to use mRNA-Seq data to infer expression at the transcript, i.e., isoform, level while utilizing all existing information on splice variants.