Abstract

BackgroundThe development of techniques for sequencing the messenger RNA (RNA-Seq) enables it to study the biological mechanisms such as alternative splicing and gene expression regulation more deeply and accurately. Most existing methods employ RNA-Seq to quantify the expression levels of already annotated isoforms from the reference genome. However, the current reference genome is very incomplete due to the complexity of the transcriptome which hiders the comprehensive investigation of transcriptome using RNA-Seq. Novel study on isoform inference and estimation purely from RNA-Seq without annotation information is desirable.ResultsA Nonnegativity and Sparsity constrained Maximum APosteriori (NSMAP) model has been proposed to estimate the expression levels of isoforms from RNA-Seq data without the annotation information. In contrast to previous methods, NSMAP performs identification of the structures of expressed isoforms and estimation of the expression levels of those expressed isoforms simultaneously, which enables better identification of isoforms. In the simulations parameterized by two real RNA-Seq data sets, more than 77% expressed isoforms are correctly identified and quantified. Then, we apply NSMAP on two RNA-Seq data sets of myelodysplastic syndromes (MDS) samples and one normal sample in order to identify differentially expressed known and novel isoforms in MDS disease.ConclusionsNSMAP provides a good strategy to identify and quantify novel isoforms without the knowledge of annotated reference genome which can further realize the potential of RNA-Seq technique in transcriptome analysis. NSMAP package is freely available at https://sites.google.com/site/nsmapforrnaseq.

Highlights

  • The development of techniques for sequencing the messenger RNA (RNA-Seq) enables it to study the biological mechanisms such as alternative splicing and gene expression regulation more deeply and accurately

  • We argue that the determination of parsimonious set of expressed transcripts and expression level estimation should be implemented jointly

  • We apply Nonnegativity and Sparsity constrained Maximum APosteriori (NSMAP) on three real in-house RNA-Seq data sets of myelodysplastic syndromes (MDS) transcriptome analysis to identify isoforms including novel ones featured in MDS disease

Read more

Summary

Introduction

The development of techniques for sequencing the messenger RNA (RNA-Seq) enables it to study the biological mechanisms such as alternative splicing and gene expression regulation more deeply and accurately. Most existing methods employ RNA-Seq to quantify the expression levels of already annotated isoforms from the reference genome. Bohnert et al [15] proposed rQuant to determine the abundances for each annotated isoforms by minimizing the deviation of the observation from the expected position-wide read coverage All these methods assumed that the number and structures of isoforms of each gene are known from the reference genome. As Jiang and Wong [14] pointed out, the isoform level annotation is very incomplete due to the complexity of the transcriptome and the limitations of previous experimental approaches To address this issue, Trapnell et al proposed Cufflinks [16] to identify transcripts as well as to estimate the expression levels of identified transcripts from mapped reads without annotation information.

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.