MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

Jonas Behr, Gunnar Rätsch, André Kahles, Vipin T Sreedharan, Philipp Drewe, Yi Zhong

doi:10.1093/bioinformatics/btt442

Abstract

Motivation: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction.

Results: We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state of the art in transcriptome reconstruction.

Availability: MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.

Contact: Jonas_Behr@web.de and raetsch@cbio.mskcc.org

Supplementary information: Supplementary data are available at Bioinformatics online.
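The objective described in the abstract (a negative binomial likelihood plus a penalty that favours few transcripts) can be illustrated with a small toy, assuming a simplified model that is not taken from the paper: each candidate transcript is a 0/1 vector over segments, the expected coverage of a segment is the sum of the abundances of the transcripts containing it, and the penalty is `lam` times the number of transcripts with non-zero abundance. Exhaustive search over a coarse abundance grid stands in for the Mixed Integer Programming solver here. All function names, the dispersion `r`, and the penalty weight `lam` are hypothetical choices for illustration.

```python
import itertools
import math


def nb_log_pmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and size r."""
    mu = max(mu, 1e-9)  # avoid log(0) when no selected transcript covers a segment
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def objective(observed, transcripts, abundances, r, lam):
    """Negative log-likelihood of the segment counts plus an L0-style penalty."""
    nll = 0.0
    for s, k in enumerate(observed):
        # Expected coverage of segment s under the assumed additive model.
        mu = sum(a * t[s] for a, t in zip(abundances, transcripts))
        nll -= nb_log_pmf(k, mu, r)
    n_used = sum(1 for a in abundances if a > 0)
    return nll + lam * n_used


def best_explanation(observed, transcripts, levels, r=100.0, lam=2.0):
    """Brute-force stand-in for the MIP: try every abundance combination on a grid."""
    best = None
    for abundances in itertools.product(levels, repeat=len(transcripts)):
        score = objective(observed, transcripts, abundances, r, lam)
        if best is None or score < best[0]:
            best = (score, abundances)
    return best
```

Calling `best_explanation((40, 10, 40), [(1, 1, 1), (1, 0, 1), (0, 1, 1)], levels=(0, 10, 20, 30, 40))` returns the lowest-scoring abundance tuple on that grid together with its score. Exhaustive enumeration grows exponentially with the number of candidates, which is one reason a dedicated integer programming solver is the more realistic choice.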

Highlights

Most of the complexity of higher eukaryotic transcriptomes can be attributed to the encoding of multiple transcripts at a single genic locus by means of alternative splicing, transcription start and termination (e.g. Nilsen and Graveley, 2010; Rätsch et al, 2007; Schweikert et al, 2009)
The optimization problem formalized by MITIE generalizes to solve transcript prediction in the de novo setting, and we show in Section 4 that the MITIE strategy is superior to the dynamic programming-based strategy of Trinity
MITIE can build a segment graph based on given alignments of RNA-Seq reads to a genome or start with segment graphs obtained by other means, in particular by de novo assembly

Summary

Introduction

Most of the complexity of higher eukaryotic transcriptomes can be attributed to the encoding of multiple transcripts at a single genic locus by means of alternative splicing, transcription start and termination (e.g. Nilsen and Graveley, 2010; Rätsch et al, 2007; Schweikert et al, 2009). Alignment tools for RNA-Seq reads, such as PALMapper (De Bona et al, 2008; Jean et al, 2010), TopHat (Trapnell et al, 2009), MapSplice (Wang et al, 2010), STAR (Dobin et al, 2012) or GSNAP (Wu and Nacu, 2010), are typically able to identify new exon–exon junctions, which are candidates for introns. This information can be compiled into a segment graph (also called a splicing graph): a directed acyclic graph in which the nodes correspond to exonic segments and the edges correspond to intron candidates (cf. Fig. 1 for an illustration). We will focus on genome-based transcript reconstruction when describing the approach and discuss de novo assembly whenever necessary.
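As a minimal sketch of the segment graph idea, the following toy builds a directed acyclic graph from exonic segments and intron candidates and lists every source-to-sink path as a candidate transcript. It assumes half-open genomic coordinates, non-overlapping segments sorted by start, and junctions given as `(intron_start, intron_end)` pairs; the function names and the data layout are hypothetical and are not taken from the MITIE implementation.

```python
from collections import defaultdict


def build_segment_graph(segments, junctions):
    """Return an adjacency dict mapping a segment index to its successor indices.

    segments:  list of (start, end) half-open intervals, sorted, non-overlapping.
    junctions: list of (intron_start, intron_end) intron candidates.
    """
    by_end = {end: i for i, (_, end) in enumerate(segments)}
    by_start = {start: i for i, (start, _) in enumerate(segments)}
    graph = defaultdict(list)
    # Segments that touch each other are connected without an intron.
    for i in range(len(segments) - 1):
        if segments[i][1] == segments[i + 1][0]:
            graph[i].append(i + 1)
    # Each intron candidate links the segment ending at its start
    # to the segment beginning at its end.
    for intron_start, intron_end in junctions:
        if intron_start in by_end and intron_end in by_start:
            graph[by_end[intron_start]].append(by_start[intron_end])
    return graph


def enumerate_paths(graph, sources, sinks):
    """List every path that starts at a source segment and ends at a sink segment."""
    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        if node in sinks:
            paths.append(path)
        for successor in graph.get(node, []):
            walk(successor, path)

    for source in sources:
        walk(source, [])
    return paths


segments = [(0, 100), (200, 300), (400, 500)]
junctions = [(100, 200), (300, 400), (100, 400)]  # the last one skips the middle exon
graph = build_segment_graph(segments, junctions)
candidates = enumerate_paths(graph, sources={0}, sinks={2})
```

Because edges always point from a lower to a higher coordinate, the graph has no cycles and the recursion terminates. The number of paths can still grow exponentially with the number of alternative events, so full enumeration is only practical for small loci.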

Journal: Computer applications in the biosciences : CABIOS
Publication Date: Aug 25, 2013
Citations: 78
License type: CC BY 3.0
