Abstract

We present a new de novo transcriptome assembler, Bridger, which takes advantage of techniques employed in Cufflinks to overcome limitations of the existing de novo assemblers. When tested on dog, human, and mouse RNA-seq data, Bridger assembled more full-length reference transcripts while reporting considerably fewer candidate transcripts, thereby greatly reducing false positive transcripts in comparison with the state-of-the-art assemblers. It runs substantially faster and requires much less memory than most assemblers. More interestingly, Bridger reaches a level of sensitivity and accuracy comparable to that of Cufflinks. Bridger is available at https://sourceforge.net/projects/rnaseqassembly/files/?source=navbar.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0596-2) contains supplementary material, which is available to authorized users.

Highlights

  • RNA sequencing (RNA-seq) is a powerful technique for collecting gene expression data at a whole-transcriptome level with unprecedented sensitivity and accuracy [1,2,3,4].

  • Short-read genome assemblers, such as Velvet [7], ABySS [8], and ALLPATHS [9], cannot be directly applied to transcriptome assembly, for two reasons: (1) DNA sequencing depth is expected to be the same across a genome, while the depths of the sequenced transcripts may vary by several orders of magnitude [10]; and (2) due to alternative splicing, transcriptome assembly is more complex than the linear problem of genome assembly and generally requires a graph representation.

  • The human data (accession codes SRX011545 and SRX011546) were collected from human CD4 T cells [31] and comprise 50 million paired-end reads of length 45 bp with an insert size of 200 bp; we downloaded them from the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (SRA) database.


Summary

Introduction

RNA-seq is a powerful technique for collecting gene expression data at a whole-transcriptome level with unprecedented sensitivity and accuracy [1,2,3,4]. Short-read genome assemblers, such as Velvet [7], ABySS [8], and ALLPATHS [9], cannot be directly applied to transcriptome assembly, for two reasons: (1) DNA sequencing depth is expected to be the same across a genome, while the depths of the sequenced transcripts may vary by several orders of magnitude [10]; and (2) due to alternative splicing, transcriptome assembly is more complex than the linear problem of genome assembly and generally requires a graph representation. A number of RNA-seq based transcriptome assemblers have been developed in the past few years. They fall into two general categories: reference-based and de novo assembly approaches [10,11]. Full-length splicing isoforms are recovered by traversing the graph. This strategy is used only when a high-quality reference genome is available.
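To make the idea of recovering isoforms by graph traversal concrete, the following is a minimal sketch in Python. It is not Bridger's or Cufflinks' algorithm: the three-exon graph, the node names, and the `enumerate_isoforms` helper are all hypothetical and chosen only for illustration. Each node stands for an exon, each edge for a splice junction, and every source-to-sink path is treated as a candidate isoform.

```python
from typing import Dict, List

# Hypothetical toy splicing graph (not taken from the paper):
# nodes are exons, edges are splice junctions between them.
SPLICE_GRAPH: Dict[str, List[str]] = {
    "exon1": ["exon2", "exon3"],  # exon2 is skipped in one isoform
    "exon2": ["exon3"],
    "exon3": [],
}


def enumerate_isoforms(
    graph: Dict[str, List[str]], source: str, sink: str
) -> List[List[str]]:
    """Return every source-to-sink path of an acyclic graph, sorted."""
    paths: List[List[str]] = []
    stack = [(source, [source])]  # (current node, path taken so far)
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for successor in graph.get(node, []):
            stack.append((successor, path + [successor]))
    return sorted(paths)


isoforms = enumerate_isoforms(SPLICE_GRAPH, "exon1", "exon3")
for isoform in isoforms:
    print(" -> ".join(isoform))
```

In this toy graph the traversal yields two candidate isoforms, one that includes exon2 and one that skips it. Exhaustive enumeration grows exponentially with the number of alternative splicing events and treats every path as equally plausible, so a practical assembler has to select among paths using read evidence; that selection step is outside the scope of this sketch.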

