A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data.

Sing-Hoi Sze, Aaron M Tarone

doi:10.1186/1471-2164-15-s5-s6

Abstract

Background: The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms. While a popular strategy is to first construct an intermediate de Bruijn graph structure to represent the transcriptome, an additional step is needed to construct predicted transcripts from the graph.

Results: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm to recover alternative splicing information and library-specific expression information directly from the graph without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate our algorithm by simulating the transcriptome assembly of Drosophila using its known genome, and by performing Drosophila transcriptome assembly using publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS.

Conclusions: Since our postprocessing algorithm does not consume as much memory as Velvet and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. Our algorithm has been applied to perform transcriptome assembly of the non-model blow fly Lucilia sericata that was reported in a previous article, which shows that the assembly is of high quality and it facilitates comparison of the Lucilia sericata transcriptome to Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression among various developmental stages, and identification of transposable elements.

Highlights

The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms.
We develop an algorithm to remove the complicated cycles in the de Bruijn graph, and extract acyclic components so that each of them represents a gene and its isoforms in almost all cases.
De Bruijn graph: given a set of reads and a parameter k, a de Bruijn graph is defined by constructing a vertex for each k-mer that appears within the reads.
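
The de Bruijn graph definition above can be illustrated with a minimal Python sketch. The vertex rule (one vertex per k-mer in the reads) comes from the text. The edge rule used here, a directed edge between two k-mers that occur consecutively in some read, is the usual convention and is an assumption, since the highlight does not state it. The function name `build_de_bruijn` and the toy reads are hypothetical, and the sketch ignores reverse complements, k-mer coverage, and error handling, so it should not be read as the authors' or Velvet's implementation.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph from reads.

    Vertices: every k-mer that appears within the reads (as in the text).
    Edges (assumed convention): u -> v when k-mers u and v occur at
    consecutive positions in at least one read.
    """
    vertices = set()
    edges = defaultdict(set)
    for read in reads:
        # All k-mers of the read, in order; empty if the read is shorter than k.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        vertices.update(kmers)
        # Link each k-mer to the one that follows it in the same read.
        for u, v in zip(kmers, kmers[1:]):
            edges[u].add(v)
    return vertices, dict(edges)


# Hypothetical toy reads: the third read diverges after "ACG",
# creating a branch of the kind an alternative splicing event can produce.
reads = ["ACGTC", "CGTCA", "ACGAA"]
vertices, edges = build_de_bruijn(reads, 3)
```

In this toy input the vertex `ACG` has two successors, which is the kind of branching that the abstract says the graph retains and that a postprocessing step has to interpret.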

Summary

Introduction

With the advance of high-throughput sequencing techniques, it is feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms [1,2,3,4,5,6,7,8]. A popular strategy is to first construct an intermediate de Bruijn graph structure to represent the transcriptome; an additional step is then performed to construct predicted transcripts from the graph. This strategy is employed by Oases [10] and [...]. Trinity [8] uses a different approach: it first clusters the data, then constructs an individual de Bruijn graph with simple structure for each cluster.

Journal: BMC Genomics
Publication Date: Jul 1, 2014
Citations: 20
License type: cc-by