Readjoiner: a fast and memory efficient string graph-based sequence assembler

Giorgio Gonnella, Stefan Kurtz

doi:10.1186/1471-2105-13-82

Abstract

Background: Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads.

Results: Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only.

Conclusions: Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.

Highlights

Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers
Sequence analysis software tools developed only a few years ago are often unable to deal with such large amounts of short reads: This has led to a gap between the
The presented methods for constructing the string graph and the subsequent computation of contigs have been implemented in a sequence assembler named Readjoiner, which is part of the GenomeTools software suite [21]

Summary

Introduction

Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads.

The de novo sequence assembly problem is to reconstruct a target sequence from a set of sequence reads. The classical approach to de novo assembly consists of three phases: overlap, layout and consensus. In the overlap phase, suffix-prefix matches among all pairs of sequence reads are computed and turned into an overlap graph [1]. In the consensus phase, the target sequence is reconstructed by selecting a base for each position.

Sequence analysis software tools developed only a few years ago are often unable to deal with such large amounts of short reads: This has led to a gap between the
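To make the terms suffix-prefix match, overlap graph and transitive edge concrete, here is a minimal Python sketch. It is not Readjoiner's algorithm: the paper describes suffix sorting and scanning, whereas this illustration compares all pairs of reads naively in quadratic time. It also assumes exact matches, a single strand (no reverse complements), no reads contained in other reads, and only the longest overlap per read pair. All function names and the example reads are hypothetical.

```python
from itertools import permutations


def longest_suffix_prefix_match(a, b, min_len):
    """Return the length of the longest proper suffix of `a` that equals a
    prefix of `b` and has at least `min_len` characters, or 0 if none."""
    max_len = min(len(a), len(b)) - 1  # proper overlaps only, no containment
    for length in range(max_len, max(min_len, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len):
    """Naive all-pairs overlap graph: graph[i][j] is the overlap length of
    the edge from read i to read j (assumption: longest overlap only)."""
    graph = {i: {} for i in range(len(reads))}
    for i, j in permutations(range(len(reads)), 2):
        length = longest_suffix_prefix_match(reads[i], reads[j], min_len)
        if length:
            graph[i][j] = length
    return graph


def remove_transitive_edges(reads, graph):
    """Drop edge i->k when edges i->j and j->k place read k at the same
    offset relative to read i, so that i->k carries no new information."""
    reduced = {i: dict(edges) for i, edges in graph.items()}
    for i, edges in graph.items():
        for j, ov_ij in edges.items():
            for k, ov_jk in graph[j].items():
                # Offsets agree iff ov(i,k) == ov(i,j) + ov(j,k) - len(read j).
                if k in edges and edges[k] == ov_ij + ov_jk - len(reads[j]):
                    reduced[i].pop(k, None)
    return reduced


# Hypothetical reads of length 6 sampled every 2 bases from "ACGGTCATTGCA".
reads = ["ACGGTC", "GGTCAT", "TCATTG", "ATTGCA"]
full = overlap_graph(reads, min_len=2)
irreducible = remove_transitive_edges(reads, full)
print(full)
print(irreducible)
```

With these four reads, the overlap graph contains the edges 0->2 and 1->3 in addition to the chain 0->1->2->3. Both extra edges are implied by the chain, so the reduction removes them and only the irreducible edges remain. This sketch builds the full graph first and reduces it afterwards, whereas the abstract states that Readjoiner recognizes and eliminates transitive edges early, during construction.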

Journal: BMC Bioinformatics
Publication Date: May 6, 2012
Citations: 79
License type: CC BY 2.0
