De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application

Wiktor Kuśmirek, Robert Nowak

doi:10.1186/s12859-018-2281-4

Open Access

Abstract

Background: Many organisms, in particular bacteria, contain repetitive DNA fragments called tandem repeats. DNA assemblers restore these structures by mapping paired-end tags to unitigs, estimating the distance between them, and filling the gap with the specified DNA motif, which may be repeated many times. However, some tandem repeats are longer than the distance between the paired-end tags.

Results: We present a new algorithm for de novo DNA assembly that uses the relative frequency of reads to properly restore tandem repeats. Its main advantage is that long tandem repeats, much longer than both the maximum read length and the insert size of paired-end tags, can be properly restored. Moreover, repetitive DNA regions covered only by single-read sequencing data can also be restored. Other existing de novo DNA assemblers fail in such cases. The presented application is composed of several steps, including: (i) building the de Bruijn graph, (ii) correcting the de Bruijn graph, (iii) normalizing edge weights, and (iv) generating the output set of DNA sequences. We tested our approach on real data sets of bacterial organisms.

Conclusions: A software library, a console application, and a web application were developed. The web application uses a client-server architecture, in which a web browser communicates with the end user and the algorithms are implemented in C++ and Python. The presented approach enables proper reconstruction of tandem repeats that are longer than the insert size of paired-end tags. The application is freely available to all users under the GNU Library or Lesser General Public License version 3.0 (LGPLv3).
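The idea of using relative read frequency can be illustrated with a minimal sketch. It is not the dnaasm implementation: the function names `build_de_bruijn` and `normalize_weights`, the toy genome, and the choice of the median as the single-copy baseline are all assumptions made for illustration, and the reads are assumed to be error-free with uniform coverage (so the graph-correction step is skipped).

```python
from collections import Counter
from statistics import median


def build_de_bruijn(reads, k):
    """Count k-mers; each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def normalize_weights(edges):
    """Estimate how many times each edge occurs in the genome.

    Assumption: most edges are single-copy, so the median count
    approximates the coverage of a single-copy edge.
    """
    baseline = median(edges.values())
    return {edge: max(1, round(count / baseline)) for edge, count in edges.items()}


# Hypothetical toy genome: unique flanks around four copies of the unit "GAT".
genome = "ACGTTGCA" + "GAT" * 4 + "CCTAGGTC"
# Idealized "sequencing": ten error-free reads, each covering the whole genome.
reads = [genome] * 10

edges = build_de_bruijn(reads, k=4)
multiplicity = normalize_weights(edges)

print(multiplicity[("GAT", "ATG")])  # 3: edge inside the tandem repeat
print(multiplicity[("ACG", "CGT")])  # 1: edge in the unique flank
```

Edges inside the repeat carry a proportionally higher weight than edges in unique sequence, so a traversal that visits each edge as many times as its normalized weight would pass through the repeat cycle the right number of times, even when no read or read pair spans the whole region.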

Highlights

We present a new algorithm for de novo DNA assembly that uses the relative frequency of reads to properly restore tandem repeats.
Many organisms, in particular bacteria, contain repetitive DNA fragments called tandem repeats.
In many cases the incompleteness is a result of the occurrence of repetitive sequences in bacterial genomes that cannot always be reconstructed from short DNA reads from second-generation sequencing.

Summary

Results

We present the results of tests on real data sets of bacterial organisms, comparing the results obtained by our approach with tandem repeats detected by algorithms based on paired-end tags.

Simulated dataset for different depth of coverage: In this experiment we checked how read coverage affects tandem repeat detection for different types of repetitive sequences. We compared the efficiency of reconstructing tandem repeats by our approach and by methods based on paired-end tags on simulated datasets generated at different depths of coverage.

Despite the small size of this sequence (only 13,900 bp), it contains a large repetitive DNA region (tandem repeats) comprising 13 repeats of the same 31-nt sequence [15]. To assemble this sequence, we obtained paired reads (2x100 bp) from an Illumina sequencer, with an average insert size of 300 bp. Additional ultradeep sequencing of PCR amplicons for this DNA region confirmed the results obtained by our approach.
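The numbers quoted above show why this region is hard for methods based on paired-end tags. The short sketch below only does the arithmetic; the helper names are hypothetical, and comparing the repeat length directly with the insert size is a simplification (a real assembler would also need unique flanking sequence on both sides to anchor the pair).

```python
def tandem_repeat_length(unit_length, copies):
    """Total length in bp of a tandem repeat region."""
    return unit_length * copies


def exceeds(region_length, span):
    """True if the region is longer than the given read or insert span."""
    return region_length > span


# Values taken from the text: 13 copies of a 31-nt unit,
# 2x100 bp reads, average insert size of 300 bp.
region = tandem_repeat_length(unit_length=31, copies=13)

print(region)                # 403
print(exceeds(region, 100))  # True: longer than a single read
print(exceeds(region, 300))  # True: longer than the average insert size
```

At 403 bp the repeat region is longer than both a single 100 bp read and the 300 bp average insert, so on average neither a read nor a read pair can span it from one unique flank to the other.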

Journal: BMC Bioinformatics
Publication Date: Jul 18, 2018
Citations: 17
License type: open-access
