Abstract

Background: Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements that may exist between the query and reference genomes may result in incorrect scaffolds, if these rearrangements are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome.

Results: We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of sis, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that sis has overall better performance.

Conclusions: sis is a new easy-to-use tool to generate contig scaffolds, available both as stand-alone and as a web server. The good performance of sis in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.

Highlights

  • Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common

  • In this work we present an algorithm for obtaining contig scaffolds that explicitly takes into consideration the presence of inversions in the draft (query) genome with respect to the reference genome

  • The algorithm we have developed generates correct scaffolds in the presence of symmetric, nested, or safe inversions; in addition it can produce scaffolds, not necessarily 100% correct, for generic inversions

Introduction

With the decreasing costs of DNA sequencing, it is very common for prokaryotic genomes to be sequenced to “draft” status only. This means that the generated sequence will be a set of contigs (a contig is a substring of the string over the DNA alphabet that represents the genome sequence). As of December 9, 2011, the number of draft microbial genome sequences in GenBank is 2324, compared to 1814 complete sequences. There is an increasing need for tools that can improve the sequencing and assembly results beyond a simple contig set.
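To make the reference-mapping technique mentioned in the abstract concrete, the following minimal Python sketch orders contigs by where they match a reference genome and records their orientation. It is an illustration only and is not the sis algorithm: the `Hit` record, its fields, and the `naive_scaffold` function are hypothetical names introduced here, and the example coordinates are invented. In particular, the sketch ignores inversions entirely, which is the limitation the work described above sets out to address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One match between a contig and the reference (hypothetical record)."""
    contig: str     # contig identifier
    ref_start: int  # leftmost matched position on the reference
    ref_end: int    # rightmost matched position on the reference
    strand: str     # "+" for a forward match, "-" for a reverse-complement match


def naive_scaffold(hits):
    """Return contigs ordered by reference position, with orientation.

    For each contig only its longest hit is kept. Rearrangements such as
    inversions are NOT taken into account, so the result can be wrong
    whenever the query and reference genomes differ by an inversion.
    """
    best = {}
    for hit in hits:
        current = best.get(hit.contig)
        if current is None or (hit.ref_end - hit.ref_start) > (
            current.ref_end - current.ref_start
        ):
            best[hit.contig] = hit
    ordered = sorted(best.values(), key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in ordered]


# Invented example data: contig c2 has a long reverse hit and a short spurious one.
hits = [
    Hit("c3", 9000, 12000, "+"),
    Hit("c1", 100, 4000, "+"),
    Hit("c2", 4500, 8000, "-"),
    Hit("c2", 200, 300, "+"),
]
scaffold = naive_scaffold(hits)
print(scaffold)
```

Here contig c2 is placed between c1 and c3 in reverse orientation. A naive mapping like this cannot tell whether that reverse match reflects the contig's true orientation or a large-scale inversion between the two genomes, which is why an inversion-aware method is needed.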
