Abstract

BackgroundA draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. To further obtain its complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the draft genome. Although several single reference-based scaffolding tools have been proposed, they may produce erroneous scaffolds if there are rearrangements between the target and reference genomes or their phylogenetic relationship is distant. This may suggest that a single reference genome may not be sufficient to produce correct scaffolds of a draft genome.ResultsIn this study, we design a simple heuristic method to further revise our single reference-based scaffolding tool CAR into a new one called Multi-CAR such that it can utilize multiple complete genomes of related organisms as references to more accurately order and orient the contigs of a draft genome. In practical usage, our Multi-CAR does not require prior knowledge concerning phylogenetic relationships among the draft and reference genomes and libraries of paired-end reads. To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and also compared its accuracy performance with other multiple reference-based scaffolding tools Ragout and MeDuSa. Our experimental results have finally shown that Multi-CAR indeed outperforms Ragout and MeDuSa in terms of sensitivity, precision, genome coverage, scaffold number and scaffold N50 size.ConclusionsMulti-CAR serves as an efficient tool that can more accurately order and orient the contigs of a draft genome based on multiple reference genomes. The web server of Multi-CAR is freely available at http://genome.cs.nthu.edu.tw/Multi-CAR/.

Highlights

  • A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown

  • To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and compared its performance with Ragout and MeDuSa

  • Testing dataset For validation, we used a real dataset composed of several prokaryotic draft genomes to test Multi-CAR and compared its performance to Ragout [16] and MeDuSa [17] in terms of sensitivity, precision, genome coverage, scaffold number, scaffold N50 size and running time

Read more

Summary

Introduction

A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. As evaluated in our previous study [14], CAR we implemented based on a rearrangementbased algorithm [15] has a better performance among all these single reference-based scaffolding tools in terms of average sensitivity, precision and genome coverage All these single reference-based scaffolding tools may produce erroneous scaffolds (i.e., ordered and oriented contigs) if there are rearrangements between the target and reference genomes or their phylogenetic relationship is distant. This may suggest that a single reference genome may not be sufficient to produce correct scaffolds of a target draft genome

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call