Abstract

Background

We investigate whether pooling BAC clones and sequencing the pools can yield more accurate genome sequence assemblies than the "whole genome shotgun" (WGS) approach, and we quantify the gain in accuracy. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are useful for large whole genome assemblies. We propose additional measures, such as rearrangements and redundant sequence content relative to the known target sequence, that enable easy, visual comparison of assembly quality.

Results

The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most.

Conclusions

BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting that this pooling density is effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massive over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
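As a rough illustration of the 15L-5P regime quoted above, the sequence yield needed for a 12 Mbp pool follows from coverage multiplied by region size. The sketch below is only that arithmetic; the helper name `required_bases` is hypothetical, and no plate yields or read lengths are assumed because the text does not state them.

```python
def required_bases(region_bp: int, coverage: float) -> int:
    """Bases of usable sequence needed to cover region_bp at a given fold coverage."""
    return round(region_bp * coverage)


# Values taken from the abstract; the variable names are illustrative only.
POOL_BP = 12_000_000  # 12 Mbp pooled region
LINEAR_COV = 15       # 15x linear reads (the "15L" in 15L-5P)
PAIRED_COV = 5        # 5x paired reads (the "5P" in 15L-5P)

linear_bases = required_bases(POOL_BP, LINEAR_COV)
paired_bases = required_bases(POOL_BP, PAIRED_COV)

print(f"linear: {linear_bases / 1e6:.0f} Mbp")
print(f"paired: {paired_bases / 1e6:.0f} Mbp")
print(f"total:  {(linear_bases + paired_bases) / 1e6:.0f} Mbp")
```

Under these figures, roughly 180 Mbp of linear reads and 60 Mbp of paired reads would have to survive read pre-processing to reach 15L-5P on a 12 Mbp pool.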

Highlights

  • We investigate if pooling bacterial artificial chromosome (BAC) clones and sequencing the pools can provide for more accurate assembly of genome sequences than the “whole genome shotgun” (WGS) approach

  • BAC end sequences (BES), like those that Sanger sequencing would produce, were generated for each pool’s minimum tiling path, and additional BES were generated once every 20 kbp, on average, across the reference sequence

  • With the decrease in both the time and expense of sequencing, it is possible, as well as practical, to pool BAC clones and sequence the pools at coverages that provide for accurate assembly


Summary

Introduction

We investigate whether pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the “whole genome shotgun” (WGS) approach. Given the reduction in time and cost offered by new sequencing technologies, it is possible to modify the minimum tiling path (MTP) strategy by pooling BAC clones and “cheaply” sequencing them at sufficient coverage to provide for accurate assembly. Pooling speeds up the sequencing process and dramatically reduces cost. This approach was taken by Rounsley et al. [10], who used Roche/454 Titanium next-generation sequencing reads of 6 BAC pools, each with an MTP of ~3 Mbp, to assemble a 19 Mbp region of the short arm of rice chromosome 3. We constructed MTPs from several plant genomes and assembled them using simulated Roche/454 reads with injected sequencing error.

