Abstract

Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which, combined with huge genome sizes, have made accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosome (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.

Highlights

  • Accurate de novo assembly of sequence reads represents the weak link in genome projects despite advances in high-throughput sequencing [1,2]

  • We constructed a genome map using two nicking enzymes, Nt.BbvCI and Nt.BspQI, whose nick motifs were labeled with red and green dyes, respectively, across 27 bacterial artificial chromosomes (BACs) making up a minimal tiling path (MTP) of a 2.1-Mb region containing the prolamin multigene family in the Ae. tauschii genome

  • Individual BAC molecules with red and green labels distributed at sequence-specific locations were compared and clustered into pools with similar map patterns (Figure 1C, top)
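
The idea of labels sitting at sequence-specific locations can be illustrated with a small in silico sketch. The code below is not the authors' pipeline; it only scans a sequence for the recognition motifs of the two nicking enzymes and returns the expected red and green label positions. The motif strings (CCTCAGC for Nt.BbvCI, GCTCTTC for Nt.BspQI), the function names, and the toy sequence are assumptions made for illustration, and each position is reported as the motif start rather than the exact nick site.

```python
# Minimal sketch of an in silico two-color label map.
# Assumed recognition motifs (5'->3'); check the enzyme supplier's
# documentation before relying on them.
MOTIFS = {
    "red": "CCTCAGC",    # assumed motif for Nt.BbvCI
    "green": "GCTCTTC",  # assumed motif for Nt.BspQI
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted 0-based start positions of motif on either strand."""
    seq = seq.upper()
    hits = set()
    # A label is visible whichever strand carries the motif,
    # so search for the motif and its reverse complement.
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def two_color_map(seq):
    """Return a position-sorted list of (position, color) label sites."""
    labels = []
    for color, motif in MOTIFS.items():
        labels.extend((pos, color) for pos in find_sites(seq, motif))
    return sorted(labels)


if __name__ == "__main__":
    # Toy sequence, invented for this example.
    toy = "AAACCTCAGCTTTGCTCTTCAAAGAAGAGCTT"
    for pos, color in two_color_map(toy):
        print(pos, color)
```

A predicted pattern of this kind, computed for a sequence contig, is what would be compared with the label pattern observed on single molecules when contigs are anchored to a genome map.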


Summary

Introduction

Accurate de novo assembly of sequence reads represents the weak link in genome projects despite advances in high-throughput sequencing [1,2]. There are two general steps in genome sequence assembly: generation of sequence contigs and scaffolds, and their anchoring on genome-wide, lower resolution maps. NGS reads are often too short for unambiguous assembly. To order contigs and scaffolds, high-resolution genomic maps from an independent technology platform are needed. They may be of chromosomal scale, i.e., genetic maps, or regional scale, i.e., contigs of bacterial artificial chromosomes (BACs) or fosmids [4]. Contigs and scaffolds may be difficult to map if they are too short compared to the map resolution. Typical medium to large genomes contain 40–85% repetitive sequences [5,6,7,8], dramatically hindering effective de novo sequence assembly.
