Whole-Genome Restriction Mapping by "Subhaploid"-Based RAD Sequencing: An Efficient and Flexible Approach for Physical Mapping and Genome Scaffolding.

Jinzhuang Dou,Jia Wang,Zhenmin Bao,Yangping Li,Tianqi Li,Lingling Zhang,Shi Wang,Xiaoli Hu,Yuli Li,Chuang Mu,Huaiqian Dou

doi:10.1534/genetics.117.200303

Abstract

Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination that limit attainable map resolution. Here we develop a sequencing-based “in vitro” linkage mapping approach (called RadMap), where chromosome breakage and segregation are realized by generating hundreds of “subhaploid” fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map to facilitate genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6–14 kb), with up to 15-fold improvement of N50 (∼816 kb–3.7 Mb) and high scaffolding accuracy (98.1–98.5%). RadMap outperforms BioNano and Hi-C when the input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-range contiguity information and provide an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies.

Highlights

Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies
A mapping panel is created by generating hundreds of large-insert fosmid/BAC clone pools, with each pool covering less than one haploid genome (e.g., 0.3–0.7×; Figure 1A)
To determine an optimal marker order, we develop a bootstrap-based minimum spanning tree (bMST) algorithm for the grouping and ordering of genome-wide markers (Figure 1C; see Materials and Methods for algorithm details), which is well suited for dealing with noisy or incomplete mapping data
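
The idea behind ordering markers with a minimum spanning tree can be illustrated with a small sketch. This is not the bMST algorithm as implemented in AMMO, whose details are in the paper's Materials and Methods; it is a simplified illustration under stated assumptions. Each marker is a presence/absence vector across clone pools, the distance between two markers is the fraction of pools in which exactly one of them is present, the marker order is read off the longest path of the tree, and bootstrap support is estimated by resampling pools with replacement. All function names and the toy panel are hypothetical.

```python
import random
from collections import defaultdict


def discordance(a, b):
    """Fraction of pools in which exactly one of the two markers is present."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns an adjacency dict."""
    n = len(dist)
    best = {j: (dist[0][j], 0) for j in range(1, n)}  # node -> (cost, parent)
    adj = defaultdict(list)
    while best:
        j = min(best, key=lambda k: best[k][0])
        cost, parent = best.pop(j)
        adj[parent].append((j, cost))
        adj[j].append((parent, cost))
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return adj


def farthest(adj, start):
    """Return the node farthest from start along tree edges, with the path to it."""
    stack = [(start, None, 0.0, [start])]
    best_node, best_d, best_path = start, -1.0, [start]
    while stack:
        node, parent, d, path = stack.pop()
        if d > best_d:
            best_node, best_d, best_path = node, d, path
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, d + w, path + [nxt]))
    return best_node, best_path


def order_markers(genotypes):
    """Order markers along the longest path (backbone) of the MST."""
    n = len(genotypes)
    dist = [[discordance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]
    adj = minimum_spanning_tree(dist)
    end, _ = farthest(adj, 0)          # first sweep finds one end of the backbone
    _, path = farthest(adj, end)       # second sweep walks to the other end
    return path


def bootstrap_support(genotypes, rounds=100, seed=0):
    """Fraction of pool-resampled replicates in which each adjacency appears."""
    rng = random.Random(seed)
    n_pools = len(genotypes[0])
    counts = defaultdict(int)
    for _ in range(rounds):
        cols = [rng.randrange(n_pools) for _ in range(n_pools)]
        resampled = [[g[c] for c in cols] for g in genotypes]
        path = order_markers(resampled)
        for a, b in zip(path, path[1:]):
            counts[frozenset((a, b))] += 1
    return {pair: c / rounds for pair, c in counts.items()}


# Hypothetical toy panel: 5 markers (rows) scored in 7 pools (columns).
# Each pool holds one clone spanning a contiguous window of markers.
TOY_PANEL = [
    [1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1, 0],
]

if __name__ == "__main__":
    print(order_markers(TOY_PANEL))
    print(bootstrap_support(TOY_PANEL, rounds=50, seed=1))
```

On the toy panel, neighboring markers share the most pools, so the tree is a single chain and the recovered order is the true one in one of its two orientations. In this sketch, markers that fall on side branches of the tree are simply left out of the backbone; a practical tool would need to place them and to split markers into linkage groups first.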

Summary

Introduction

Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Physical maps are indispensable tools in early eukaryotic genome projects, where they provide an essential framework for ordering and joining sequence data, genetically mapped markers, and large-insert clones, and can be used alone to isolate genes of interest, to home in on particular regions for sequencing, or to compare the organization of different species’ genomes (Meyers et al. 2004; Lewin et al. 2009; van Oeveren et al. 2011). Despite these advantages, traditional physical mapping approaches have been less favorable in the NGS era, because the creation and profiling of bacterial-artificial-chromosome (BAC) libraries remains labor intensive, time consuming, and expensive; e.g., physical mapping based on a 10× human BAC library would have to deal with 200,000 BAC clones. Because each aliquot in a HAPPY panel contains very little genomic DNA (e.g., <3 pg for human DNA; Dear 2005), the lack of faithful amplification of each aliquot's DNA to provide enough material for genotyping a large number of markers has been the bottleneck of this method (Jiang et al. 2009), and this has prevented such a simple and powerful method from coming into general use since it was invented 20 years ago.
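
The clone count quoted for a 10× human BAC library follows from simple coverage arithmetic. The sketch below reproduces it under two assumptions that are not stated in the text: a haploid human genome of about 3 Gb and an average BAC insert of about 150 kb. The function name is hypothetical.

```python
def clones_needed(genome_size_bp, coverage, insert_size_bp):
    """Number of clones required to reach a given fold coverage of the genome."""
    return round(genome_size_bp * coverage / insert_size_bp)


# Assumed values (not from the source): ~3 Gb haploid genome, ~150 kb inserts.
HUMAN_GENOME_BP = 3_000_000_000
BAC_INSERT_BP = 150_000

if __name__ == "__main__":
    print(clones_needed(HUMAN_GENOME_BP, 10, BAC_INSERT_BP))  # prints 200000
```

With these assumed values the result matches the 200,000 clones mentioned above; a different insert size would change the count proportionally.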

Methods

Results

Discussion

Conclusion