Abstract

Like many agricultural crops, cultivated cotton is an allotetraploid with a large genome (~2.5 gigabase pairs). Its two subgenomes, A and D, are highly similar but unequally sized and repeat-rich, features that pose significant challenges for accurate genome reconstruction using standard approaches. Here we report the development of BAC libraries, subgenome-specific physical maps, and a new-generation sequencing approach that will lead to a reference-grade genome assembly for Upland cotton. Three BAC libraries were constructed, fingerprinted, and integrated with BAC-end sequences (BES) to produce a de novo whole-genome physical map. The BAC map was partitioned by subgenome through alignment to the reference sequence of the diploid D-genome progenitor, using densely spaced BES anchor points and computational filtering. The physical maps were validated with FISH and by genetic mapping of SNP markers derived from BES. Two pairs of homeologous chromosomes, A11/D11 and A12/D12, were used to assess multiplex sequencing approaches for completeness and scalability. The results represent the first subgenome-anchored physical maps of Upland cotton and a new-generation approach to whole-genome sequencing that will lead to a reference-grade assembly of allopolyploid cotton and serve as a general strategy for sequencing other polyploid species.
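The abstract partitions the BAC map by subgenome using BES alignments to the D-genome progenitor reference. A minimal sketch of one plausible filtering rule follows, assuming that BES from D-subgenome contigs align to the D-genome reference at higher identity than BES from A-subgenome contigs, and calling each contig from the median identity of its BES hits. The `BesHit` type, the identity thresholds (99% and 97%), and the three-anchor minimum are hypothetical illustrations, not the parameters used in the study.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BesHit:
    contig: str      # BAC map contig the BAC-end sequence belongs to
    identity: float  # percent identity of its best alignment to the D-genome reference


def classify_contigs(hits, min_anchors=3, d_identity=99.0, a_identity=97.0):
    """Call each contig 'D', 'A', or 'unassigned'.

    Hypothetical rule: thresholds and min_anchors are illustrative assumptions.
    """
    by_contig = {}
    for hit in hits:
        by_contig.setdefault(hit.contig, []).append(hit.identity)

    calls = {}
    for contig, identities in by_contig.items():
        if len(identities) < min_anchors:
            calls[contig] = "unassigned"  # too few BES anchors to decide
            continue
        m = median(identities)
        if m >= d_identity:
            calls[contig] = "D"           # near-identical to the D-genome reference
        elif m <= a_identity:
            calls[contig] = "A"           # diverged from the D-genome reference
        else:
            calls[contig] = "unassigned"  # ambiguous identity band, filtered out
    return calls
```

For example, a contig whose three BES hits have identities 99.6, 99.8, and 99.2 has a median of 99.6 and is called D under these thresholds; requiring several anchors and leaving an ambiguous band unassigned is one simple form of the computational filtering the abstract mentions.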

Highlights

  • The D-genome fiber is rudimentary and not useful[7]

  • BAC-based whole-genome physical mapping and hierarchical BAC-by-BAC sequencing techniques have served as the gateway to reference-grade genome assemblies for complex plant species such as Arabidopsis[25], rice[26], maize[27], and peach[28]

  • Two BAC libraries are composed of clones with inserts derived from partial restriction digestion (Gh_TBh and Gh_TBb), and the third is composed of clones with inserts produced by mechanical genome fractionation (Gh_TBr) (Lucigen, Madison, Wisconsin)
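Hierarchical BAC-by-BAC sequencing works from a minimum tiling path (MTP): the smallest set of overlapping clones that spans a physical-map contig. One way to pick such a path, assumed here rather than taken from the source, is the classic greedy minimum interval cover over clone coordinates on a contig. The `Clone` type and its coordinate units are hypothetical, and real pipelines also require a minimum overlap between neighboring clones, which this sketch does not model.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # left end on the contig axis (hypothetical units)
    end: int    # right end, exclusive


def minimum_tiling_path(clones):
    """Greedy minimum interval cover of the union of clone intervals.

    Returns (path, gaps): chosen clones in left-to-right order, and
    uncovered (start, end) stretches where no clone bridges the map.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    path, gaps = [], []
    i, n = 0, len(ordered)
    cur = ordered[0].start if ordered else 0  # right edge of covered region
    while i < n:
        best = None
        # Among clones starting at or before the covered edge, keep the one reaching furthest.
        while i < n and ordered[i].start <= cur:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None:
            # No clone touches the covered edge: record a gap, restart at the next clone.
            gaps.append((cur, ordered[i].start))
            cur = ordered[i].start
            continue
        if best.end > cur:  # skip clones fully contained in the covered region
            path.append(best)
            cur = best.end
    return path, gaps
```

With clones a [0, 100), b [10, 60), c [50, 180), d [90, 150), e [170, 260), and f [300, 400), the sketch selects a, c, e, and f and reports the gap (260, 300); b and d are redundant because they lie inside ground already covered.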


Summary

Introduction

The D-genome fiber is rudimentary and not useful[7], whereas the fiber in allotetraploids is much longer and stronger, suggesting activation and/or silencing of homeologous fiber-related genes by genetic and epigenetic mechanisms[8,9,10]. Statistics for both available G. hirsutum draft assemblies (scaffold N50 = 1,600 kb[21]; N50 = 764 kb[20]) suggest a high degree of incompleteness and limited contiguity, implying that existing strategies for sequencing allopolyploid genomes cannot fully resolve heterozygous, paralogous, and homeologous genes or repetitive DNA elements. Because of these challenges, progress in trait genetics and in tools for understanding polyploid species has generally relied on extant progenitor or progenitor-related species as proxies, using their data to draw inferences about the polyploid. Sequencing approaches using different-size pools of minimum tiling path (MTP) BACs have provided useful insights for sequencing and assembling the complete allotetraploid cotton genome, work that will produce a reference-grade genome sequence for cotton and inform the sequencing of other polyploid species.
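The N50 values quoted for the draft assemblies summarize contiguity: N50 is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of the standard computation, using toy lengths rather than data from either assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (e.g. scaffold sizes)."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For the toy lengths 2 through 10 (total 54), the running sum first reaches half the total at 10 + 9 + 8 = 27, so `n50` returns 8. A larger N50 therefore means that more of the genome sits in long, contiguous scaffolds.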
