Abstract

Use of 10,129 singleton SNPs of known genomic location in tetraploid cotton provided unique opportunities to characterize genome-wide diversity among 440 Gossypium hirsutum and 219 G. barbadense cultivars and landrace accessions of widespread origin. Using the SNPs distributed genome-wide, we examined genetic diversity, haplotype distribution and linkage disequilibrium patterns in the G. hirsutum and G. barbadense genomes to clarify population demographic history. Diversity and identity-by-state analyses revealed little sharing of alleles between the two cultivated allotetraploid genomes, with a few exceptions that indicated sporadic gene flow. We found a high number of new alleles, representing increased nucleotide diversity, on chromosomes 1 and 2 in cultivated G. hirsutum as compared with low nucleotide diversity on these chromosomes in landrace G. hirsutum. In contrast, G. barbadense showed negative Tajima’s D on several chromosomes for both cultivated and landrace types, which indicates that speciation of G. barbadense itself might have occurred with relatively narrow genetic diversity. The presence of conserved linkage disequilibrium (LD) blocks and haplotypes between G. hirsutum and G. barbadense provides strong evidence for comparable patterns of evolution in their domestication processes. Our study illustrates the potential use of population genetic techniques to identify genomic regions associated with domestication.

Highlights

  • Cultivars of G. hirsutum and G. barbadense produce the overwhelming majority of the world’s cotton fiber and oil

  • This report focuses on a comparative study of linkage disequilibrium (LD) among G. hirsutum and G. barbadense chromosomes and constraints in population structure of both allopolyploids

  • A representative sample of 658 cotton accessions (440 of G. hirsutum and 218 of G. barbadense), collected from 85 countries in North America, South America, Europe, Asia, and Africa, was obtained from the National Cotton Germplasm Collection (NCGC) maintained by the USDA-ARS in College Station, TX[34] (Table S1)


Summary

Materials and Methods

The separate TagCounts files were merged to form a “master” TagCounts file, which retained only those tags present at or above an experiment-wide minimum count. This master tag list was aligned to the TM-1 (G. hirsutum) reference genome[33], and a Tags On Physical Map (TOPM) file was generated, containing the genomic position of each tag with a unique, best alignment. The information recorded in the TOPM and the Tags By Taxa (TBT) file was used to discover SNPs at each “TagLocus” (set of tags with the same genomic position) and to filter the SNPs based upon the proportion of taxa covered by the TagLocus and minor allele frequency[40]. To compute linkage disequilibrium (LD), we used the expectation-maximization (EM) algorithm, formalized by [46], an iterative technique for obtaining maximum likelihood estimates of sample haplotype frequencies.
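To make the EM step concrete, the following is a minimal sketch in Python of how haplotype frequencies, and from them the LD statistics D and r², could be estimated for a single pair of biallelic SNPs from unphased genotypes. It is not the software used in the study. It assumes diploid-like inheritance at each SNP, genotypes coded as 0, 1, or 2 copies of the alternate allele, and no missing data. The function names `em_haplotype_freqs` and `ld_stats` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def em_haplotype_freqs(genotypes, tol=1e-10, max_iter=1000):
    """Estimate two-locus haplotype frequencies by EM.

    genotypes: iterable of (g1, g2) pairs, each value 0, 1, or 2
    (copies of the alternate allele; an assumed coding).
    Returns [h00, h01, h10, h11], indexed by (allele at SNP 1, allele at SNP 2).
    """
    pairs = Counter(genotypes)
    n_ind = sum(pairs.values())
    if n_ind == 0:
        raise ValueError("no genotypes supplied")

    # Alleles carried on the two chromosomes for each genotype code.
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}

    # Haplotype counts from individuals whose phase is unambiguous,
    # i.e. everyone except double heterozygotes (1, 1).
    base = [0.0, 0.0, 0.0, 0.0]  # index = 2 * a + b
    for (g1, g2), n in pairs.items():
        if (g1, g2) == (1, 1):
            continue
        for a, b in zip(alleles[g1], alleles[g2]):
            base[2 * a + b] += n

    n_dh = pairs[(1, 1)]  # number of double heterozygotes
    freqs = [0.25] * 4    # uniform starting point
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is 00/11 (cis)
        # rather than 01/10 (trans), given the current frequencies.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = [
            base[0] + n_dh * w,
            base[1] + n_dh * (1 - w),
            base[2] + n_dh * (1 - w),
            base[3] + n_dh * w,
        ]
        # M-step: re-estimate frequencies from expected haplotype counts.
        new = [c / (2 * n_ind) for c in counts]
        delta = max(abs(x - y) for x, y in zip(new, freqs))
        freqs = new
        if delta < tol:
            break
    return freqs


def ld_stats(freqs):
    """Return (D, r2) from haplotype frequencies [h00, h01, h10, h11]."""
    h00, h01, h10, h11 = freqs
    p = h10 + h11  # alternate-allele frequency at SNP 1
    q = h01 + h11  # alternate-allele frequency at SNP 2
    d = h11 - p * q
    denom = p * (1 - p) * q * (1 - q)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2
```

Only double heterozygotes have ambiguous phase, so the E-step reduces to a single weight per iteration. For example, `ld_stats(em_haplotype_freqs([(0, 0), (2, 2), (1, 1)]))` would give the D and r² estimates for three accessions. A genome-wide analysis would repeat this over many SNP pairs and would need to handle missing genotypes.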

