Abstract

Although new and emerging next-generation sequencing (NGS) technologies have reduced sequencing costs significantly, much work remains before they can be applied to de novo sequencing of complex and highly repetitive genomes such as the tetraploid genome of Upland cotton (Gossypium hirsutum L.). Here we report the results of implementing a novel, hybrid Sanger/454-based BAC-pool sequencing strategy using minimum tiling path (MTP) BACs from Ctg-3301 and Ctg-465, two large contigs (Ctg) representing genomic segments in the A12 and D12 homoeologous chromosomes. To generate longer contig sequences during assembly, we implemented a hybrid assembly method that processes ~35x data from 454 technology and 2.8-3x data from the Sanger method. Hybrid assemblies offered higher sequence coverage and better sequence assemblies. Homology studies revealed retrotransposon regions, such as Copia and Gypsy elements, in these contigs and also helped identify new genomic SSRs. Unigenes were anchored to the sequences in Ctg-3301 and Ctg-465 to support the physical map. Gene density, gene structure and protein sequence information derived from protein prediction programs were used to functionally annotate these genes. Comparative analysis of both contigs with the Arabidopsis genome revealed synteny and microcollinearity, with a conserved gene order in both genomes. This study provides insight into the use of an MTP-based BAC-pool sequencing approach for sequencing complex polyploid genomes, with few constraints on generating better sequence assemblies for building reference scaffold sequences. Combining MTP-based BAC-pool sequencing with current long- and short-read NGS technologies in a multiplexed format could provide a new direction for sequencing complex plant genomes cost-effectively and precisely.

Highlights

  • Cotton is one of the most important fiber and oilseed crops, contributing ~$500 billion per year to the world economy [1]

  • Ctg-3301 comprised Pool-4 and Pool-5, with 4 and 3 bacterial artificial chromosome (BAC) clones, respectively, while Ctg-465 comprised Pool-1, Pool-2 and Pool-3, each with 4 BAC clones

  • To overcome the challenges of generating high-quality genome sequence data in Upland cotton, we proposed a novel minimum tiling path (MTP)-based BAC-pool sequencing method that helps assign sequence data accurately to highly homoeologous chromosomes, in order to generate a high-quality draft tetraploid cotton genome sequence


Summary

Introduction

Cotton is one of the most important fiber and oilseed crops, contributing ~$500 billion per year to the world economy [1]. The Gossypium genus consists of nearly 50 cotton species, including five allotetraploids (AtDt); the rest are diploids. Cotton fiber has been studied widely to understand cell elongation and cellulose synthesis. The need to understand the genome organization, complexity and evolution of cotton has given impetus to sequencing efforts, which have gained momentum in recent years. Decoding cotton genomes remains central to understanding the functional and agronomic significance of ploidy and genome size variation within the Gossypium genus [1]. Choosing a cost-effective sequencing strategy that can deliver an informative whole-genome sequence is a major concern for many polyploid species.

