Abstract

Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329,000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda.

Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome.

Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.

Highlights

  • In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome

  • The GC content of the bacterial artificial chromosome (BAC) sequences is similar to that of the shotgun reads, and the GC content of the P. taeda genome falls within typical ranges of angiosperm species (Table 1)

  • Analyses of the structure and content of ten P. taeda BACs suggest that pseudogenes may be common in the pine genome and that isolated repetitive elements such as LTR retrotransposons can be discerned from a background of fragmented fossil repeats of unknown origin


Summary

Introduction

In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. With 110-120 species spread throughout the Northern Hemisphere, the pines (Pinus) comprise the largest genus in the coniferous family Pinaceae of the gymnosperms. Pine chromosomes are uniform in both number (2N = 24) and appearance, for they lack major distinguishing physical features [10]. They are so large and dense that standard karyotyping probes such as rDNA, GC-rich and telomeric repeat sequences have failed to produce differential banding patterns among the chromosomes, which has complicated karyotyping efforts [11,12,13,14,15,16,17,18]. P. taeda is ideal for additional genomic exploration among pines, conifers and gymnosperms.

