Abstract

Background

The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next-generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

Conclusions

In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
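The abstract reports an N50 scaffold size without defining it. As a minimal sketch, assuming the conventional definition (the scaffold length L such that scaffolds of length L or longer together contain at least half of the total assembled bases), N50 can be computed as below. The function name `n50` and the toy lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumes the conventional definition: walk the scaffolds from longest
    to shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical, arbitrary units), summing to 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

For the toy input, the two longest scaffolds (80 and 70) already cover half of the 300 total units, so the function returns 70.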

Highlights

  • The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly

  • From the first whole genome shotgun (WGS) assembly of the 1.8 million base pair Haemophilus influenzae genome in 1995 to the orders-of-magnitude larger three-billion-base-pair mammalian genomes that followed years later [12], the WGS protocol has been an efficient and effective method of producing high quality reference genomes. This was in part made possible by the overlap layout consensus (OLC) assembly paradigm championed by Myers [13] and ubiquitously implemented in first-generation WGS assemblers

  • When next-generation sequencing disruptively ushered in a new era of WGS sequencing, the extremely large numbers of reads exceeded the capabilities of existing OLC assemblers
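To make the overlap step of the overlap layout consensus (OLC) paradigm concrete, here is a toy sketch that builds a suffix-prefix overlap graph over a handful of reads. It is an illustration under simplifying assumptions (exact matches only, no sequencing errors, no reverse complements) and is not the method used by any particular assembler or by the authors. The names `overlap`, `overlap_graph` and `min_len` are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads, min_len=3):
    """Map each ordered read pair with an overlap to its overlap length."""
    graph = {}
    # Naive all-against-all comparison: n * (n - 1) ordered pairs.
    for a, b in permutations(reads, 2):
        k = overlap(a, b, min_len)
        if k:
            graph[(a, b)] = k
    return graph


# Three toy reads (hypothetical) that tile a short sequence.
reads = ["ATGCGT", "GCGTAC", "GTACCA"]
for (a, b), k in overlap_graph(reads).items():
    print(a, "->", b, "overlap", k)
```

In this sketch the number of pairwise comparisons grows quadratically with the number of reads, which gives some intuition for why very large read sets strain a naive overlap computation.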

Introduction

The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Conifers will likely provide many genome-level insights on the origins of genetic diversity in higher plants. Over 1.5 billion loblolly pine seedlings are planted annually, approximately 80% of which are genetically improved, driving the species' selection as the reference conifer genome. Its genetic resources are unsurpassed in that three tree improvement cooperatives have been breeding loblolly pine for more than 60 years and manage millions of trees in genetic trials. Current research focuses on the potential of genomic selection for continued genetic improvement [5].
