Abstract

Background: In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too small to capture the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution.

Results: In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures; a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members.

Conclusions: The experimental approach used in this study is a valuable combined technique for studying genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2490-z) contains supplementary material, which is available to authorized users.

Highlights

  • As described in the Methods section, DNA was prepared from the purified BAC clones and fully sequenced, and the sequence assemblies for asparagine synthetase 1 (AS1), sucrose synthase (SuSy) and xyloglucan endotransglycosylase (XET) were deposited in GenBank [GenBank: KP172187, GenBank: KP172194 and GenBank: KP172185 respectively]

  • The AS1 sequence in the BAC clone exactly matches the previously characterized maritime pine AS1 cDNA [24], and Fig. 1a shows the pattern of the BAC clone containing the AS1 gene assembled in a single scaffold of 46,111 bp

Introduction

In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Conifers exhibit unique characteristics among vascular plants, including high genetic variability, long life spans, seasonal survival, adaptation to secondary growth, and wood deposition, among others [3]. Despite their economic and ecological importance, genomic studies of conifers have been hampered by the large size of their genomes, which range up to 40 Gb, approximately 200 times the size of the Arabidopsis genome and approximately seven times the size of the human genome [4]. Recent technical advances in genomic sequencing have enabled the assembly of the Norway spruce [5], white spruce [6] and loblolly pine [7] genomes, and the sequencing of a number of additional species is underway [4, 8]. Although these assemblies represent landmarks in conifer genomics, technological challenges continue to face the assembly and annotation of conifer genomes, which are characterized by a proliferation of retrotransposons, highly diverged repetitive sequences, accumulation of non-coding regions and extensive gene duplication [4, 8]. Large families of transposons and retrotransposons have been reported to occupy long stretches of sequence in Pinus genomes [8, 9].
