Abstract

Repetitive elements (REs) and transposable elements (TEs) can comprise up to 80% of some plant genomes and may be essential for regulating their evolution and adaptation. Information about the “repeatome” is often unavailable in assembled genomes because repeat-rich regions are challenging to assemble and are often missing from the final assembly. However, raw genomic sequencing data contain rich information about REs and TEs. Here, we studied raw genomic NGS reads of 10 gymnosperm species for the content and abundance patterns of their repeatome. We combined alignment to databases of repetitive elements with de novo assembly of highly repetitive sequences from genomic sequencing reads to characterize known and putative repetitive elements and calculate their abundance in the genomes of 10 conifer plants: Pinus taeda, Pinus sylvestris, Pinus sibirica, Picea glauca, Picea abies, Abies sibirica, Larix sibirica, Juniperus communis, Taxus baccata, and Gnetum gnemon. We found that the genome abundances of known and newly discovered putative repeats are specific to phylogenetically close groups of species and match biological taxa. The grouping of species based on abundances of known repeats closely matches the grouping based on abundances of newly discovered putative repeats (kChains) and matches the known taxonomic relations.

Highlights

  • Gymnosperm genomes are relatively large and variable in size, ranging from 12 Gb in Larix sibirica and 20 Gb in Picea [1,2] to up to 30 Gb in some Pinus species [3,4]

  • By aligning the genomic reads to two databases of known repetitive elements, RepBase and PIER, we identified more than 7000 highly abundant known repetitive elements (REs) (Table S2)

  • Based on the abundance of known repetitive elements, the Picea group is farther from Pinus than the combined Larix-Abies group is. This finding does not match the known phylogeny based on chloroplast genomics, in which Pinus and Picea species are closer to each other than Pinus is to Larix and Abies species [38]
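
The comparison of species by repeat abundance described above can be illustrated with a small sketch. This is not the authors' pipeline: the normalization (reads per million), the distance (Euclidean on log-transformed abundances), and every name and number below (`species_A`, `repeat_1`, the read counts) are assumptions made up for illustration. It only shows how per-repeat read counts from an alignment step could be turned into comparable profiles and a "which species is closest" answer.

```python
import math
from itertools import combinations


def reads_per_million(hit_counts, total_reads):
    """Scale raw per-repeat read counts to reads per million sequenced reads."""
    return {rep: 1e6 * n / total_reads for rep, n in hit_counts.items()}


def profile_distance(a, b):
    """Euclidean distance between log-transformed abundance profiles.

    Repeats missing from one profile are treated as abundance 0.
    """
    repeats = sorted(set(a) | set(b))
    return math.sqrt(sum(
        (math.log1p(a.get(r, 0.0)) - math.log1p(b.get(r, 0.0))) ** 2
        for r in repeats
    ))


def nearest_neighbours(profiles):
    """For each species, return the species with the most similar profile."""
    dist = {}
    for s, t in combinations(profiles, 2):
        dist[(s, t)] = dist[(t, s)] = profile_distance(profiles[s], profiles[t])
    return {
        s: min((t for t in profiles if t != s), key=lambda t: dist[(s, t)])
        for s in profiles
    }


# Hypothetical toy input: (reads hitting each repeat, total reads sequenced).
# These numbers are invented and do not come from the study.
toy_counts = {
    "species_A": ({"repeat_1": 5000, "repeat_2": 100}, 1_000_000),
    "species_B": ({"repeat_1": 9000, "repeat_2": 300}, 2_000_000),
    "species_C": ({"repeat_1": 200, "repeat_2": 7000}, 1_000_000),
}

profiles = {
    name: reads_per_million(hits, total)
    for name, (hits, total) in toy_counts.items()
}
nearest = nearest_neighbours(profiles)

for name, neighbour in nearest.items():
    print(f"{name}: closest profile is {neighbour}")
```

Normalizing by sequencing depth keeps a deeply sequenced species from looking repeat-rich simply because it has more reads, and the log transform keeps a handful of extremely abundant repeats from dominating the distance. A full analysis would typically feed such a distance matrix into hierarchical clustering rather than a nearest-neighbour lookup.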

Introduction

Gymnosperm genomes are relatively large and variable in size, ranging from 12 Gb in Larix sibirica and 20 Gb in Picea [1,2] to up to 30 Gb in some Pinus species [3,4]. These differences in genome size can be explained by a combination of polyploidy, high levels of repetitive element (RE) amplification [10], and low rates of DNA removal due to the lack of an efficient transposon suppression mechanism [2,3]. Only a few whole-genome sequencing data sets of conifers are currently available [1,2,5,6,7,9,11,12], compared to more than 100 other plant genomes that have been assembled [13]. Size is not the only challenge for the study and de novo assembly of conifer genomes. Their nucleotide substitution rates are lower than those of most angiosperms [15], and they show high synteny [11,16].
