Abstract

The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55–83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

Highlights

  • The discrepancy between the amount of DNA in a non-replicated haploid nucleus (C-value) and the complexity of eukaryotic organisms, known as the C-value paradox [1], has long puzzled geneticists and evolutionary biologists

  • The graph-based read clustering has become the core algorithm for RepeatExplorer, a computational pipeline designed for identification, quantification and annotation of repeats in plant genomes [32]. We have extended this bioinformatic approach by introducing several novel methods for repeat characterization and applied them to analyse the genomes of 23 species belonging to the legume tribe Fabeae

  • We have developed a novel bioinformatic approach to estimate this proportion by quantifying the number of reads containing the junction between the 3' end of the LTR (LTR_3'end) and the internal retrotransposon region (i.e. 5' UTR, starting with the primer binding site) and the number of reads containing the LTR_3'end alone, which represent an insertion site of the element
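
The read-counting idea in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the toy sequences and the use of exact string matching are all assumptions made for illustration (real reads would need alignment-based matching that tolerates mismatches and handles both strands). The sketch only separates reads in which the LTR 3' end continues into the internal region from reads in which it is followed by some other sequence; how these two counts are turned into the proportion mentioned above is not specified in this text and is left out.

```python
def classify_ltr_reads(reads, ltr_3prime_end, internal_start):
    """Count reads where the LTR 3' end is followed by the internal region
    (junction reads) versus any other sequence (LTR-end-only reads).

    All names and the exact-match strategy are hypothetical simplifications.
    """
    k = len(internal_start)
    junction = 0
    ltr_end_only = 0
    for read in reads:
        pos = read.find(ltr_3prime_end)
        if pos == -1:
            continue  # read does not contain the LTR 3' end at all
        start = pos + len(ltr_3prime_end)
        downstream = read[start:start + k]
        if len(downstream) < k:
            continue  # too little flanking sequence to classify the read
        if downstream == internal_start:
            junction += 1
        else:
            ltr_end_only += 1
    return junction, ltr_end_only


# Toy, made-up sequences (not taken from any real element).
LTR_END = "TTGGCA"
INTERNAL_START = "TGGTAT"
reads = [
    "ACGTTTGGCATGGTATCGA",  # LTR end followed by internal region
    "GGATTGGCACCGATTAG",    # LTR end followed by other sequence
    "CCCCTTGGCATGGTATAA",   # LTR end followed by internal region
    "AAAAAAAAAA",           # no LTR end
    "GGGTTGGCATG",          # LTR end too close to the read end
]

junction_reads, ltr_end_only_reads = classify_ltr_reads(reads, LTR_END, INTERNAL_START)
print(junction_reads, ltr_end_only_reads)
```

Reads that end too soon after the LTR 3' end are skipped rather than guessed at, so both counts are based only on reads with enough downstream sequence to be classified.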


Summary

Introduction

The discrepancy between the amount of DNA in a non-replicated haploid nucleus (C-value) and the complexity of eukaryotic organisms, known as the C-value paradox [1], has long puzzled geneticists and evolutionary biologists. Multiple lines of research, starting with the pioneering works employing DNA reassociation kinetics [2,3] and culminating in the recent application of high-throughput genome sequencing technologies, have provided evidence that genome size variation is primarily driven by the differential accumulation and elimination of repetitive DNA, whereas the number of genes remains relatively stable [4]. These findings have led to the proposal of an alternative term, the C-value enigma, reflecting the fact that although there is no paradox in the causes of the observed genome size variation, there is still relatively little known about how the various molecular and evolutionary mechanisms contribute to genome size diversification in different groups of organisms [5,6]. There is a need for a thorough characterization of repeats at various scales, from individuals and species to higher taxa, in order to test the validity of the proposed hypotheses or to develop new ones [10].
