Introduction
Ribosomal DNA (rDNA) loci have been widely used to identify allopolyploids and hybrids, although few of these studies employed high-throughput sequencing data. Here we use graph clustering implemented in the RepeatExplorer (RE) pipeline to analyze homoeologous 5S rDNA arrays at the genomic level, searching for evidence of a hybridogenic origin of species. Data were obtained from more than 80 plant species, including several well-defined allopolyploids and homoploid hybrids of different evolutionary ages and from widely dispersed taxonomic groups.

Results
(i) Diploids show simple circular-shaped graphs of their 5S rDNA clusters. In contrast, most allopolyploids and other interspecific hybrids exhibit more complex graphs composed of two or more interconnected loops representing intergenic spacers (IGS). (ii) There was a relationship between graph complexity and locus numbers. (iii) The sequences and lengths of the 5S rDNA units reconstituted in silico from k-mers were congruent with those determined experimentally. (iv) Three-genomic comparative cluster analysis of reads from allopolyploids and progenitor diploids allowed identification of homoeologous 5S rRNA gene families even in relatively ancient (c. 1 Myr) Gossypium and Brachypodium allopolyploids, which already exhibit uniparental partial loss of rDNA repeats. (v) Finally, species harboring introgressed genomes exhibit exceptionally complex graph structures.

Conclusion
We found that the cluster graph shapes and graph parameters (k-mer coverage scores and connected component index) reflect well the organization and intragenomic homogeneity of 5S rDNA repeats. We propose that the analysis of 5S rDNA cluster graphs computed by the RE pipeline, together with cytogenetic analysis, might be a reliable approach for determining the parentage of hybrid or allopolyploid plant species and may also be useful for detecting historical introgression events.
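
The contrast between the single-loop graphs of diploids and the multi-loop graphs of allopolyploids can be illustrated with a toy sketch. It is not the RepeatExplorer implementation: the node names, the `ring` helper and the two example graphs are hypothetical, loops are counted with the standard cyclomatic number (edges minus nodes plus connected components) of an undirected graph, and the connected component index is approximated here, as an assumption, by the fraction of nodes in the largest connected component.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs; duplicate edges collapse."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components(adj):
    """Return the connected components as a list of node sets (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def loop_count(adj):
    """Cyclomatic number E - V + C: the number of independent loops in the graph."""
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_edges - len(adj) + len(components(adj))


def largest_component_fraction(adj):
    """Assumed stand-in for a connected component index: share of nodes in the largest component."""
    return max(len(c) for c in components(adj)) / len(adj)


def ring(names):
    """Hypothetical helper: edges that join the given nodes into one closed circle."""
    return [(names[i], names[(i + 1) % len(names)]) for i in range(len(names))]


# Toy diploid: a conserved gene region (g*) and one spacer (s*) closing a single circle.
diploid = build_graph(ring(["g0", "g1", "g2", "g3", "s0", "s1", "s2", "s3"]))

# Toy allopolyploid: the same gene region shared by two divergent spacers (a*, b*).
gene = ["g0", "g1", "g2", "g3"]
allopolyploid = build_graph(
    ring(gene + ["a0", "a1", "a2"]) + ring(gene + ["b0", "b1", "b2", "b3"])
)

print(loop_count(diploid), loop_count(allopolyploid))
```

In this toy model the shared gene region is what connects the two spacer loops, so a second homoeologous spacer family raises the loop count from one to two, while stray unconnected reads would lower the largest-component fraction below 1.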