Abstract

The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys, and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses.
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]

Highlights

  • The ubiquity of rDNA, essential for protein synthesis, makes it a key target for evolutionary studies. rDNA sequences and subsequences are commonly used for determining species identity and inferring genetic interrelationship (Woese 2000).

  • The SNP and pSNP polymorphisms identified in the rDNA arrays of each of the 26 S. paradoxus strains plus S. cerevisiae strain S288c, with the rDNA consensus sequence of S. paradoxus strain CBS432 as a reference, were closely examined.

  • The resulting rDNA-based phylogenetic tree (Fig. 2a), estimated from the pSNP/SNP allele frequency matrix, mirrored the pattern observed in Figure 1, splitting into three well-supported groups that directly corresponded to geographical origins.
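
As a rough illustration of how a tree can be estimated from a pSNP/SNP allele frequency matrix, the sketch below clusters strains by average linkage (UPGMA) on the mean absolute difference in variant frequency. This is not the method used in the study, which this excerpt does not specify; the strain names, the frequency values, and the function names are all hypothetical. A frequency of 1.0 stands for a variant present in every rDNA repeat (a fixed SNP), 0.0 for an absent variant, and intermediate values for a pSNP carried by only some repeats.

```python
from itertools import combinations

def mean_abs_difference(a, b):
    """Mean absolute difference in variant frequency across sites."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def upgma(freqs):
    """Average-linkage clustering; returns a Newick-style topology string."""
    # Map each cluster label to the strains it contains.
    clusters = {name: [name] for name in freqs}

    def cluster_distance(c1, c2):
        # UPGMA distance: mean over all cross-cluster strain pairs.
        pairs = [(m1, m2) for m1 in clusters[c1] for m2 in clusters[c2]]
        total = sum(mean_abs_difference(freqs[m1], freqs[m2]) for m1, m2 in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters.
        c1, c2 = min(combinations(sorted(clusters), 2),
                     key=lambda pair: cluster_distance(*pair))
        merged = f"({c1},{c2})"
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters)) + ";"

# Hypothetical data: rows are strains, columns are variant sites in the rDNA unit.
example_freqs = {
    "strain_A": [1.0, 0.0, 0.3, 0.0],
    "strain_B": [1.0, 0.0, 0.4, 0.0],
    "strain_C": [0.0, 1.0, 0.0, 0.8],
    "strain_D": [0.0, 1.0, 0.0, 0.6],
}

tree = upgma(example_freqs)
print(tree)  # ((strain_A,strain_B),(strain_C,strain_D));
```

In this toy input, the strains that share fixed SNPs and have similar pSNP frequencies group together, which is the kind of signal the frequency matrix carries. A real analysis would also need branch lengths and support values, which this sketch omits.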


Introduction

The ubiquity of rDNA, essential for protein synthesis, makes it a key target for evolutionary studies. rDNA sequences and subsequences are commonly used for determining species identity and inferring genetic interrelationship (Woese 2000). Several potential pitfalls in the use of rDNA for phylogenetic inference have been noted (Álvarez and Wendel 2003). These issues include difficulty in resolving paralogous from orthologous sequences (in cases of multilocus rDNA systems), incomplete intragenomic sequence homogeneity, the presence of rDNA pseudogenes, secondary structure considerations, difficulties in sequence alignment, frequent ITS sequence contamination, and homoplasy (Álvarez and Wendel 2003). Sequence heterogeneity within the rDNA unit has long been a problem in phylogenetic analysis of many species groups, with numerous studies citing this issue, in particular within the ITS region (Buckler et al. 1997; Álvarez and Wendel 2003; Nilsson et al. 2008; Kiss 2012). We discuss the implications of our study for the phylogenetic analysis of multilocus rDNA systems.

