Abstract

The advent of genomics has fueled optimism for improvement in the reliability and accuracy of phylogenetic trees. An implicit assumption is that there will be an inexorable improvement in phylogenetic accuracy as the number of genes used increases, and that this approach is necessary because there are no identifiable parameters that predict the phylogenetic performance of genes (Gee, 2003; Rokas et al., 2003). These issues were explored in the recent article by Rokas et al. (2003), who investigated the phylogenetic signal in a sample of 106 protein-encoding genes selected from the genomes of 8 species of yeast. Rokas et al. (2003) analyzed these genes separately and in combination, showing that individual genes sometimes support conflicting topologies. Although considerable character incongruence existed in the combined data set, simultaneous analysis of all genes resulted in one tree with 100% bootstrap proportions (BP) at all nodes. This “species tree” was taken to represent the true phylogeny (Fig. 1a topology).

The authors then carried out a series of analyses with randomly concatenated data sets of varying size to determine the minimum amount of data required to establish confidence in the species tree at a given level of statistical significance. They concluded that a minimum of 20 randomly concatenated genes was required to infer relationships confidently and that “It is only through the analyses of larger amounts of sequence data that confidence in the proposed phylogenetic reconstruction can be obtained” and further “that analyses based on a single or a small number of genes provide insufficient evidence for establishing or refuting phylogenetic relationships.” They also expressed the opinion that the result for these yeast species was likely to be typical for molecular phylogenetic studies: “. . . we believe that this group is a representative model for key issues that researchers in phylogenetics are confronting,” with the clear implication that the majority of current molecular phylogenies must be considered unreliable. Another important conclusion was that there are no predictors of phylogenetic performance of genes: “there were no identifiable parameters that could systematically account for or predict the performance of single genes.” Similarly, Gee (2003), in discussing the Rokas et al. (2003) paper, states, “there are no identifiable parameters that can predict the performance of genes in any systematic way.” Finally, Rokas et al. (2003) noted that bootstrap values were lower and variance higher for contiguous gene sequences than for randomly sampled orthologous nucleotides, and took this as evidence of the misleading signal in individual genes resulting from the nonindependence of nucleotides within genes.

These conclusions, if true, are sobering for those attempting to infer relationships using DNA sequences with limited time and budgets. Herein, we demonstrate that these conclusions require substantial revision. First, we show that many genes in the yeast data set published by Rokas et al. (2003) have nucleotide frequencies that have shifted markedly among taxa at third positions of codons. These nucleotide sequences deviate significantly from the stationary condition (see also Phillips et al., 2004). Second, we illustrate through a series of analyses that the stationary gene partition is superior to the nonstationary partition, recovering the underlying phylogeny with many fewer genes. Finally, we show that the conclusion of Rokas et al. (2003) regarding the superiority of random sampling of orthologous nucleotides relative to contiguous sequences for phylogenetic analysis is largely an artifact of different bootstrap treatments for these two sampling schemes.

Rokas et al. (2003) used several criteria for sampling and retaining genes from seven species of Saccharomyces yeasts, and one outgroup species, Candida albicans (Fig. 1). Genes were spaced at approximately 40-kilobase intervals. Only protein-encoding genes with identifiable and generally alignable homologs in all eight
