Abstract

In the quest to reconstruct the Tree of Life, researchers have increasingly turned to phylogenomics, the inference of phylogenetic relationships using genome-scale data (Box 1). Mesmerized by the sustained increase in sequencing throughput, many phylogeneticists entertained the hope that the incongruence frequently observed in studies using single or a few genes [1] would come to an end with the generation of large multigene datasets. Yet, as so often happens, reality has turned out to be far more complex, as three recent large-scale analyses, one published in PLoS Biology [2]–[4], make clear. The studies, which deal with the early diversification of animals, produced highly incongruent (Box 2) findings despite the use of considerable sequence data (see Figure 1). Clearly, merely adding more sequences is not enough to resolve the inconsistencies.

Highlights

  • Taking these three studies as a case in point, we discuss pitfalls that the simple addition of sequences cannot avoid, and show how the observed incongruence can be largely overcome and how improved bioinformatics methods can help reveal the full potential of phylogenomics.

  • Non-phylogenetic signal can be reduced by improving (i) the quality of primary alignments through selection of the orthologous genes that are least subject to saturation and (ii) the detection of multiple substitutions, which is best achieved by using both a large number of species and the most realistic model of sequence evolution. We show that both improvements are required at the same time to address the difficult question of the relationships among major animal groups, i.e., sponges, placozoans, ctenophores, cnidarians, and bilaterians.

  • The topology we infer from the revised alignments is similar to the published tree [4], with only three of 21 nodes differing. This demonstrates that phylogenomics is relatively robust to the possible inclusion of non-orthologous sequences when the genuine phylogenetic signal is abundant. A likely explanation is that most of the introduced errors are random, which prevents a structured misleading signal from appearing.


Summary

Hurdles to Phylogenomics

Two factors contribute significantly to the difficulty of reconstructing the correct phylogenetic tree for a set of sequences. Even if conflicting gene genealogies were not an issue, throwing additional gene sequences at a difficult phylogenetic question does not necessarily solve the problem: the size of the needle is increased, but so too is the size of the haystack. It follows that non-phylogenetic signal may become dominant and yield incongruent, yet statistically highly supported, phylogenomic trees [12]. Non-phylogenetic signal can be reduced by improving (i) the quality of primary alignments through selection of the orthologous genes that are least subject to saturation and (ii) the detection of multiple substitutions, which is best achieved by using both a large number of species and the most realistic model of sequence evolution. Reanalysis of the underlying data indicates that failure to apply one or more of the strategies intended to decrease non-phylogenetic signal is what caused the incongruent, though strongly supported, results that were recently observed [2]–[4].
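
To make the notions of saturation and undetected multiple substitutions concrete, the following minimal sketch uses the Jukes-Cantor (JC69) model. This is an assumption chosen for simplicity; it is not the model used in the analyses discussed here, and the function names are hypothetical. It shows how the observed proportion of differing sites levels off near 0.75 as the true number of substitutions per site grows, so that an increasing share of changes stays hidden.

```python
import math


def jc69_expected_p(d):
    """Expected proportion of differing sites between two sequences
    separated by d substitutions per site, under the JC69 model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_corrected_distance(p):
    """Estimate substitutions per site from an observed proportion p of
    differing sites. The correction is undefined once p reaches 0.75,
    which is the saturation plateau under JC69."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Illustrative distances only; these are not values from the studies.
for d in (0.05, 0.5, 1.0, 2.0, 5.0):
    p = jc69_expected_p(d)
    print(f"true d = {d:4.2f}  observed p = {p:.3f}  hidden = {d - p:.3f}")
```

At small distances the observed and true values nearly coincide, whereas at large distances almost all additional substitutions are invisible in the raw comparison. Richer models and denser species sampling aim to recover exactly this hidden fraction.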

Issues at the Level of Sequence Alignments
Issues at the Level of Taxon Sampling
Issues at the Level of Tree Reconstruction Methods
Issues at the Level of Gene Sampling
Conclusion
Findings
Supporting Information