Abstract

Comparative genomics has revealed the ubiquity of gene and genome duplication and subsequent gene loss. When gene duplication is followed by loss, gene trees can differ from species trees, so frequent gene duplication poses a challenge for reconstruction of species relationships. Here I address the case of multi-gene sets of putative orthologs that include some unrecognized paralogs due to ancestral gene duplication, and ask how outgroups should best be chosen to reduce the degree of non-species tree (NST) signal. Consideration of expected internal branch lengths supports several conclusions: (i) when a single outgroup is used, the degree of NST signal arising from gene duplication is either independent of outgroup choice or is minimized by use of a maximally closely related post-duplication (MCRPD) outgroup; (ii) when two outgroups are used, NST signal is minimized by using one MCRPD outgroup, while the position of the second outgroup is of lesser importance; and (iii) when two outgroups are used, the ability to detect gene trees that are inconsistent with known aspects of the species tree is maximized by use of one MCRPD outgroup, and is either independent of the position of the second outgroup or is maximized for a more distantly related second outgroup. Overall, these results generalize the utility of closely related outgroups for phylogenetic analysis.

Highlights

  • Accurate phylogenetic inference is thwarted by the presence of conflicting signals in the data (e.g., [1,2,3]), a problem that has received a large amount of theoretical and experimental attention over the past few years (e.g., [4,5,6,7])

  • Loss of different members of an ancestral duplicate pair in different species can lead to a gene tree that does not reflect the species tree [11,12,13]

  • Given the importance of gene duplication in the evolution of eukaryotic genomes in general and of several lineages of great interest in particular [17,18,19], understanding such challenges is useful for correct reconstruction of evolutionary history


Introduction

Accurate phylogenetic inference is thwarted by conflicting signals in the data (e.g., [1,2,3]), a problem that has received considerable theoretical and experimental attention in recent years (e.g., [4,5,6,7]). One source of such conflict is gene duplication: loss of different members of an ancestral duplicate pair in different species can lead to a gene tree that does not reflect the species tree [11,12,13]. Recent studies in yeast and teleost fish [14,15,16] suggest that such reciprocal loss of gene duplicates following genome duplication may be common, raising the specter of significant conflicts between gene trees and species trees in such lineages. Such cases are troublesome because they are expected to pass the most common bioinformatics test for one-to-one orthologs: genes from different species that are each other's reciprocal best BLAST hit.
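To make the reciprocal-best-hit test concrete, the following is a minimal sketch in Python. It is not the pipeline used in the cited studies. The gene names, the bit scores, and the helper functions `best_hits` and `reciprocal_best_hits` are all hypothetical, chosen only to illustrate how a pair of paralogs left behind by reciprocal loss can still satisfy the test.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    return {query: max(hits, key=hits.get)
            for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores (illustrative values, not real data).
# Assumed scenario: an ancestral duplication of gene X produced copies 1 and 2.
# Species A retained only copy 1 (A_X1); species B retained only copy 2 (B_X2).
# Gene Y is an unrelated single-copy gene present in both species.
a_vs_b = {
    "A_X1": {"B_X2": 310.0, "B_Y": 45.0},
    "A_Y": {"B_Y": 520.0, "B_X2": 40.0},
}
b_vs_a = {
    "B_X2": {"A_X1": 305.0, "A_Y": 42.0},
    "B_Y": {"A_Y": 515.0, "A_X1": 44.0},
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
for gene_a, gene_b in pairs:
    print(gene_a, gene_b)
```

In this toy scenario A_X1 and B_X2 are paralogs, yet each is the other's best hit because the true ortholog has been lost from each genome, so the pair is reported alongside the genuine orthologs A_Y and B_Y. The test alone cannot distinguish the two cases.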

