Abstract

Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V(G)| × (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree, h(S) is the height of the species tree, and k(S) is the size of its largest polytomy. We present a dynamic programming algorithm that also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree of 12 or less. We also present a heuristic that estimates the minimal number of losses in polynomial time. In empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program that provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
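To make the overestimation claim concrete, the following is a minimal sketch of the classic least-common-ancestor (LCA) duplication count for binary reconciliation. It is not the algorithm presented in this work. The tree encodings (a parent dictionary for the species tree, nested 2-tuples for the gene tree) and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch of classic binary LCA reconciliation (duplications only).
# Assumed encodings: the species tree is a dict mapping node -> parent (root -> None);
# the gene tree is a nested 2-tuple whose leaves are gene names (strings).

def ancestors(parent, node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lca(parent, a, b):
    """Least common ancestor of a and b in the species tree."""
    seen = set(ancestors(parent, a))
    while b not in seen:
        b = parent[b]
    return b

def count_lca_duplications(gene_tree, parent, leaf_species):
    """Count gene-tree nodes that map to the same species node as one of their children."""
    def visit(g):
        if isinstance(g, str):                      # leaf: map the gene to its species
            return leaf_species[g], 0
        (ml, dl), (mr, dr) = visit(g[0]), visit(g[1])
        m = lca(parent, ml, mr)                     # LCA mapping of the internal node
        return m, dl + dr + int(m in (ml, mr))      # classic duplication test
    return visit(gene_tree)[1]

genes = {"a": "A", "b": "B", "c": "C"}
binary_species = {"R": None, "AB": "R", "C": "R", "A": "AB", "B": "AB"}  # ((A,B),C)
star_species = {"R": None, "A": "R", "B": "R", "C": "R"}                 # (A,B,C)

congruent = count_lca_duplications((("a", "b"), "c"), binary_species, genes)
on_polytomy = count_lca_duplications((("a", "b"), "c"), star_species, genes)
```

On the binary species tree the congruent gene tree needs no duplication, but on the three-way polytomy the same test reports one for every possible binary gene tree over A, B, and C, even though no resolution of the polytomy has been contradicted. This is the kind of spurious duplication the abstract refers to.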

Highlights

  • Reconciliation is the process of constructing a mapping between a gene family tree and a species tree in order to infer the history of gene duplications and losses during the evolution of the gene family

  • Since disagreements in the branching pattern of a binary gene tree and a non-binary species tree may be evidence of a duplication or incomplete lineage sorting, we need a formal basis to distinguish between required duplications (disagreements that can only be explained by a duplication) and conditional duplications (disagreements that can be explained by either a duplication or a deep coalescence event)

  • We have presented novel algorithms for the reconciliation of binary gene trees with non-binary species trees
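The distinction between required and conditional duplications can be sketched in code. The version below is a simplified reading of the highlight above, not the formal definition or algorithm from the full text: it assumes a node is a required duplication when both of its child subtrees contain genes from the same child lineage of the species node it maps to, and a conditional duplication when the classic LCA test would flag it but no such shared lineage exists. Tree encodings and function names are hypothetical.

```python
# Simplified sketch: classify one gene-tree node against a possibly non-binary species tree.
# Assumed encodings: species tree as dict node -> parent (root -> None);
# gene tree as nested 2-tuples with gene-name strings at the leaves.

def path_to_root(parent, node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lca_of_set(parent, species):
    """Deepest species-tree node that is an ancestor of every member of the set."""
    it = iter(species)
    common = path_to_root(parent, next(it))
    for s in it:
        anc = set(path_to_root(parent, s))
        common = [x for x in common if x in anc]   # keeps deepest-first order
    return common[0]

def species_below(g, leaf_species):
    """Set of species sampled in the subtree rooted at gene node g."""
    if isinstance(g, str):
        return {leaf_species[g]}
    return species_below(g[0], leaf_species) | species_below(g[1], leaf_species)

def lineage_below(parent, m, s):
    """Child of m on the path from s up to m (or m itself when s == m)."""
    while s != m and parent[s] != m:
        s = parent[s]
    return s

def classify(g, parent, leaf_species):
    """Return 'required', 'conditional', or 'speciation' for internal gene node g."""
    left = species_below(g[0], leaf_species)
    right = species_below(g[1], leaf_species)
    m = lca_of_set(parent, left | right)
    left_lineages = {lineage_below(parent, m, s) for s in left}
    right_lineages = {lineage_below(parent, m, s) for s in right}
    if left_lineages & right_lineages:
        return "required"       # two gene copies enter the same child lineage
    if m in (lca_of_set(parent, left), lca_of_set(parent, right)):
        return "conditional"    # flagged by the LCA test, explainable by deep coalescence
    return "speciation"

star = {"R": None, "A": "R", "B": "R", "C": "R"}   # three-way polytomy (A,B,C)
genes = {"a": "A", "a1": "A", "a2": "A", "b": "B", "c": "C"}
```

Under these assumptions, `classify((("a", "b"), "c"), star, genes)` yields `"conditional"`, while `classify((("a1", "b"), ("a2", "c")), star, genes)` yields `"required"` because lineage A receives a gene copy from both subtrees. When the species node is binary, a node flagged by the LCA test always shares a lineage between its subtrees, so the conditional case arises only at polytomies.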


Summary

INTRODUCTION

These authors contributed to this work. 1 Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania. 2 Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

For a loss associated with a polytomy in a non-binary species tree, it is generally not possible to determine the exact lineage in the gene tree in which the loss occurred. In such cases, the loss is associated with several possible edges in E_G, corresponding to alternative hypotheses regarding when the loss occurred. We discuss probabilistic approaches to reconciliation and describe directions for future work.

NOTATION AND BINARY RECONCILIATION
MODELS FOR NON-BINARY SPECIES TREES
IDENTIFYING DUPLICATIONS
INFERRING GENE LOSSES
Heuristic for inferring loss histories
Inferring optimal loss histories
RELATED WORK
EMPIRICAL RESULTS
DISCUSSION