Abstract

Uncovering ancient whole-genome duplications (WGDs) in the evolutionary history of species is essential to understanding how gene families formed and how genomes evolved. A classic framework of WGD models for identifying the ancestral species in which genome duplications occurred is based on reconciling multiple gene trees with a species tree. Such a reconciliation reveals evolutionary scenarios describing how genes have evolved along the branches of the species tree through speciation and single duplication events. Clustering single duplication events from different gene trees that occur in the same species can reveal duplication episodes, which are indicative of remnants of ancient WGDs. WGD models can be categorized into restricted and unrestricted models. Restricted models consider only scenarios in which single duplications are limited by the timing of their ancestor speciation, while unrestricted models consider all possible evolutionary scenarios. These two classes represent the extremes of the spectrum of possible scenarios: unrestricted models are biased towards locating duplication episodes close to the root of the species tree, while restricted models tend to locate episodes close to the most recent species that could theoretically have contained them. To add flexibility and improve biological realism, we develop and analyze a novel framework of WGD models that encompasses the whole range of intermediate locations by defining, implementing, and testing models under multiple constraint strategies. We achieve this by formulating the first integer linear programming (ILP) model for the NP-hard problem of computing duplication episodes under the classic unrestricted WGD model of Fellows et al., and then incorporating constraints into this formulation that reflect WGD models for intermediate locations. Finally, we demonstrate the exemplary performance of our models and show that our ILP formulations can solve typical problem instances occurring in practice.
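As an illustration of the reconciliation step only, the following minimal sketch uses the standard lowest common ancestor (LCA) mapping, which places each single duplication at the most recent species that could contain it, and then counts duplications per species node across several gene trees. It is not the ILP formulation of this work. The nested-tuple tree encoding, the convention that gene-tree leaves are labelled directly by species names, and the function names `clusters`, `lca`, `map_and_collect`, and `count_duplications` are all assumptions made for the example.

```python
from collections import Counter

def clusters(tree):
    """List the leaf-name cluster (frozenset) of every node, in post-order."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]  # the last entry is the child's own cluster
    out.append(leaves)
    return out

def lca(species_clusters, leaf_set):
    """Smallest species-tree cluster containing leaf_set (its LCA node)."""
    return min((c for c in species_clusters if leaf_set <= c), key=len)

def map_and_collect(gene_tree, species_clusters, found):
    """Return the LCA mapping of a gene-tree node; record duplication nodes."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree])
    child_maps = [map_and_collect(ch, species_clusters, found) for ch in gene_tree]
    here = lca(species_clusters, frozenset().union(*child_maps))
    # A node is a duplication if it maps to the same species node as a child.
    if here in child_maps:
        found.append(here)
    return here

def count_duplications(species_tree, gene_trees):
    """Count, per species-tree node, the single duplications mapped to it."""
    sc = clusters(species_tree)
    found = []
    for g in gene_trees:
        map_and_collect(g, sc, found)
    return Counter(found)

if __name__ == "__main__":
    species = (("a", "b"), "c")
    genes = [((("a", "b"), ("a", "b")), "c"), (("a", "c"), ("b", "c"))]
    for node, n in count_duplications(species, genes).items():
        print(sorted(node), n)
```

Species nodes that collect duplications from many gene trees are candidates for duplication episodes. The unrestricted and intermediate models described above additionally allow duplications to be moved towards the root, which this sketch does not attempt.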
