Whole Genome Duplications (WGDs) are events that double the content and structure of a genome. In some organisms, multiple WGD events have been observed while loss of genetic material is a typical occurrence following a WGD event. The requirement of classic rearrangement models that every genetic marker has to occur exactly two times in a given problem instance, therefore, poses a serious restriction in this context. The Double-Cut and Join (DCJ) model is a simple and powerful model for the analysis of large structural rearrangements. After being extended to the DCJ-Indel model, capable of handling gains and losses of genetic material, research has shifted in recent years toward enabling it to handle natural genomes, for which no assumption about the distribution of markers has to be made. The traditional theoretical framework for studying WGD events is the Genome Halving Problem (GHP). While the GHP is solved for the DCJ model for genomes without losses, there are currently no exact algorithms utilizing the DCJ-Indel model that are able to handle natural genomes. In this work, we present a general view on the DCJ-Indel model that we apply to derive an exact polynomial time and space solution for the GHP on genomes with at most two genes per family before generalizing the problem to an integer linear program solution for natural genomes.
Read full abstract