A mixed integer linear programming model to reconstruct phylogenies from single nucleotide polymorphism haplotypes under the maximum parsimony criterion

Daniele Catanzaro, Russell Schwartz, Ramamoorthi Ravi

doi:10.1186/1748-7188-8-3


Open Access

https://doi.org/10.1186/1748-7188-8-3


Abstract

Background

Phylogeny estimation from aligned haplotype sequences has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data. Its applications range from medical research to drug discovery, epidemiology, and population dynamics. The literature on molecular phylogenetics proposes a number of criteria for selecting a phylogeny from among plausible alternatives. Usually, such criteria can be expressed by means of objective functions, and the phylogenies that optimize them are referred to as optimal. One of the most important estimation criteria is parsimony, which states that the optimal phylogeny T∗ for a set of n haplotype sequences over a common set of variable loci is the one that satisfies the following requirements: (i) it has the shortest length and (ii) for each pair of distinct haplotypes h_i and h_j in the set, the sum of the edge weights belonging to the path from h_i to h_j in T∗ is not smaller than the observed number of changes between h_i and h_j. Finding the most parsimonious phylogeny for such a set involves solving an optimization problem, called the Most Parsimonious Phylogeny Estimation Problem (MPPEP), which is NP-hard in many of its versions.

Results

In this article we investigate a recent version of the MPPEP that arises when input data consist of single nucleotide polymorphism haplotypes extracted from a population of individuals on a common genomic region. Specifically, we explore the prospects for improving on the implicit enumeration strategy used in previous work by means of a novel problem formulation and a series of strengthening valid inequalities and preliminary symmetry breaking constraints, which more precisely bound the solution space and accelerate implicit enumeration of possible optimal phylogenies. We present the basic formulation and then introduce a series of provable valid constraints to reduce the solution space. We then prove that these constraints can often lead to significant reductions in the gap between the optimal solution and its non-integral linear programming bound relative to the prior art, as well as often substantially faster processing of moderately hard problem instances.

Conclusion

We provide an indication of the conditions under which such an optimal enumeration approach is likely to be feasible, suggesting that these strategies are usable for relatively large numbers of taxa, although with stricter limits on numbers of variable sites. The work thus provides methodology suitable for provably optimal solution of some harder instances that resist all prior approaches.
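Requirement (ii) of the parsimony criterion stated in the abstract can be illustrated with a small check. The sketch below is not the authors' formulation: it assumes haplotypes are encoded as equal-length strings over {0, 1}, takes the "observed number of changes" to be the Hamming distance, and stores a candidate tree as a hypothetical adjacency dictionary. The names `hamming`, `path_length` and `satisfies_path_requirement` are illustrative only.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same number of sites")
    return sum(x != y for x, y in zip(a, b))


def path_length(tree, source, target):
    """Sum of edge weights on the unique path between two vertices.

    `tree` maps each vertex to a dict {neighbor: edge weight} and is
    assumed to be an undirected tree (connected, no cycles).
    """
    stack = [(source, None, 0)]
    while stack:
        vertex, parent, dist = stack.pop()
        if vertex == target:
            return dist
        for neighbor, weight in tree[vertex].items():
            if neighbor != parent:  # never walk back along the same edge
                stack.append((neighbor, vertex, dist + weight))
    raise ValueError("target is not reachable from source")


def satisfies_path_requirement(tree, haplotypes):
    """True if every pair of haplotypes is at least as far apart in the
    tree as its observed number of changes (requirement (ii))."""
    return all(
        path_length(tree, a, b) >= hamming(a, b)
        for a, b in combinations(haplotypes, 2)
    )


# Three observed haplotypes joined through one assumed ancestor "010".
haplotypes = ["000", "110", "011"]
star = {
    "010": {"000": 1, "110": 1, "011": 1},
    "000": {"010": 1},
    "110": {"010": 1},
    "011": {"010": 1},
}
print(satisfies_path_requirement(star, haplotypes))
```

A tree that passes this check is only a feasible candidate; requirement (i) additionally asks for the candidate of minimum total edge weight.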

Highlights

Phylogeny estimation from aligned haplotype sequences has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data.
Characterizing evolutionary relationships between organisms and their genomes is the basis of comparative genomic methods for interpreting meaning in sequence data, and for this reason molecular phylogenetics has become widely used in a multitude of research fields such as systematics, medical research, drug discovery, epidemiology, and population dynamics [3].
We show that it is possible to exploit the high symmetry inherent in most instances of the problem, through a series of strengthening valid inequalities, to eliminate redundant solutions and reduce the practical search space.

Summary

Introduction

Phylogeny estimation from aligned haplotype sequences has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data. The literature on molecular phylogenetics proposes a number of criteria for selecting a phylogeny from among plausible alternatives. Such criteria can be expressed by means of objective functions, and the phylogenies that optimize them are referred to as optimal. Molecular phylogenetics studies the hierarchical evolutionary relationships among species, or taxa, by means of molecular data such as DNA, RNA, amino acid or codon sequences. These relationships are usually described through a weighted tree, called a phylogeny, whose leaves represent the observed taxa, internal vertices represent the intermediate ancestors, edges represent the estimated evolutionary relationships, and edge weights represent measures of the similarity between pairs of taxa. The criteria can usually be quantified and expressed in terms of objective functions, giving rise to families of optimization problems whose general paradigm can be stated as follows [11]: Problem 1. – The Phylogeny Estimation Problem (PEP)
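The weighted-tree description of a phylogeny given above can be made concrete with a small data structure. This is a minimal sketch under stated assumptions rather than the paper's model: the class name `Phylogeny` and its fields are hypothetical, sequences are equal-length strings, and each edge weight is assumed to be the number of sites at which its two endpoint sequences differ.

```python
from dataclasses import dataclass


@dataclass
class Phylogeny:
    leaves: dict     # taxon name -> observed sequence
    ancestors: dict  # internal vertex name -> hypothesised sequence
    edges: list      # (vertex name, vertex name) pairs forming a tree

    def sequence(self, name):
        """Sequence attached to a leaf or to an internal vertex."""
        if name in self.leaves:
            return self.leaves[name]
        return self.ancestors[name]

    def edge_weight(self, u, v):
        """Assumed weight: number of sites that differ across the edge."""
        return sum(a != b for a, b in zip(self.sequence(u), self.sequence(v)))

    def length(self):
        """Length of the phylogeny: the sum of all edge weights."""
        return sum(self.edge_weight(u, v) for u, v in self.edges)


# Same three taxa, two different guesses for the single ancestor.
taxa = {"t1": "000", "t2": "110", "t3": "011"}
edges = [("a1", "t1"), ("a1", "t2"), ("a1", "t3")]

candidate_a = Phylogeny(taxa, {"a1": "010"}, edges)
candidate_b = Phylogeny(taxa, {"a1": "000"}, edges)
print(candidate_a.length(), candidate_b.length())
```

Under the assumed weights, the choice of ancestral sequence alone changes the length of the tree, which is why an estimation criterion has to optimize over internal vertices as well as over tree shapes.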


Journal: Algorithms for Molecular Biology
Publication Date: Jan 23, 2013
Citations: 43
License type: CC BY 2.0
