Maximum Parsimony Problem Research Articles

Background: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. Various criteria and methods are available for performing multiple sequence alignments; among these, minimizing the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. The Generalized Tree Alignment Problem is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are inferred simultaneously.

Results: For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other competitive algorithms have greater time complexity than DO. Here, we introduce Affine-DO, a new algorithm that accommodates the indel (alignment gap) models commonly used in the phylogenetic analysis of molecular sequence data, and present experiments evaluating it. Affine-DO has the same time complexity as DO, but is correctly suited to the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced by Affine-DO yields little improvement.

Conclusions: Our results show that Affine-DO likely produces near-optimal solutions, with approximations within 10% for sequences with small divergence and within 30% for random sequences, the case for which Affine-DO produced its worst solutions. The Affine-DO algorithm has the scalability and optimality needed to be a significant improvement in the real-world phylogenetic analysis of sequence data.
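The affine gap edit distance referred to above charges a gap of length k as a one-time opening cost plus k times an extension cost, so a single long indel is cheaper than several short indels of the same total length. As an illustration only, the sketch below computes the pairwise affine-gap alignment cost of two sequences with a Gotoh-style three-matrix dynamic program. It is not Affine-DO, which operates over a whole tree; the function name, the parameter names, the cost convention (gap_open + k * gap_extend; some tools use gap_open + (k - 1) * gap_extend instead), and the default costs are assumptions chosen for the example.

```python
import math


def affine_alignment_cost(a: str, b: str, gap_open: float = 2.0,
                          gap_extend: float = 1.0, mismatch: float = 1.0) -> float:
    """Minimum pairwise alignment cost under an affine gap model (Gotoh-style DP).

    Hypothetical helper for illustration: a gap of length k costs
    gap_open + k * gap_extend, a mismatch costs `mismatch`, a match costs 0.
    """
    inf = math.inf
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap pays gap_open; extending an existing one does not.
            X[i][j] = min(M[i - 1][j] + gap_open, X[i - 1][j],
                          Y[i - 1][j] + gap_open) + gap_extend
            Y[i][j] = min(M[i][j - 1] + gap_open, Y[i][j - 1],
                          X[i][j - 1] + gap_open) + gap_extend
    return min(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    # One gap of length 4 (2 + 4 * 1) beats two gaps of length 2 (2 * (2 + 2)).
    print(affine_alignment_cost("ACGTACGT", "ACGT"))  # 6.0
```

Keeping three matrices rather than one is what lets the recurrence distinguish "open a gap" from "extend a gap" while staying quadratic in the sequence lengths.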

Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks of relationships rather than trees of relationships. In previous works, we formulated a maximum parsimony (MP) criterion for reconstructing and evaluating phylogenetic networks and demonstrated its quality on biological as well as synthetic data sets. In this paper, we provide further theoretical results as well as a very fast heuristic algorithm for the MP criterion of phylogenetic networks. In particular, we provide a novel combinatorial definition of phylogenetic networks in terms of "forbidden cycles," along with detailed proofs of hardness and hardness of approximation for the "small" MP problem. We demonstrate the performance of our heuristic in terms of time and accuracy on both biological and synthetic data sets. Finally, we explain the difference between our model and a similar one formulated by Nguyen et al., and describe the implications of this difference for the hardness and approximation results.
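For contrast with the network setting above: on a fixed rooted binary tree, the "small" MP problem (scoring one given topology) is solvable in linear time by Fitch's algorithm, a standard textbook method and not the authors' network heuristic. The sketch below assumes aligned sequences of equal length at the leaves, a tree encoded as nested pairs, and unweighted state changes; the helper names `fitch_site`, `first_leaf`, and `parsimony_score` are hypothetical.

```python
from typing import Set, Tuple, Union

# Assumed encoding: a leaf is an aligned sequence (str); an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch's bottom-up pass for one alignment column.

    Returns the candidate state set at the subtree root and the number of
    state changes counted within the subtree.
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_set, left_cost = fitch_site(tree[0], site)
    right_set, right_cost = fitch_site(tree[1], site)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Empty intersection: take the union and charge one state change.
    return left_set | right_set, left_cost + right_cost + 1


def first_leaf(tree: Tree) -> str:
    """Descend left children until a leaf sequence is reached."""
    while not isinstance(tree, str):
        tree = tree[0]
    return tree


def parsimony_score(tree: Tree) -> int:
    """Small-parsimony score of a fixed tree: sum of Fitch costs over all sites."""
    n_sites = len(first_leaf(tree))
    return sum(fitch_site(tree, k)[1] for k in range(n_sites))


if __name__ == "__main__":
    t1 = (("ACGT", "ACGA"), ("TCGA", "TCGT"))
    t2 = (("ACGT", "TCGA"), ("ACGA", "TCGT"))
    print(parsimony_score(t1), parsimony_score(t2))  # 3 4
```

Here the first topology needs 3 changes and the second needs 4, so the first is more parsimonious. Searching over all topologies (the "large" MP problem) is NP-hard even for trees; the abstract's point is that on networks hardness already appears in the small problem.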

Articles published on Maximum Parsimony Problem

The tree alignment problem.

Parsimony Score of Phylogenetic Networks: Hardness Results and a Linear-Time Heuristic

A Subdivision Approach to Maximum Parsimony

Progressive Tree Neighborhood applied to the Maximum Parsimony Problem

Mapping Edge Sets to Splits in Trees: the Path Index and Parsimony

PRec-I-DCM3: a parallel framework for fast and accurate large-scale phylogeny reconstruction

Landscapes on spaces of trees

Proof of the populous path algorithm for missing mutations in parsimony trees