Abstract

Estimating phylogenetic trees, which depict the relationships between different species, from aligned sequence data (such as DNA, RNA, or proteins) is one of the main aims of evolutionary biology. However, tree reconstruction criteria like maximum parsimony do not necessarily lead to unique trees and in some cases even fail to recognize the “correct” tree (i.e., the tree on which the data was generated). On the other hand, a recent study has shown that for an alignment containing precisely those binary characters (sites) which require up to two substitutions on a given tree, this tree will be the unique maximum parsimony tree. It is the aim of the present paper to generalize this recent result in the following sense: We show that for a tree T with n leaves, as long as $$k<\frac{n}{8}+\frac{11}{9}-\frac{1}{18}\sqrt{9\cdot \left( \frac{n}{4}\right) ^2+16}$$ (or, equivalently, $$n>9k-11+\sqrt{9k^2-22k+17}$$, which in particular holds for all $$n\ge 12k$$), the maximum parsimony tree for the alignment containing all binary characters which require (up to or precisely) k substitutions on T will be unique in the NNI neighborhood of T and it will coincide with T, too. In other words, within the NNI neighborhood of T, T is the unique most parsimonious tree for the said alignment. This partially answers a recently published conjecture affirmatively. Additionally, we show that for $$n\ge 8$$ and for k being in the order of $$\frac{n}{2}$$, there is always a pair of phylogenetic trees T and $$T'$$ which are NNI neighbors, but for which the alignment of characters requiring precisely k substitutions each on T in total requires fewer substitutions on $$T'$$.
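The leaf-count condition stated above can be checked numerically. The following is a minimal sketch, not part of the paper: the function names `satisfies_bound` and `min_leaves` are hypothetical and introduced only for illustration. It tests the inequality $$n>9k-11+\sqrt{9k^2-22k+17}$$ in exact integer arithmetic by squaring both sides, which is valid because the radicand is positive for every k.

```python
def satisfies_bound(n: int, k: int) -> bool:
    """Check n > 9k - 11 + sqrt(9k^2 - 22k + 17) without floating point.

    Hypothetical helper for illustration; not taken from the paper.
    """
    gap = n - (9 * k - 11)
    # The radicand is positive for all k, because the quadratic
    # 9k^2 - 22k + 17 has negative discriminant (484 - 612 < 0).
    radicand = 9 * k * k - 22 * k + 17
    # gap > sqrt(radicand) holds exactly when gap > 0 and gap^2 > radicand.
    return gap > 0 and gap * gap > radicand


def min_leaves(k: int) -> int:
    """Smallest positive n that satisfies the bound for a given k."""
    n = 1
    while not satisfies_bound(n, k):
        n += 1
    return n


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, min_leaves(k), satisfies_bound(12 * k, k))
```

For n = 12k the gap equals 3k + 11, and $$(3k+11)^2 = 9k^2+66k+121 > 9k^2-22k+17$$ for every positive k, which is why the simpler condition $$n\ge 12k$$ is sufficient.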