Phylogeny inference is an important problem in computational biology. Early character-based approaches, such as the maximum parsimony and maximum likelihood algorithms, become intractable when the number of taxonomic units is large. More recent distance-based algorithms that adopt an agglomerative scheme are widely used for phylogeny inference. However, they must recursively merge the nearest pair of taxa and re-estimate the distance matrix; the estimation error can grow with each merge and lead to an inaccurate tree topology. In this study, we propose a splitting algorithm for phylogeny inference based on the spectral graph clustering (SGC) technique. The SGC algorithm splits graphs using the maximum cut criterion and circumvents the underlying optimization problem by solving a generalized eigenvalue system. The promising features of the proposed algorithm are the following: (i) it uses a heuristic strategy to construct phylogenies from certain distance functions, even ones that are not additive; (ii) distance matrices do not have to be estimated recursively; (iii) it infers a more accurate tree topology than the Neighbor-joining (NJ) algorithm on simulated datasets; and (iv) it strongly supports hypotheses induced by other methods for Baculovirus genomes. Our numerical experiments confirm that the SGC algorithm is efficient for phylogeny inference.
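
The top-down splitting idea can be illustrated with a minimal sketch. The abstract does not specify the graph construction or the exact form of the eigenvalue system, so everything below is an assumption made for illustration: pairwise distances are used directly as edge weights, the cut is relaxed to the generalized problem L y = λ D y (L the graph Laplacian, D the diagonal degree matrix), the eigenvector of the largest eigenvalue is taken, and taxa are separated by the sign of its entries. The names `spectral_split` and `build_tree` are hypothetical and not taken from the paper.

```python
import numpy as np


def spectral_split(dist, idx):
    """Split the taxa in `idx` into two groups with one spectral cut.

    Assumes all off-diagonal distances among the taxa are positive.
    """
    W = dist[np.ix_(idx, idx)]          # distances used as edge weights (assumption)
    deg = W.sum(axis=1)                 # weighted degree of each taxon
    L = np.diag(deg) - W                # graph Laplacian
    # Solve L y = lambda * D y through the equivalent symmetric problem
    # D^{-1/2} L D^{-1/2} v = lambda * v, with y = D^{-1/2} v.
    scale = 1.0 / np.sqrt(deg)
    M = L * scale[:, None] * scale[None, :]
    _, vecs = np.linalg.eigh(M)         # eigenvalues come back in ascending order
    y = vecs[:, -1] * scale             # eigenvector of the largest eigenvalue
    left = [i for i, v in zip(idx, y) if v >= 0]
    right = [i for i, v in zip(idx, y) if v < 0]
    if not left or not right:           # degenerate cut: peel off one taxon
        left, right = idx[:1], idx[1:]
    return left, right


def build_tree(dist, names, idx=None):
    """Recursively split the taxa; returns the topology as nested tuples."""
    idx = list(range(len(names))) if idx is None else idx
    if len(idx) == 1:
        return names[idx[0]]
    if len(idx) == 2:
        return (names[idx[0]], names[idx[1]])
    left, right = spectral_split(dist, idx)
    return (build_tree(dist, names, left), build_tree(dist, names, right))


# Toy input: A and B are close, C and D are close, the two pairs are far apart.
names = ["A", "B", "C", "D"]
dist = np.array([
    [0.0, 1.0, 10.0, 10.0],
    [1.0, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 1.0],
    [10.0, 10.0, 1.0, 0.0],
])
tree = build_tree(dist, names)  # pairs A with B and C with D
```

On this toy matrix the first cut separates {A, B} from {C, D}; which side is listed first depends on the arbitrary sign of the eigenvector. Each split here works on the original distance matrix restricted to the current subset, so no merged-cluster distances are estimated along the way, which is the property the abstract contrasts with agglomerative schemes.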