Abstract

kSNP v2 is a tool for identifying single nucleotide polymorphisms (SNPs) in complete microbial genomes and for estimating phylogenetic trees from the identified SNPs. kSNP can analyse finished genomes, genome assemblies, raw reads or any combination of these, and it requires neither genome alignment nor a reference genome. This study uses sequence evolution simulations to evaluate the topological accuracy of kSNP trees and to assess the effects of diversity and recombination on that accuracy. The accuracy of kSNP trees is strongly affected by increasing diversity, with parsimony trees more accurate than maximum-likelihood trees, which are in turn more accurate than neighbour-joining trees. Accuracy is also strongly influenced by recombination: as recombination increases, accuracy decreases. Reliable trees are arbitrarily defined as those with ≥90% topological accuracy. The best predictor of topological accuracy is found to be the ratio of r/m, a measure of the effect of recombination, to FCK (the fraction of core kmers), a measure of diversity. Tools are available that allow investigators to determine both r/m and FCK, and the relationship between topological accuracy and the ratio of r/m to FCK is characterised. The practical implication of this study is that kSNP is an effective tool for estimating phylogenetic trees from microbial genome sequences, provided that both recombination and sequence diversity are within acceptable ranges.
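To make the predictor concrete, the following is a minimal Python sketch of how the ratio of r/m to FCK might be computed. It rests on assumptions not stated in the abstract: FCK is taken here to be the number of distinct kmers shared by every genome divided by the number of distinct kmers found in any genome, which may differ from the way kSNP itself reports FCK, and r/m is treated as a value obtained from a separate recombination-analysis tool. The function names are hypothetical and are not part of kSNP.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_core_kmers(genomes: Iterable[str], k: int) -> float:
    """Assumed definition of FCK: kmers present in all genomes
    divided by kmers present in at least one genome."""
    kmer_sets = [kmers(g, k) for g in genomes]
    if not kmer_sets:
        raise ValueError("at least one genome is required")
    all_kmers = set.union(*kmer_sets)
    if not all_kmers:
        raise ValueError("no kmers found; is k longer than every sequence?")
    core_kmers = set.intersection(*kmer_sets)
    return len(core_kmers) / len(all_kmers)


def recombination_to_diversity_ratio(r_over_m: float, fck: float) -> float:
    """Ratio of r/m to FCK, the predictor named in the abstract.
    r_over_m is assumed to come from an external estimate."""
    if fck <= 0:
        raise ValueError("FCK must be positive")
    return r_over_m / fck


# Toy sequences, not real genomes, purely for illustration.
toy_genomes = ["ACGT", "ACGA"]
toy_fck = fraction_core_kmers(toy_genomes, k=2)
toy_ratio = recombination_to_diversity_ratio(1.0, toy_fck)
```

The abstract does not state the numerical relationship between this ratio and topological accuracy, so the sketch stops at computing the ratio and does not attempt to classify a tree as reliable.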
