Abstract

Measuring the dissimilarity of a phylogenetic tree with respect to a reference tree or to a set of hypotheses is a fundamental task in phylogenetic studies. A large number of methods have been proposed to compute the distance between a reference tree and a target tree. Because of unresolved relationships among species, it is challenging to obtain a precise and accurate reference tree for a given dataset. As a result, existing tree comparison methods may behave unexpectedly in various scenarios. In this paper, we introduce a novel scoring function, called the deformity index, to quantify the dissimilarity of a tree based on the list of clades of a reference tree. The strength of our proposed method is that it depends only on a list of clades, which can be acquired either from the reference tree or from the hypotheses. We investigate the distributions of the different modules of the deformity index and perform several goodness-of-fit tests to understand their cumulative distributions. We then examine, in detail, the robustness and scalability of our measure by performing statistical tests under various models. Finally, we experiment on several biological datasets and show that our proposed scoring function overcomes the limitations of conventional methods.
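The abstract does not state the formula for the deformity index, so the sketch below does not implement it. It only illustrates, under stated assumptions, the ingredient the method is described as relying on: checking which clades from a given list appear as monophyletic groups in a target tree. The nested-tuple tree representation and the helper names `collect_clades` and `recovered_clades` are hypothetical choices made for illustration.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Assumed representation: a rooted tree as nested tuples of leaf names,
# e.g. (("A", "B"), ("C", "D")). Leaves are plain strings.
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, clades: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set below `tree`, recording the leaf set of every
    internal node (i.e. every clade) in `clades`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves: FrozenSet[str] = frozenset()
    for child in tree:
        leaves |= collect_clades(child, clades)
    clades.add(leaves)
    return leaves


def recovered_clades(target: Tree, reference_clades: List[Set[str]]) -> List[bool]:
    """Hypothetical helper: for each clade in the reference list, report
    whether the target tree contains it as a monophyletic group.
    This is NOT the deformity index, only the clade lookup it builds on."""
    target_clades: Set[FrozenSet[str]] = set()
    collect_clades(target, target_clades)
    return [frozenset(c) in target_clades for c in reference_clades]
```

For example, for the target tree `((("A", "B"), "C"), ("D", "E"))` and the clade list `[{"A", "B"}, {"A", "B", "C"}, {"C", "D"}]`, `recovered_clades` returns `[True, True, False]`, because C and D are not grouped together in the target. Since the input is just a list of leaf sets, the same lookup works whether the clades come from a reference tree or from hypotheses.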

Highlights

  • A phylogenetic tree represents the traits of evolution between a set of species [12]

  • We propose a new semi-reference method to measure the quality of a tree using the biological knowledge of the clades

  • As this method depends only on biological knowledge of the clades, the Deformity Index can adapt to current knowledge in biology and provide a quality metric in that context


Summary

Introduction

A phylogenetic tree represents the traits of evolution among a set of species [12]. There are various methods for constructing phylogenetic trees from genotype data. These methods are broadly classified into two groups: alignment-based methods and alignment-free methods. Alignment-based methods are further classified into four subcategories: distance-based, parsimony-based, maximum-likelihood-based, and Bayesian methods [12, 21]. Alignment-free methods are likewise categorized into four subgroups: k-mer frequency-based [46], substring-based [54], information-theory-based [15], and graphical-representation-based methods [1].
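As a rough illustration of the k-mer frequency-based category, the following minimal sketch compares two sequences by the Euclidean distance between their relative k-mer frequency vectors. The choice of overlapping k-mers, relative frequencies, and Euclidean distance is an assumption made for illustration; it is not necessarily the method of [46]. The function names `kmer_profile` and `kmer_distance` are hypothetical.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between the relative k-mer frequency vectors
    of two sequences (one simple alignment-free dissimilarity)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    na, nb = sum(pa.values()), sum(pb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must be at least k characters long")
    # Missing k-mers count as 0 because Counter returns 0 for absent keys.
    kmers = set(pa) | set(pb)
    return math.sqrt(sum((pa[m] / na - pb[m] / nb) ** 2 for m in kmers))
```

No alignment is computed at any point; a matrix of such pairwise distances over all species could then be passed to a distance-based tree construction method.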
