Abstract

Likelihood-based phylogenetic analysis approaches have contributed to impressive advances in minimizing the variance of evolutionary parameter estimates. However, their practical application is greatly limited by the very time-consuming calculation of Conditional Likelihood Probabilities (CLPs). Obtaining the likelihoods of massive tree samples accurately and quickly can facilitate the phylogenetic analysis process. Inspired by recent advances in machine learning techniques that have greatly improved the performance of many related prediction tasks, this study proposes a Random Forest (RF) based learning and prediction approach called NeoPLE. The approach first learns the deep neighbor information between nodes from the topology representations of evolution trees, integrates likelihood information from these trees, and trains a non-linear prediction model. Instead of depending on the recursive calculation of the CLPs of tree nodes, NeoPLE replaces that process with a prediction by the trained model, so the likelihood estimates become independent of the CLP calculations. In terms of performance, speedup factors ranging from 2.1X to 3.5X can be achieved on the analysis of realistic data sets. Moreover, NeoPLE is well suited to data sets with a relatively large number of alignment sites: a speedup factor of up to 27.5X can be achieved on the analysis of simulated data sets. In addition, NeoPLE is robust against a wide range of choices of evolutionary models and is ready to be integrated into more phylogenetic inference tools. This study fills a gap in phylogenetic analysis by using a machine learning approach for feature representation and likelihood prediction of evolution trees, which has not been reported in the literature.
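
To make the cost of the recursive CLP calculation concrete, the following is a minimal sketch of the standard pruning recursion for a single tree, assuming the JC69 substitution model and uniform root frequencies. The tree encoding (nested lists of `(child, branch_length)` pairs with sequence strings as leaves) and the function names are assumptions made for illustration; they are not taken from NeoPLE or MrBayes.

```python
import math

STATES = "ACGT"

def jc69_prob(same, t):
    """JC69 transition probability over a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def clp(node, site):
    """Conditional likelihood vector of `node` for one alignment site.

    Hypothetical encoding: a leaf is its sequence string; an internal
    node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if s == node[site] else 0.0 for s in STATES]
    vec = [1.0] * len(STATES)
    for child, t in node:
        child_vec = clp(child, site)  # recursion: children are evaluated first
        for i, x in enumerate(STATES):
            vec[i] *= sum(jc69_prob(x == y, t) * child_vec[j]
                          for j, y in enumerate(STATES))
    return vec

def log_likelihood(root, n_sites):
    """Sum of per-site log-likelihoods, with uniform root frequencies."""
    return sum(math.log(sum(0.25 * v for v in clp(root, s)))
               for s in range(n_sites))

# Toy three-taxon tree with four alignment sites.
tree = [([("ACGT", 0.1), ("ACGA", 0.1)], 0.05), ("ACTT", 0.2)]
print(log_likelihood(tree, 4))
```

Every node is visited once per site and per tree, so the work grows with both the number of alignment sites and the number of sampled trees. This is the cost that a trained predictor is meant to avoid.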

Highlights

  • Evolutionary research at the molecular level addresses two major issues: rebuilding the evolutionary relationships between species and understanding the dynamics and mechanisms of evolutionary processes [1,2,3,4]

  • This study fills a gap in phylogenetic analysis by using a machine learning approach for feature representation and likelihood prediction of evolution trees, which has not been reported in the literature

  • This study makes full use of candidate trees and establishes a model that describes the relationship between the likelihood and the extracted features by exploiting the deep neighbor information of each individual tree

Summary

INTRODUCTION

Evolutionary research at the molecular level addresses two major issues: rebuilding the evolutionary relationships between species and understanding the dynamics and mechanisms of evolutionary processes [1,2,3,4]. Cheng Ling: Deep Neighbor Information Learning from Evolution Trees for Phylogenetic Likelihood Estimates. Instead of relying on the recursive calculation of the CLPs of sampled trees, NeoPLE replaces that process with a prediction by a well-trained model, so the likelihood estimates become independent of the CLP calculations. Besides generating the same consensus trees, NeoPLE achieves speedup factors ranging from 2.1X to 3.5X on the analysis of realistic data sets compared with MrBayes - a widespread phylogenetic. This study fills a gap in phylogenetic analysis by using a machine learning approach for feature representation and likelihood prediction of evolution trees, which has not been reported in the literature.
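
The idea of replacing a likelihood computation with a learned prediction can be illustrated with a small bagging sketch. This is not the NeoPLE model: it uses depth-1 regression trees (stumps) instead of full CART trees, omits the random feature selection of a Random Forest, and the two numeric features per tree and the log-likelihood targets are made-up toy values. All function names are hypothetical.

```python
import random
import statistics

def fit_stump(X, y):
    """Fit a depth-1 regression tree by minimizing squared error over all splits."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue  # skip splits that leave one side empty
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = (sum((t - ml) ** 2 for t in left)
                   + sum((t - mr) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:  # no valid split: fall back to a constant prediction
        m = statistics.fmean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, thr, ml, mr = stump
    return ml if row[f] <= thr else mr

def fit_bagged(X, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models, row):
    """Average the predictions of all stumps."""
    return statistics.fmean([predict_stump(m, row) for m in models])

# Toy data: hypothetical per-tree features and log-likelihood targets.
X = [[1, 0.10], [2, 0.15], [3, 0.30], [4, 0.35], [5, 0.50], [6, 0.55]]
y = [-120.0, -121.0, -130.0, -131.0, -140.0, -141.0]
models = fit_bagged(X, y)
print(predict_bagged(models, [2, 0.12]))
```

Once trained, a prediction costs one threshold comparison per stump and does not depend on the number of alignment sites, which is consistent with the larger speedups reported for data sets with many sites.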

Overview of our methodology
Phylogenetic likelihood estimate decomposition
Feature matrix presentation
CART and Bagging
Goodness of fit
RESULT
Experimental data sets
Performance comparisons
Consensus tree comparisons
Discussions
CONCLUSION