Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty.

Anastasis Togkousidis, Julia Haag, Dimitri Höhler, Oleksiy M. Kozlov, Alexandros Stamatakis

doi:10.1093/molbev/msad227

Abstract

Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the difficulty of a dataset with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet they correspond to highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty: while the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10×. Further, approximately 94% of the trees inferred using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).
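The three observations above amount to a schedule that spends the most search effort at intermediate difficulty and little effort at either extreme. The following is a minimal sketch of that idea, assuming a predicted difficulty score in [0, 1] and a Gaussian-shaped mapping from difficulty to the number of independent tree searches. The function names, the bell-shaped curve, and the constants `MAX_START_TREES`, `PEAK`, and `WIDTH` are hypothetical illustrations, not the heuristic or parameters implemented in Adaptive RAxML-NG. Topologies are assumed to be supplied in a canonical string form so that identical trees compare equal.

```python
import math
from collections.abc import Sequence

# Hypothetical constants for illustration only; NOT Adaptive RAxML-NG's values.
MAX_START_TREES = 20  # search budget at the most demanding (intermediate) difficulty
MIN_START_TREES = 1   # always run at least one search
PEAK = 0.5            # difficulty at which the search is most thorough
WIDTH = 0.15          # how quickly thoroughness falls off away from PEAK


def num_start_trees(difficulty: float) -> int:
    """Map a predicted difficulty in [0, 1] to a number of independent searches.

    A Gaussian-shaped curve gives easy (near 0) and difficult (near 1)
    datasets few searches and intermediate datasets the most.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    weight = math.exp(-((difficulty - PEAK) ** 2) / (2 * WIDTH ** 2))
    return max(MIN_START_TREES, round(MAX_START_TREES * weight))


def should_stop(topologies: Sequence[str], min_runs: int = 2) -> bool:
    """Early termination: stop once all completed searches found the same topology.

    Assumes each topology is given in a canonical form (hypothetical), so
    string equality means topological identity.
    """
    return len(topologies) >= min_runs and len(set(topologies)) == 1


if __name__ == "__main__":
    for d in (0.05, 0.3, 0.5, 0.7, 0.95):
        print(d, num_start_trees(d))
```

Under these assumed constants, a difficulty of 0.5 yields 20 searches, 0.3 or 0.7 yields 8, and values near 0 or 1 fall back to a single search. Pairing the schedule with `should_stop` lets runs on easy datasets end early once the completed searches agree, which mirrors the first observation.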


Journal: Molecular Biology and Evolution
Publication Date: Oct 4, 2023
Citations: 4
License type: CC BY-NC 4.0


Similar Papers

Benchmark datasets and software for developing and testing methods for large-scale multiple sequence alignment and phylogenetic inference.
C Randal Linder ... Rahul Suri
PLoS Currents | Vol. 2 | 18 Nov 2010

Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.
Ge Tan ... Manuel Gil
Systematic Biology | Vol. 64 | 01 Jun 2015

Simulated Evolutionary Optimization and Local Search: Introduction and Application to Tree Search
Atte Moilanen
Cladistics | Vol. 17 | 01 Mar 2001
