Abstract

Background: Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored.

Results: We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances it identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data than for amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA.

Conclusions: We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines, including relative model selection, to a greater extent than has previously been recognized.

Highlights

  • Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses

  • One of the most common approaches used to identify a suitable model for phylogenetic inference is relative model selection, wherein a set of candidate models are ranked according to a given goodness-of-fit measurement, and the best-fitting model is used in the phylogenetic reconstruction [34]

  • We broadly found that there is potential for model selection, in particular on nucleotide data, to identify different best-fitting evolutionary models for different MSA versions created from the same ortholog set
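
The ranking step described above, and the check of whether it picks the same model across MSA versions, can be sketched as follows. This is a minimal illustration and not the authors' pipeline. It assumes AIC and BIC as the goodness-of-fit measures (the text above does not name one), and every log-likelihood, parameter count, and MSA label below is a made-up placeholder rather than a result from the paper.

```python
import math
from collections import Counter


def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return num_params * math.log(num_sites) - 2 * log_likelihood


def best_model(fits, criterion):
    """Return the candidate with the lowest criterion score.

    fits maps a model name to a (log_likelihood, num_params) tuple.
    """
    return min(fits, key=lambda name: criterion(*fits[name]))


def selection_consistency(fits_per_msa, criterion):
    """Pick the best model for each MSA version and report how often
    the most frequently selected model wins."""
    winners = [best_model(fits, criterion) for fits in fits_per_msa]
    top_count = Counter(winners).most_common(1)[0][1]
    return winners, top_count / len(winners)


# HYPOTHETICAL fits for three alignments of the same sequences.
# Parameter counts here cover only the substitution model (an assumption
# made to keep the example small).
example_fits = [
    {"JC69": (-5200.0, 0), "HKY85": (-5100.0, 4), "GTR": (-5097.0, 8)},
    {"JC69": (-5210.0, 0), "HKY85": (-5110.0, 4), "GTR": (-5104.0, 8)},
    {"JC69": (-5190.0, 0), "HKY85": (-5095.0, 4), "GTR": (-5093.0, 8)},
]

winners, agreement = selection_consistency(example_fits, aic)
print(winners, agreement)
```

With these placeholder numbers, the second alignment selects a different model from the other two under AIC, which is the kind of disagreement the study looks for. To use BIC instead, pass a wrapper that fixes the alignment length, for example `lambda lnl, k: bic(lnl, k, 1000)`, where 1000 sites is again an assumed value.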

Summary

Introduction

Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses. In addition to providing phylogenetic methods with an MSA to analyze, researchers must specify a suitable evolutionary model for the given analysis. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. Although the effects of MSA uncertainty in phylogenetic pipelines have been heavily studied, the MSA is not the only piece of information that is inputted to phylogenetic reconstruction and other evolutionary-informed analyses. Although recent studies have suggested that relative model selection may not be a critical step in phylogenetic studies [2, 31, 33], it remains an enduring staple of most analysis pipelines. We use the phrase “model selection” to refer to relative model selection, unless otherwise stated.

Spielman and Miraglia, BMC Ecology and Evolution (2021) 21:214
