Abstract

Accurate reconstruction of ancestral states is a critical evolutionary analysis when studying ancient proteins and comparing biochemical properties between parental or extinct species and their extant relatives. It relies on multiple sequence alignment (MSA), which may introduce biases, and it remains unknown how MSA methodological approaches impact ancestral sequence reconstruction (ASR). Here, we investigate how MSA methodology modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also by tree topology and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodological approaches. In addition, we find that measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show that MSA methodological differences can affect the quality of reconstructions, and we propose that MSA methods should be selected with care to determine ancestral states accurately and with confidence.

Highlights

  • Given an ensemble of known sequences, ancestral sequence reconstruction (ASR) refers to methods used to recover the genetic sequence character states of their common ancestors

  • We tested the impact of multiple sequence alignment (MSA) tools on ancestral state reconstruction accuracy using amino acid sequences simulated under various realistic conditions

  • We found undemanding conditions result in effectively no differences between any alignment methods, with reconstruction accuracy as good as using the true alignment, frequently permitting near-perfect ASR


Summary

Introduction

Given an ensemble of known sequences, ancestral sequence reconstruction (ASR) refers to methods used to recover the genetic sequence character states of their common ancestors. It has been used to study molecular evolution of photoreactive proteins (Chang et al 2002; Shi and Yokoyama 2003; Ugalde et al 2004; Yokoyama and Takenaka 2004; Chinen et al 2005; Yokoyama et al 2008; Bickelmann et al 2015), thermal stability of ancient proteins (Gaucher et al 2003; Shimizu et al 2007; Gaucher et al 2008; Gouy and Chaussidon 2008; Akanuma et al 2011; Perez-Jimenez et al 2011; Akanuma et al 2015; Busch et al 2016), and evolution of viral proteins (Kaiser et al 2007; Gullberg et al 2010; Zinn et al 2015). Extensive reviews of these topics are found in Liberles (2007), Ogawa and Shirai (2013), and Merkl and Sterner (2016). Reconstruction quality is likely to depend on the age of the ancestors, the number of observed descendants and the use of sufficiently realistic evolutionary models.
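
To make the idea of recovering ancestral character states from aligned descendants concrete, the following is a minimal sketch using Fitch parsimony, column by column, on a fixed rooted binary tree. It is an illustration only and not the reconstruction method evaluated in this study: the tree, the taxon names, the toy sequences and the function names `fitch_sets` and `reconstruct_root` are all hypothetical, and the sketch assumes a gap-free alignment, so it ignores the insertions and deletions that the study focuses on.

```python
# Illustrative only: Fitch parsimony for the root of a rooted binary tree.
# The tree, taxon names and sequences are hypothetical toy data, and the
# alignment is assumed to be gap-free (indels are not modelled here).

def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass for one alignment column.

    `tree` is either a taxon name (leaf) or a pair (left, right).
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_sets(left, leaf_states)
    right_set, right_changes = fitch_sets(right, leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change is needed.
        return common, left_changes + right_changes
    # Children disagree: keep all candidates and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def reconstruct_root(tree, alignment):
    """Return, for each column, the sorted candidate states at the root."""
    length = len(next(iter(alignment.values())))
    root_states = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        states, _ = fitch_sets(tree, column)
        root_states.append(sorted(states))
    return root_states


# Hypothetical four-taxon example.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "MKV", "B": "MKI", "C": "MRV", "D": "MRV"}
root = reconstruct_root(tree, alignment)
print(root)
```

In this toy case the first and third columns resolve to a single residue at the root, while the second column remains ambiguous between two residues. Because each column is treated independently, any error in how the aligner places residues into columns is passed directly to the reconstruction, which is why the choice of MSA method matters for ASR.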

