Abstract

Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and benchmark against the known ancestral genotypes and phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had only a minor effect on the inference of ancestral sequences.

Highlights

  • Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution

  • Our benchmarking exercise focused on Bayesian versus maximum parsimony (MP) algorithms, the effect of rate variation when modelled as a discrete gamma distribution[14], subsamples of taxa to infer ancestral sequences, and species-tree-aware versus unaware approaches within the Bayesian framework[15,16]

  • We have applied brute-force random mutagenesis and guided selection to generate an experimental phylogeny of a synthetic fluorescent protein (FP) to recapitulate evolutionary processes that govern natural FPs[17]

Summary

Introduction

Ancestral sequence reconstruction (ASR) is the process of analyzing modern sequences within an evolutionary/phylogenetic context to infer the ancestral sequences at particular nodes of a tree[1]. These ancient sequences are most often synthesized, recombinantly expressed in laboratory microorganisms or cell lines, and characterized to reveal the ancient properties of the extinct biomolecules[2,3,4,5,6]. Genetic material is not preserved in fossils on a long enough time scale to satisfy most ASR studies (many millions to billions of years ago), and it is not yet physically possible to travel back in time to collect samples. To overcome these limitations, we exploited an under-utilized yet effective procedure to develop a phylogeny in the laboratory[8]. We demonstrate that incorrectly inferred residues can influence the protein phenotypes of the encoded ancestral sequences and that various parameters incorporated into evolutionary models affect these incorrectly inferred sites.
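The genotype side of such a benchmark, comparing an inferred ancestral sequence against the experimentally known one at the same node, can be illustrated with a minimal Python sketch. The function name and the toy sequences are hypothetical and chosen only for illustration; the sketch assumes both sequences are already aligned to the same length and is not the authors' actual pipeline.

```python
def compare_to_known_ancestor(inferred: str, known: str):
    """Compare an inferred ancestral sequence with the known one.

    Both sequences are assumed to be pre-aligned and of equal length.
    Returns (accuracy, mismatches), where accuracy is the fraction of
    identical sites and mismatches lists (position, known, inferred)
    with 1-based positions.
    """
    if len(inferred) != len(known):
        raise ValueError("sequences must be aligned to the same length")
    if not known:
        raise ValueError("sequences must not be empty")

    mismatches = [
        (pos, k, i)
        for pos, (k, i) in enumerate(zip(known, inferred), start=1)
        if k != i
    ]
    accuracy = 1 - len(mismatches) / len(known)
    return accuracy, mismatches


# Toy sequences invented for illustration (not data from the study).
known = "MVSKGEEDNM"
inferred = "MVSKGEELNM"

accuracy, mismatches = compare_to_known_ancestor(inferred, known)
print(f"fraction of correctly inferred sites: {accuracy:.2f}")
print("incorrectly inferred sites:", mismatches)
```

A per-site score like this captures only sequence accuracy. As the text notes, a small number of incorrectly inferred residues can still change the encoded phenotype, so it would need to be paired with a phenotypic measurement.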
