Assessing the prediction fidelity of ancestral reconstruction by a library approach.

Hagit Bar-Rogovsky, Osnat Penn, Dan S Tawfik, Iris Kobl, Tal Pupko, Adi Stern

doi:10.1093/protein/gzv038

Abstract

Ancestral reconstruction is a powerful tool for studying protein evolution as well as for protein design and engineering. However, in many positions alternative predictions with relatively high marginal probabilities exist, and thus the prediction comprises an ensemble of near-ancestor sequences that relate to the historical ancestor. The ancestral phenotype should therefore be explored for the entire ensemble, rather than for the sequence comprising the most probable amino acid at all positions [the most probable ancestor (mpa)]. To this end, we constructed libraries that sample ensembles of near-ancestor sequences. Specifically, we identified positions where alternatively predicted amino acids are likely to affect the ancestor's structure and/or function. Using the serum paraoxonases (PONs) enzyme family as a test case, we constructed libraries that combinatorially sample these alternatives. We next characterized these libraries, reflecting the vertebrate and mammalian PON ancestors. We found that the mpa of vertebrate PONs represented only one out of many different enzymatic phenotypes displayed by its ensemble. The mammalian ancestral library, however, exhibited a homogeneous phenotype that was well represented by the mpa. Our library design strategy that samples near-ancestor ensembles at potentially critical positions therefore provides a systematic way of examining the robustness of inferred ancestral phenotypes.
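
The idea of sampling a near-ancestor ensemble can be illustrated with a minimal sketch. It is not the authors' pipeline: the position numbers, the probabilities, and the `MIN_ALT_PROB` cutoff are hypothetical values chosen for illustration, and the structural/functional filtering of positions described above is not modelled. The sketch only shows how alternatives with relatively high marginal probability could be enumerated combinatorially around the mpa.

```python
from itertools import product

# Toy marginal posterior probabilities per position (hypothetical values,
# not taken from the PON reconstruction).
posteriors = {
    10: {"A": 0.95, "S": 0.05},
    42: {"L": 0.55, "M": 0.40, "I": 0.05},
    77: {"K": 0.60, "R": 0.38, "Q": 0.02},
}

# Assumed cutoff: an alternative residue is sampled only if its marginal
# probability is at least this high.
MIN_ALT_PROB = 0.2


def candidate_residues(probs, min_alt_prob=MIN_ALT_PROB):
    """Return the most probable residue followed by qualifying alternatives."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][0]
    alternatives = [aa for aa, p in ranked[1:] if p >= min_alt_prob]
    return [best] + alternatives


def build_library(posteriors, min_alt_prob=MIN_ALT_PROB):
    """Enumerate every combination of candidate residues across positions."""
    positions = sorted(posteriors)
    choices = [candidate_residues(posteriors[pos], min_alt_prob) for pos in positions]
    variants = [dict(zip(positions, combo)) for combo in product(*choices)]
    return positions, variants


positions, library = build_library(posteriors)

# Each position lists its most probable residue first, so the first
# variant produced by product() is the most probable ancestor (mpa).
mpa = library[0]

for variant in library:
    print(variant)
```

In this toy case position 10 is unambiguous, while positions 42 and 77 each contribute one alternative, so the library holds four variants including the mpa. A real design would restrict the ambiguous positions further, for example to those judged likely to affect structure or function.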

Full Text