Reconstruction of ancestral protein sequences and its applications

Wei Cai, Jimin Pei, Nick V Grishin

doi:10.1186/1471-2148-4-33

Abstract

Background: Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms. In silico reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could serve to fill in sequence space, thus aiding remote homology inference.

Results: We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences. It takes into account the observed variation of evolutionary rates between positions, which more precisely describes the evolution of protein families. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that using hypothetical ancestors together with the present-day sequences improves profile-based sequence similarity searches, and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity.

Conclusions: As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and helps detection of remote homologs and prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from .

Highlights

Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms
Two rate factors are considered: an alignment-based rate factor α and a rate factor α estimated by Maximum Likelihood. Evolutionary simulations based on a Z-score model introduce rate variation across sites in a natural way by incorporating structural and functional constraints specific to a protein family [21]
Using the paired t-test on the third testing set, we show that the ANCESCON method with α estimated by Maximum Likelihood (αML) gives significantly better reconstruction than the other three methods

Summary

Introduction

Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms. Present-day protein sequences can be used to reconstruct ancestral sequences based on a model of sequence evolution. Such knowledge about ancestral sequences is helpful for understanding the evolutionary processes as well as the functional aspects of a protein family. Joint reconstruction methods aim to find the most likely set of amino acids for all internal nodes at a site, that is, the set which yields the maximum joint likelihood of the tree [5]. Marginal reconstruction compares the probabilities of different amino acids at a single internal node at a site and selects the amino acid that yields the maximum likelihood for the tree at that site. The computational complexity of both algorithms scales linearly with the number of sequences. Both marginal and joint reconstruction algorithms are implemented in our program.
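To make the idea of marginal reconstruction concrete, the following is a minimal sketch for one alignment column, not ANCESCON's implementation. It assumes a simple equal-rates (Poisson) substitution model over the 20 amino acids, uniform amino acid frequencies at the root, and a tree encoded as nested lists. The function names, the tree encoding, and the `rate` argument (a stand-in for a position-specific rate factor that scales branch lengths) are all hypothetical choices made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def poisson_prob(same, t):
    """Transition probability over distance t under an equal-rates model
    with 20 states (an assumed model, chosen only for simplicity)."""
    k = len(AMINO_ACIDS)
    decay = math.exp(-k / (k - 1) * t)
    return 1 / k + (k - 1) / k * decay if same else 1 / k - decay / k


def conditional_likelihoods(node, rate=1.0):
    """Pruning pass: for each amino acid at `node`, the likelihood of the
    observed leaves below it.

    A leaf is a one-letter string; an internal node is a list of
    (child, branch_length) pairs. `rate` scales every branch length.
    """
    if isinstance(node, str):
        return {a: 1.0 if a == node else 0.0 for a in AMINO_ACIDS}
    result = {a: 1.0 for a in AMINO_ACIDS}
    for child, length in node:
        child_like = conditional_likelihoods(child, rate)
        for a in AMINO_ACIDS:
            result[a] *= sum(
                poisson_prob(a == b, rate * length) * child_like[b]
                for b in AMINO_ACIDS
            )
    return result


def marginal_reconstruction(root, rate=1.0):
    """Return the most probable amino acid at the root and its posterior
    probability, assuming uniform prior frequencies (which cancel out)."""
    like = conditional_likelihoods(root, rate)
    total = sum(like.values())
    posterior = {a: v / total for a, v in like.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior[best]


# One alignment column on the tree ((A, A), (A, G)) with made-up branch lengths.
site_tree = [
    ([("A", 0.1), ("A", 0.1)], 0.2),
    ([("A", 0.1), ("G", 0.1)], 0.2),
]
best, prob = marginal_reconstruction(site_tree)
print(best, round(prob, 3))
```

Each branch is visited once per site, which is consistent with the linear scaling in the number of sequences noted above. A joint reconstruction would instead choose amino acids for all internal nodes at once so as to maximize the joint likelihood, which this sketch does not attempt.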

Objectives

Results

Conclusion