Abstract
Diverse proteins with similar structures are grouped into families of homologs and analogs, if their sequence similarity is higher or lower, respectively, than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge in their distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we postulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates from the original structure by a root-mean-square deviation of only 1–2 Å. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of the naturally occurring sequences, thereby validating our computational design methodology.
Highlights
Understanding the evolution of proteins is an intriguing but challenging problem in molecular biology [1,2,3,4,5,6,7,8,9,10,11,12], and in many regards is vital to progress in the field.
A rotamer library contains a discrete set of conformations for each amino acid and is developed to best represent the common conformations observed in the Protein Data Bank (PDB).
Inspired by recent success in computational protein design [21,24,25,26,29], we developed a protein evolution model combining large-scale structural sampling and protein sequence redesign with backbone relaxation.
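To make the rotamer-library highlight concrete, the following is a minimal sketch of how a discrete set of side-chain conformations per amino acid can be stored and enumerated. It is not the Medusa library: the name `ROTAMER_LIBRARY`, the function `enumerate_conformations`, and the chi1 angles (the idealized staggered values of -60, 60, and 180 degrees) are assumptions chosen for illustration, not values taken from the PDB.

```python
from itertools import product

# Hypothetical toy rotamer library (illustration only, not Medusa's).
# Each residue type maps to a list of rotamers; each rotamer is a tuple of
# side-chain dihedral (chi) angles in degrees. The angles are idealized
# staggered values, not statistics derived from the PDB.
ROTAMER_LIBRARY = {
    "SER": [(-60.0,), (60.0,), (180.0,)],
    "VAL": [(-60.0,), (60.0,), (180.0,)],
    "GLY": [()],  # glycine has no side-chain dihedral, so one empty rotamer
}


def enumerate_conformations(residues):
    """Return every combination of rotamers for the given residue types."""
    choices = [ROTAMER_LIBRARY[name] for name in residues]
    return list(product(*choices))


conformations = enumerate_conformations(["SER", "VAL", "GLY"])
print(len(conformations))  # 3 * 3 * 1 combinations
```

Because the number of combinations grows multiplicatively with chain length, a design method has to search this space rather than enumerate it exhaustively; the enumeration here only shows what "discrete set of conformations" means.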
Summary
Understanding the evolution of proteins is an intriguing but challenging problem in molecular biology [1,2,3,4,5,6,7,8,9,10,11,12], and in many regards is vital to progress in the field. One of the puzzling observations about the zoo of known protein structures, which emerge as the direct result of evolution, is the limited number of occurring species, even by very conservative estimates [13]. What is more surprising is that multiple distinct protein sequences can share the same three-dimensional structure. Proteins that share at least 25% sequence similarity form families of homologs (known as fold families) [14,15,16,17,18]. While it is clear that the creation of new folds may not be as difficult as was initially believed [21], it is unclear why only a small fraction of the possible protein fold space is explored in nature, limiting the zoo to only a few species.
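The homolog criterion above can be illustrated with a short sketch. It assumes two sequences that are already aligned (equal length, `-` for gaps) and already known to share a fold, and it uses percent identity over ungapped columns as a simple stand-in for sequence similarity. The function names and the default 25% cutoff are illustrative choices based on the threshold quoted in the text, not the authors' procedure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only alignment columns without a gap in either sequence.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def classify_pair(aligned_a: str, aligned_b: str, threshold: float = 25.0) -> str:
    """Label a same-fold pair as 'homolog' at or above the threshold, else 'analog'."""
    if percent_identity(aligned_a, aligned_b) >= threshold:
        return "homolog"
    return "analog"


print(classify_pair("ACDEFGHIKL", "ACDEFGHIKV"))  # 9 of 10 positions identical
print(classify_pair("ACDEFGHIKL", "LKIHGFEDCA"))  # no identical positions
```

A real analysis would use a substitution-matrix-based similarity score and a proper alignment step; identity on a precomputed alignment is used here only to keep the sketch self-contained.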