Abstract

Computational protein design of an ensemble of conformations for one protein (i.e., multi-state design) determines the side chain identity at each position by optimizing the energetic contributions of that side chain in each of the backbone conformations. Sampling the resulting large sequence-structure search space limits the number of conformations and the size of proteins that multi-state design algorithms can handle. Here, we demonstrate that the REstrained CONvergence (RECON) algorithm can simultaneously evaluate the sequence of large proteins that undergo substantial conformational changes. Simultaneous optimization of side chains across all conformations increased sequence conservation when compared to single-state designs in all cases. More importantly, the sequence space sampled by RECON multi-state design (MSD) resembled the evolutionary sequence space of flexible proteins, particularly when confined to predicting the mutational preferences of limited common ancestral descent, such as in the case of influenza type A hemagglutinin. Additionally, we found that sequence positions which require substantial changes in their local environment across an ensemble of conformations are more likely to be conserved. RECON MSD captures these increased conservation rates better by considering multiple conformations, and thus multiple local residue environments, during design. To quantify this rewiring of contacts at a given position in sequence and structure, we introduce a new metric, designated 'contact proximity deviation', that enumerates contact map changes. This measure allows mapping of global conformational changes onto local side chain proximity adjustments, a property not captured by traditional global similarity metrics such as RMSD or by local similarity metrics such as changes in φ and ψ angles.
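
As a rough illustration of how a per-residue measure of contact map change could be computed, the sketch below compares smoothed contact maps of two conformations and sums the absolute change for each residue. This is not the paper's definition of contact proximity deviation, which is not given in this text. The function names, the sigmoid weighting, and the 8 Å midpoint are all assumptions chosen for illustration only.

```python
import math

def contact_weight(distance, midpoint=8.0, steepness=1.0):
    """Smooth contact score in (0, 1): near 1 for close pairs, near 0 for distant ones.

    The sigmoid form and the 8.0 midpoint are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(steepness * (distance - midpoint)))

def contact_map(coords):
    """Pairwise contact weights for a list of (x, y, z) residue positions."""
    n = len(coords)
    return [
        [0.0 if i == j else contact_weight(math.dist(coords[i], coords[j]))
         for j in range(n)]
        for i in range(n)
    ]

def contact_change_per_residue(state_a, state_b):
    """Hypothetical measure: summed absolute contact-weight change per residue."""
    map_a = contact_map(state_a)
    map_b = contact_map(state_b)
    return [
        sum(abs(a - b) for a, b in zip(row_a, row_b))
        for row_a, row_b in zip(map_a, map_b)
    ]

# Toy example: residue 2 moves from far away into contact with residues 0 and 1.
state_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
state_b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0)]
scores = contact_change_per_residue(state_a, state_b)
```

In this toy case the residue that gains two new contacts receives the largest score, while the two residues that keep their mutual contact change less. A global measure such as RMSD would not attribute the change to individual residue environments in this way.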

Highlights

  • Computational protein design solves the so-called ‘inverse folding problem’ by identifying an amino acid sequence that is compatible with a given protein structure, i.e., backbone conformation and possibly interactions with partner biomolecules

  • The metrics RMSDda and contact proximity deviation were not independent variables: contact proximity deviation is significantly, though not strongly, negatively correlated with RMSDda (S5 Fig), meaning that residues with contact proximity deviation values at or close to zero were more likely to have higher RMSDda values

  • We demonstrated that REstrained CONvergence (RECON) multi-state design (MSD) significantly improves similarity to evolutionary mutation preferences, relative to mutation profiles selected by single-state design (SSD), by selecting sequences that are energetically favorable for an ensemble of local side-chain interactions

Introduction

Computational protein design solves the so-called ‘inverse folding problem’ by identifying an amino acid sequence that is compatible with a given protein structure, i.e., backbone conformation and possibly interactions with partner biomolecules. This approach allows the molecule to carry out its function in this single state. Determining functionally relevant sequence tolerance, that is, the set of amino acid sequences that are allowable given a protein’s function, depends on identifying the set of amino acid sequences that is stable in each of the conformations needed. The picture becomes even more complicated if we consider functionally relevant conformations, which are by definition local free energy minima (i.e., thermodynamics), and also analyze the heights of the barriers connecting these states, which determine the kinetics of interconversion.
