Abstract
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to show less site variability than is observed in natural proteins, while proteins designed with an intermediate amount of backbone flexibility show more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins, we find that hydrophobic residues are underrepresented in the core of the designed proteins. From these results, we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
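The comparison described above rests on two quantities: a per-site variability score computed from an alignment, and its correlation with solvent exposure. The sketch below illustrates one way such an analysis could be set up. It is not the authors' pipeline: the abstract does not name the variability measure or the correlation statistic, so Shannon entropy per alignment column and Spearman rank correlation are assumptions made here, and the function names, the toy alignment, and the relative solvent accessibility values are all hypothetical.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in nats) of one alignment column, ignoring gaps.

    Assumption: entropy stands in for "site variability"; the source
    does not specify which measure was used.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Per-site entropy for a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*sequences)]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: four aligned sequences of length five, and one
# made-up relative solvent accessibility (RSA) value per site.
toy_alignment = ["ALKDE", "ALRNE", "AIKSQ", "ALRTD"]
toy_rsa = [0.02, 0.10, 0.45, 0.80, 0.60]

entropies = alignment_entropies(toy_alignment)
rho = spearman(toy_rsa, entropies)
print([round(h, 3) for h in entropies], round(rho, 3))
```

Running the same calculation on an alignment of designed sequences and on an alignment of natural homologs for the same structure would give two correlation values to compare; a lower value for the designed set would correspond to the "too uniform" variability the abstract describes.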
Highlights
Computational protein design has made tremendous progress in recent years
The protein structures we considered were subdivided into two distinct data sets, a data set of 38 yeast protein structures previously analyzed by Ramsey et al (2011) and a data set of 40 protein domains previously analyzed by Ollikainen & Kortemme (2013)
When switching from fixed-backbone design to variable-backbone design, we found that overall site variability increased
Summary
Computational design has been used successfully to engineer proteins that bind to an influenza virus (Fleishman et al, 2011), to create enzymes (Rothlisberger et al, 2008), and to develop novel protein folds not seen in nature (Kuhlman et al, 2003). All these examples have in common that many different computational predictions were generated. Sites in the core tend to be conserved because mutations at these sites are more likely to destabilize the protein fold due to steric clashes (Chothia & Finkelstein, 1990).