Abstract

The task of engineering a protein to assume a target three-dimensional structure is known as protein design. Computational search algorithms are devised to predict a minimal-energy amino acid sequence for a particular structure. In practice, however, an ensemble of low-energy sequences is often sought. The primary reason is that an individual predicted low-energy sequence may not fold to the target structure, owing both to inaccuracies in modeling protein energetics and to the nonoptimal nature of the search algorithms employed. Additionally, some low-energy sequences may be overly stable and thus lack the dynamic flexibility required for biological functionality. Furthermore, the investigation of low-energy sequence ensembles will provide crucial insights into the pseudo-physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low-energy sequences that were subsequently synthesized and demonstrated to fold to the desired structures. However, the sequence space that such energy functions define as compatible with a target structure has not been characterized in full detail. This characterization is critical if protein design scientists are to continue using these force fields successfully at an ever-increasing pace and scale. In this paper, we present a conceptually novel algorithm that rapidly predicts the set of lowest-energy sequences for a given structure. Based on the theory of probabilistic graphical models, it performs efficient inspection and partitioning of the near-optimal sequence space, without making any assumptions of positional independence. We benchmark its performance on a diverse set of relevant protein design examples and show that it consistently yields sequences of lower energy than those derived from state-of-the-art techniques.
Thus, we find that previously presented search techniques do not depict the low-energy space as precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as not to incur significant energetic penalties. We investigate this high degree of similarity and demonstrate how more diverse near-optimal sequences can be predicted in order to systematically overcome this bottleneck for computational design. Finally, we exploit this in-depth analysis of a collection of the lowest-energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and further supply a scheme to yield heterogeneous low-energy sequences, thus providing a powerful instrument for future work on protein design.
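
To make the notion of "the set of lowest-energy sequences" concrete, the following is a minimal sketch and not the algorithm presented in the paper. It assumes, for illustration only, that the energy of a sequence decomposes into per-position terms plus pairwise terms between positions, and it finds the k best sequences by exhaustive enumeration, which is feasible only for toy problems. The names total_energy and lowest_energy_sequences and all numeric values are hypothetical.

```python
import heapq
from itertools import product


def total_energy(seq, single, pair):
    """Energy of one sequence under an assumed pairwise-decomposable model."""
    # Per-position (singleton) terms.
    energy = sum(single[i][aa] for i, aa in enumerate(seq))
    # Pairwise terms; position or residue pairs not listed contribute 0.
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy += pair.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def lowest_energy_sequences(single, pair, k):
    """Return the k lowest-energy (energy, sequence) pairs by brute force."""
    choices = [sorted(position) for position in single]
    scored = (
        (total_energy(seq, single, pair), "".join(seq))
        for seq in product(*choices)
    )
    return heapq.nsmallest(k, scored)


# Hypothetical toy problem: three positions, two allowed residues each.
single = [
    {"A": 0.0, "L": -1.0},
    {"A": -0.5, "L": 0.0},
    {"G": 0.0, "V": -0.2},
]
pair = {
    (0, 1): {("L", "L"): 2.0, ("A", "A"): -0.3},
    (1, 2): {("A", "V"): -1.0},
}

top3 = lowest_energy_sequences(single, pair, 3)
for energy, seq in top3:
    print(f"{seq}  {energy:.2f}")
```

Because the pairwise terms couple positions, the best residue at one position depends on its neighbors, so positions cannot be optimized independently. Exhaustive enumeration grows exponentially with sequence length, which is why efficient search over the near-optimal space is needed for realistic design problems.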
