Predicting the Tolerated Sequences for Proteins and Protein Interfaces Using RosettaBackrub Flexible Backbone Design

Colin A Smith, Tanja Kortemme

doi:10.1371/journal.pone.0020451

Open Access

Journal: PLoS ONE	Publication Date: Jul 18, 2011
Citations: 89	License type: CC BY 4.0

Affiliation: University of California, San Francisco, QB3

Abstract

Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
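The merging step described above can be illustrated with a minimal Python sketch. It assumes that each backbone in the ensemble yields a list of (sequence, score) pairs, that sequences on a backbone are weighted by a Boltzmann-like factor relative to the best score on that backbone, and that the per-backbone frequency tables are then averaged. The function name `merge_ensemble`, the `kt` parameter, and the weighting scheme are illustrative assumptions, not the Rosetta implementation.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def merge_ensemble(results_per_backbone, kt=1.0):
    """Merge per-backbone (sequence, score) lists into per-position
    amino acid frequencies.

    Hypothetical helper: each sequence is weighted by
    exp(-(score - best) / kt), where `best` is the lowest score found on
    the same backbone. Weights are normalized per backbone, and the
    resulting frequency tables are averaged over all backbones.
    """
    if not results_per_backbone:
        raise ValueError("need at least one backbone")
    n_pos = len(results_per_backbone[0][0][0])
    totals = [defaultdict(float) for _ in range(n_pos)]
    for results in results_per_backbone:
        best = min(score for _, score in results)
        weights = [math.exp(-(score - best) / kt) for _, score in results]
        norm = sum(weights)
        for (seq, _), weight in zip(results, weights):
            for pos, aa in enumerate(seq):
                totals[pos][aa] += weight / norm
    n_backbones = len(results_per_backbone)
    # One dict per designed position, covering all 20 amino acid types.
    return [
        {aa: totals[pos][aa] / n_backbones for aa in AMINO_ACIDS}
        for pos in range(n_pos)
    ]
```

Calling `merge_ensemble` with one list of scored sequences per backbone returns one frequency dictionary per designed position; each dictionary sums to 1 and can be ranked to obtain the predicted amino acid preferences at that position.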

Highlights

The concept of “tolerated sequence space” – the set of sequences that a given protein can tolerate while still preserving its function at a defined level – has enabled considerable advances in understanding protein sequence-structure relationships and engineering new functions [1].
We show example results that assess the performance of RosettaBackrub sequence tolerance predictions using three different experimental datasets that determined tolerated sequences for protein fold stability [9] and protein binding [23,35] using phage display.
The generalized protocol captures a significant fraction of the observed sequence space in all three datasets (Table 1), with values for the area under a ROC curve between 0.64 and 0.87, and the fraction of sequence space captured by the top 5 ranked amino acid types between 54 and 82%.
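The two performance measures mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ROC area is computed as the probability that a randomly chosen observed amino acid type is ranked above a randomly chosen unobserved one, and the "top 5" measure is taken to be the share of observed frequency covered by the five highest-ranked predicted amino acid types at a position. The function names and these exact definitions are assumptions; the paper's own definitions may differ in detail.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are predicted values (higher means more favorable) and
    `labels` are truthy for experimentally observed amino acid types.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def top_n_fraction(predicted, observed, n=5):
    """Share of observed frequency captured by the n top-ranked
    predicted amino acid types at one position.

    `predicted` and `observed` map amino acid letters to frequencies.
    """
    total = sum(observed.values())
    if total == 0:
        raise ValueError("observed frequencies sum to zero")
    top = sorted(predicted, key=predicted.get, reverse=True)[:n]
    return sum(observed.get(aa, 0.0) for aa in top) / total
```

In this sketch both functions operate on a single position; a dataset-level value would be obtained by pooling or averaging over positions, which is left out here because the aggregation scheme is not specified in the text above.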

Summary

Introduction

The concept of “tolerated sequence space” – the set of sequences that a given protein can tolerate while still preserving its function at a defined level – has enabled considerable advances in understanding protein sequence-structure relationships and engineering new functions [1]. Even in cases where it is especially difficult to predict sequences optimized for a given function (for example, the rate of an enzymatic reaction or the emission spectrum of a fluorescent protein), screening from a pool of predicted tolerated sequences can increase the likelihood of diversifying existing functions or identifying new ones [6]. To experimentally estimate the tolerated sequence space for a given protein fold, one can use either sequence alignments of orthologous proteins or a high-throughput technique such as phage display. Phage display selects for protein binding, but when the binding partner does not interact directly with the mutated amino acids, binding can be used as a proxy for protein stability. Computational methods that can reduce the enormous number of possible sequences to those that are more likely to be functional are extremely useful, in particular for focusing libraries so that they can be screened experimentally much more efficiently.
