Abstract

In this special issue of the Journal of Bacteriology, bacteriologists look into the smallest organisms even deeper than before, down to the molecular level. The focus is on experimentally determined molecular structures. However, structure prediction from amino acid sequence data is becoming a usable source of protein structure information as well.

Interest in protein structure prediction is old, but success is new. About 9 years ago, John Moult and others organized the first effort known as Critical Assessment of Protein Structure Prediction (CASP). They arranged with experimentalists to provide amino acid sequence information for soon-to-be-determined protein structures and invited the protein prediction community to try their methods on these target unknowns. Predictors submitted their results to the organizers for evaluation against the true structures when they became available. The format of using a community-wide experiment and a meeting to present the evaluations to the predictors propelled the improvement of methods. Last December the fifth evaluation meeting of the biennial CASP effort (CASP5) was held at Asilomar Conference Grounds in Pacific Grove, Calif. (7). The success of the best of the predictors in the last two CASP evaluations (7, 8) warrants mention of the methods and results here.

Methods for prediction are different for easy and hard cases. The choice of method depends on the degree of similarity between the amino acid sequence of the unknown and the sequences of known structures.

THE HARDEST TEST

Even though they have the worst agreement with the experimental results, the most exciting predictions are the successes in the “new fold” category, where the sequence of the unknown has no significant similarity to the sequence of any known structure. Five of the eighty-odd domains available for prediction fell into this category in CASP5.

In this most difficult category, the evaluator considered that at least one “excellent” prediction was made for each target. Of 165 predictors who attempted these difficult targets, nine had a prediction among the best ten (out of hundreds) for three or more of the targets. So some techniques consistently perform better than the rest. In the new fold category, a respectable result means that the predicted chain has the same kinds of pieces in the same relative orientations, not that the pieces superimpose on each other. The degree of agreement might be similar to that between photographs of the same person at age 20 and at age 80.

In fact, predicting a new fold is like drawing a face that the artist has never seen, and structure prediction methods resemble the methods used by police artists in an important sense. A witness is shown a gallery of faces and asked to pick out parts from them that individually resemble parts of the suspect’s face. The police artist then combines the parts into a whole that resembles the witness’s memory of the face. The most successful methods of structure prediction for new folds similarly rely on the assembly of a unique whole from fragments selected from a gallery of protein structures.

ONE OF THE GOOD METHODS

In a coarse description of the most successful method of new fold prediction, the first step is to obtain secondary structure (helix, beta strand, etc.) predictions for the unknown and to divide the sequence of the unknown into short fragments (nine amino acids). Then known structures (the equivalent of the gallery of faces) are searched for fragments that are similar in secondary structure and/or sequence profile to the unknown’s fragments. A library of these fragments from known structures is constructed (the equivalent of the collection of witness-selected individual features).

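The library-building step just described can be illustrated with a small sketch. It is not the predictors' actual implementation: the names (`Fragment`, `windows`, `similarity`, `build_fragment_library`), the use of overlapping windows, the 2:1 weighting, and the toy data are all assumptions made for illustration, and plain residue identity stands in for the sequence-profile comparison mentioned above.

```python
from typing import Dict, List, NamedTuple, Tuple

FRAGMENT_LENGTH = 9  # nine-residue fragments, as stated in the text


class Fragment(NamedTuple):
    """One candidate fragment taken from a known structure (hypothetical type)."""
    source: str     # identifier of the known structure
    start: int      # offset of the fragment within that structure
    sequence: str   # amino acid letters of the fragment
    secondary: str  # secondary-structure letters (e.g. H, E, C)
    score: int      # similarity to the target window


def windows(text: str, size: int = FRAGMENT_LENGTH) -> List[Tuple[int, str]]:
    """Return every overlapping window of `size` characters with its offset.

    Overlapping windows are an assumption; the text only says the sequence
    is divided into short fragments.
    """
    return [(i, text[i:i + size]) for i in range(len(text) - size + 1)]


def similarity(seq_a: str, ss_a: str, seq_b: str, ss_b: str) -> int:
    """Toy score: secondary-structure matches count double, residue identities once.

    The weighting is an illustrative assumption, not a published scoring rule.
    """
    ss_matches = sum(a == b for a, b in zip(ss_a, ss_b))
    seq_matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 2 * ss_matches + seq_matches


def build_fragment_library(
    target_seq: str,
    target_ss: str,
    known: Dict[str, Tuple[str, str]],
    top_n: int = 3,
) -> Dict[int, List[Fragment]]:
    """For each target window, keep the `top_n` best-scoring known fragments.

    `known` maps a structure name to (sequence, secondary structure) strings
    of equal length.
    """
    if len(target_seq) != len(target_ss):
        raise ValueError("sequence and secondary structure must be the same length")

    library: Dict[int, List[Fragment]] = {}
    for pos, seq_win in windows(target_seq):
        ss_win = target_ss[pos:pos + FRAGMENT_LENGTH]
        candidates = []
        for name, (k_seq, k_ss) in known.items():
            for start, k_win in windows(k_seq):
                k_ss_win = k_ss[start:start + FRAGMENT_LENGTH]
                score = similarity(seq_win, ss_win, k_win, k_ss_win)
                candidates.append(Fragment(name, start, k_win, k_ss_win, score))
        # Highest score first; ties keep their original order (stable sort).
        candidates.sort(key=lambda frag: frag.score, reverse=True)
        library[pos] = candidates[:top_n]
    return library
```

Calling `build_fragment_library` with a target sequence, its predicted secondary structure, and a dictionary of known structures returns, for each window position in the target, a short ranked list of candidate fragments. A real method would also carry each fragment's backbone geometry so that the fragment can later be placed into the chain; that is omitted here.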
The starting guess for the unknown structure is a completely extended chain (equivalent to the blank paper), but randomly selected suitable fragments repeatedly replace sections of the extended chain. After each fragment placement (“move”), the chain is checked for collisions and other bad and good features, and the move is rejected or accepted. After a large number (thousands) of fragment placements, a folded chain has been created (the equivalent of a single face). In contrast to the limited number of faces an artist could produce, however, tens of thousands of candidate structures are produced.

The candidate structures are clustered according to their structural similarity to each other, and the centers of the few largest clusters are selected as the best candidate structures. Final adjustments to the candidates are made to make the models more physically realistic. The method’s increasing power lies in the improving selection of the contents of the fragment library and in the improving rules for accepting or rejecting a fragment placement. (For further detail and other methods, see reference 7.)

In CASP5, the method just described was used effectively not only for new folds but also for loop regions in unknowns where a structure for a related sequence was available. The loops were modeled by the new fold method, but otherwise the prediction was closely guided by the template (“comparative modeling”).

Why use a template?
