In 1996, in CASP2, we presented a semimanual approach to protein structure prediction aimed at recognizing probable distant homology, where it existed, between a given target protein and a protein of known structure (Murzin and Bateman, Proteins 1997; Suppl 1:105-112). Central to our method was knowledge of all known structural and probable evolutionary relationships among proteins of known structure classified in the SCOP database (Murzin et al., J Mol Biol 1995;247:536-540). It was demonstrated that a knowledge-based approach could compete successfully with the best computational methods of the time in correctly recognizing the target protein fold. Four years later, in CASP4, we applied essentially the same knowledge-based approach to distant homology recognition, concentrating our effort on improving the completeness and alignment accuracy of our models. The manifold increase in available sequence and structure data was to our advantage, as were the experience and expertise gained through classifying these data. In particular, we were able to model most of our predictions from several distantly related structures rather than from a single parent structure, and we could use more superfamily-characteristic features to refine our alignments. Our predictions for each of the attempted distant homology recognition targets ranked among the top few predictions for that target, with the predictions for the hypothetical protein HI0065 (T0104) and the C-terminal domain of the ABC transporter MalK (T0121C) being particularly successful. We also attempted to predict the folds of some of the targets tentatively assigned to new superfamilies.
The average quality of our fold predictions was far lower than that of our distant homology recognition models, but for two targets, chorismate lyase (T0086) and Appr>p cyclic phosphodiesterase (T0094), our predictions achieved the top ranking.