Abstract

Background: Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume the statistical independence of each position in the sequence, but the energetics of protein folding is better captured in a scoring function that is based on pairwise interactions, like a force field.

Results: I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariant, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are present, where the covariances disfavor non-native structures.

Conclusion: Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. An updated I-sites local structure motif library that provides sequence covariance information for all types of local structure in globular proteins and a web server for local structure prediction are available at .
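The contrast between a profile model and a pairwise covariant model can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nine-class grouping in `CLASSES`, the toy motif instances, the uniform background, the pseudocount scheme, and all function names are assumptions made only for illustration. The profile score sums independent per-position log-odds, while the covariance term sums, over position pairs, the log ratio of the observed pair frequency to the product of the single-position frequencies.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical 9-class reduction of the 20 amino acids.
# The source states that nine classes are used but this grouping is an assumption.
CLASSES = {
    "h": "ILVM", "a": "FWY", "+": "KR", "-": "DE", "p": "STNQ",
    "s": "A", "G": "G", "P": "P", "c": "CH",
}
REDUCE = {aa: label for label, members in CLASSES.items() for aa in members}
ALPHABET = sorted(CLASSES)


def reduce_seq(seq):
    """Map a sequence of one-letter amino acid codes to reduced class labels."""
    return [REDUCE[aa] for aa in seq]


def fit(instances, pseudo=1.0):
    """Estimate per-position and pairwise class frequencies from aligned instances."""
    seqs = [reduce_seq(s) for s in instances]
    n, k, length = len(seqs), len(ALPHABET), len(seqs[0])
    single = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        single.append({a: (counts[a] + pseudo) / (n + pseudo * k) for a in ALPHABET})
    pair = {}
    for i, j in combinations(range(length), 2):
        counts = Counter((s[i], s[j]) for s in seqs)
        # pseudo / k per cell keeps the pair marginals equal to the single frequencies
        pair[i, j] = {(a, b): (counts[a, b] + pseudo / k) / (n + pseudo * k)
                      for a in ALPHABET for b in ALPHABET}
    return single, pair


def profile_score(seq, single, background):
    """Sum of independent per-position log-odds (the profile assumption)."""
    return sum(math.log(single[i][a] / background[a])
               for i, a in enumerate(reduce_seq(seq)))


def covariance_score(seq, single, pair):
    """Sum over position pairs of log(P_ij(a, b) / (P_i(a) * P_j(b)))."""
    r = reduce_seq(seq)
    return sum(math.log(p[r[i], r[j]] / (single[i][r[i]] * single[j][r[j]]))
               for (i, j), p in pair.items())


# Toy aligned instances (invented): opposite charges at positions 0 and 3.
INSTANCES = ["EAAK", "KAAE", "EALK", "KALE", "EAAK", "KAAE"]
single, pair = fit(INSTANCES)
background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}  # uniform, an assumption

for query in ("EAAK", "EAAE"):
    print(query,
          round(profile_score(query, single, background), 3),
          round(covariance_score(query, single, pair), 3))
```

In this toy data both charge classes are equally common at positions 0 and 3, so the profile alone cannot distinguish `EAAK` from `EAAE`; only the pairwise term penalizes the like-charged pair, which is the kind of signal a covariant model adds. With so few instances the pseudocounts dominate the estimates, which is one reason a reduced alphabet helps keep pairwise counts statistically usable.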

Highlights

  • Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins

  • The short amino acid sequence patterns associated with I-sites motifs have been found to correlate with common local structures in proteins and match short local structures, such as helix caps and beta turns, with sequence probability distributions, or profiles

  • Significant improvements in prediction over the previous I-sites method were obtained by using a better model for measuring confidence and by retraining the I-sites profiles on a larger and newer dataset (ISL5.1)



Introduction

Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. One of the key approaches to this problem has been to describe fragments that represent specific local structural elements in libraries [1,2]. I-sites motifs represent small, independently folding substructures in proteins and may play a role in initiating the folding process [2]. The short amino acid sequence patterns associated with I-sites motifs have been found to correlate with common local structures in proteins and match short local structures, such as helix caps and beta turns, with sequence probability distributions, or profiles. I-sites motifs were found by clustering short sequence patterns from proteins of known structure after
