Abstract

Background

Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function and structure: they are usually the locus of protein-protein interactions, host the active site of enzymes, or, in the case of protein domains, can also be the locus of the domain-domain interactions that lead to the structure of the whole protein.

Results

We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely, we examined how cleft quality, measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on the cleft's rank amongst the set of protein clefts (when ordered according to size) and on the number of aligned residues.

Conclusion

We have examined cleft quality in homology models at a range of sequence identity (seq.id.) levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.

Highlights

  • Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure

  • Our results provide a detailed view of how quality is affected by distinct parameters and may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models

  • Although we provide data for the entire seq.id. range, we focused on the behaviour of comparative models in the medium (30% – 60%) and low (< 30%) ranges for the following reasons: (i) the quality of homology models above 60% seq.id. is usually high [1,23]; (ii) biochemical function above 60% seq.id. is usually conserved [43,44,45]; (iii) target selection protocols in structural genomics projects usually rely on a 30% seq.id. threshold to obtain a maximal coverage [6,46]; and (iv) comparative modelling is possible below 30% seq.id. because the protein structure is preserved below this threshold [43,47,48]
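
The seq.id. ranges named above (low below 30%, medium from 30% to 60%, high above 60%) can be illustrated with a minimal sketch that groups models by range and summarises a quality score within each group. This is not the authors' pipeline: the function names, the `(seq_id, quality)` input format, the single generic quality score, and the treatment of the exact 30% and 60% boundaries are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def seq_id_range(seq_id):
    """Map a target-template sequence identity (in percent) to a range label.

    Assumption: exactly 30% and exactly 60% are both counted as "medium".
    """
    if not 0 <= seq_id <= 100:
        raise ValueError("sequence identity must be a percentage in [0, 100]")
    if seq_id < 30:
        return "low"
    if seq_id <= 60:
        return "medium"
    return "high"


def summarise_by_range(models):
    """Summarise a quality score per sequence identity range.

    `models` is an iterable of (seq_id, quality) pairs, where `quality`
    is a hypothetical numeric score for the modelled cleft.
    """
    grouped = defaultdict(list)
    for seq_id, quality in models:
        grouped[seq_id_range(seq_id)].append(quality)
    return {
        label: {
            "n": len(scores),
            "mean": mean(scores),
            "min": min(scores),
            "max": max(scores),
        }
        for label, scores in grouped.items()
    }
```

Reporting the minimum and maximum alongside the mean reflects the point made in the abstract that quality varies widely within a single bin, so a mean alone would hide the good models found at low seq.id.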


Summary

Introduction

Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Structural genomics projects have addressed this challenge and have led to the design and development of high-throughput production pipelines for structure determination [2,10,11,12,13,14,15]. This considerable research effort is starting to yield results, and recent reports show a clear increase in the number of known structures, and of structures showing new folds, solved in structural genomics projects [16,17,18,19,20]. Drug design (probably the most demanding application of homology models) requires high-quality models, which are usually obtained for sequence identity (seq.id.) levels above 70% between the target and template [23,24]. A series of independent studies [24,26,27,28,29], as well as the results of CASP experiments [30,31,32,33,34,35,36,37,38,39,40,41], give the user of comparative modelling a good idea of the model's overall performance, and of how the latter can be estimated from the seq.id. between the target and template sequences.

