Abstract
Homology modeling predicts the 3D structure of a query protein based on the sequence alignment with one or more template proteins of known structure. Its great importance for biological research is owed to its speed, simplicity, reliability and wide applicability, covering more than half of the residues in protein sequence space. Although multiple templates have been shown to generally increase model quality over single templates, the information from multiple templates has so far been combined using empirically motivated, heuristic approaches.We present here a rigorous statistical framework for multi-template homology modeling. First, we find that the query proteins’ atomic distance restraints can be accurately described by two-component Gaussian mixtures. This insight allowed us to apply the standard laws of probability theory to combine restraints from multiple templates. Second, we derive theoretically optimal weights to correct for the redundancy among related templates. Third, a heuristic template selection strategy is proposed.We improve the average GDT-ha model quality score by 11% over single template modeling and by 6.5% over a conventional multi-template approach on a set of 1000 query proteins. Robustness with respect to wrong constraints is likewise improved. We have integrated our multi-template modeling approach with the popular MODELLER homology modeling software in our free HHpred server http://toolkit.tuebingen.mpg.de/hhpred and also offer open source software for running MODELLER with the new restraints at https://bitbucket.org/soedinglab/hh-suite.
Highlights
By far the most widely used computational approach for protein structure prediction relies on detecting a homologous relationship with a protein of known structure and using this protein as a template to model the PLOS Computational Biology | DOI:10.1371/journal.pcbi
In this study we extend the probabilistic formulation of homology modelling to the consistent treatment of multiple templates
Homology modeling consists of four steps: (1) Finding homologous template proteins of known structure, (2) Selecting the best template or set of templates, (3) Optimizing the multiple sequence alignment (MSA) between query and template protein sequences, and (4) Building the homology model for the query sequence that resembles as closely as possible the structures of the templates, accommodating for deletions and insertions of query residues with respect to the template structures
Summary
Homology modeling is by far the most widely used computational approach to predict the 3D structures of proteins, and almost all protein structure prediction servers rely on homology modeling, as seen in the community-wide blind benchmark “Critical Assessment of Techniques for Protein Structure Prediction” (CASP) [1,2,3].Homology modeling consists of four steps: (1) Finding homologous template proteins of known structure, (2) Selecting the best template or set of templates, (3) Optimizing the multiple sequence alignment (MSA) between query and template protein sequences, and (4) Building the homology model for the query sequence that resembles as closely as possible the structures of the templates, accommodating for deletions and insertions of query residues with respect to the template structures.During the last 15 years, much progress has been made regarding the sequence-based steps 1 to 3. Improvements to the last step have been marginal This is illustrated by the fact that, a number of tools for protein homology modeling exist, to our knowledge all are older than 12 years (see [7, 8] for reviews). NEST [11] implements an artificial evolution algorithm where changes from the template structure such as substitutions, insertions and deletions are made one at a time, and each mutation is followed by an energy minimization. This process is repeated until the whole query is modeled
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.