Abstract
BackgroundSystematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need.ResultsIn this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes). While in Ab- dataset (antigen-antibody complexes are excluded), there are significant differences in two features between hot pots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes.ConclusionExperimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
Highlights
Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions
Alanine scanning mutagenesis is the most widely used technique for identifying hot spot residues. When these hot spot residues have been mutated to alanine, they would lead to a striking loss in binding free energy [14]
Our results show that lower relative ASA (relASA) and larger relative change in accessible surface area (ASA) are critical for hot spots distinguishing from non-hot spots
Summary
Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. To fully understand or control biological processes, we need to probe the principles of protein-protein interactions. Alanine scanning mutagenesis is the most widely used technique for identifying hot spot residues. When these hot spot residues have been mutated to alanine, they would lead to a striking loss in binding free energy [14]. Many studies have demonstrated that most interface residues could be mutated without changing the affinity of proteins complexes [13,18]. Systematic analyses have shown that hot spot residues are abundant in Tyr, Trp, and Arg [14,19]. They form a complicated network of interactions and contribute to the stability of the complex; the contributions of independent clusters are additive [21]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.