Prediction of protein-protein interaction sites between Russell's viper PLA2 and the γ-phospholipase inhibitor PIP from the amino acid frequency distribution of a bio-panned peptide set.
We screened a random peptide phage display library using Russell's viper venom phospholipase A2 (RV-PLA2) as bait. Sequence information from the resulting set of bio-panned heptapeptides was analyzed and mined to determine likely sites of interaction between the two subunits of RV-PLA2 homodimers and between RV-PLA2 and the γPLA2 inhibitor PIP from Malayopython reticulatus. This was accomplished in part by aligning the affinity-selected peptides with the sequences of RV-PLA2 and PIP. Because similarity scores calculated from sequence alignments proved inadequate for accurately determining the interaction interfaces of RV-PLA2 dimers, we explored amino acid frequency-based interaction scores (SFI/SFIN) for more accurate prediction of protein-protein interaction sites. Heptamers with elevated SFI(N) scores were compared with the interaction interfaces observed in crystal structures of RV-PLA2 homodimers and with the sites of interaction predicted by protein-protein docking between RV-PLA2 structures and a model of PIP. Segments with a high density of protein-protein contacts coincided with heptamer sequences whose SFI and/or SFIN scores were significantly above average, in both the RV-PLA2 homodimers and the RV-PLA2/γPLI heteromeric structures. Elevated SFI and SFIN scores were also associated with peptide function: the heptamers with some of the highest SFI and SFIN scores, LPGLPLS, GLPLSLQ and SLQNGLY, constitute the known PLA2 inhibitor P-PB.I (LPGLPLSLQNGLY), while KLGRVDI and WDGVYIR constitute PIP-17 (LGRVDIHVWDGVYIRGR; IC50 for hsPLA2: 5.3 μM). A graph showing that the maxima of the SFI scores align with the maxima of the average solvent accessibility per heptamer suggests that solvent accessibility is a major driver of both protein-protein interaction and phage selection.
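The window-based analysis described in this abstract can be sketched as follows. This is a minimal Python illustration, not the authors' code: it enumerates every overlapping heptamer of a sequence, averages a per-residue property such as solvent accessibility over each heptamer, and flags windows whose score lies more than k standard deviations above the mean. The abstract does not state how "significantly above average" was defined, so the mean + k * SD cut-off and the helper names (`heptamer_windows`, `window_average`, `flag_elevated`) are assumptions.

```python
from statistics import mean, pstdev


def heptamer_windows(sequence: str, width: int = 7) -> list[tuple[int, str]]:
    """Return (start index, window) for every overlapping window of `width` residues."""
    return [(i, sequence[i:i + width]) for i in range(len(sequence) - width + 1)]


def window_average(values: list[float], width: int = 7) -> list[float]:
    """Average a per-residue property (e.g. solvent accessibility) over each window."""
    return [mean(values[i:i + width]) for i in range(len(values) - width + 1)]


def flag_elevated(scores: list[float], k: float = 1.0) -> list[int]:
    """Indices of windows scoring above mean + k * SD (the cut-off is an assumption)."""
    mu, sd = mean(scores), pstdev(scores)
    return [i for i, s in enumerate(scores) if s > mu + k * sd]


# Usage: the 13-residue inhibitor P-PB.I yields seven overlapping heptamers.
for start, hept in heptamer_windows("LPGLPLSLQNGLY"):
    print(start, hept)
```

Applied to the P-PB.I sequence, the first, third and seventh windows are LPGLPLS, GLPLSLQ and SLQNGLY, the three high-scoring heptamers named above.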
Insights: We show by computational methods that, in sets of short phage-displayed peptides of the same length selected for binding to the same target protein, amino acids that contribute to binding at a particular position occur at higher frequencies than in random peptides. This position-specific selection of particular amino acids can be detected in the position-specific amino acid frequency distribution of the set of selected peptides. Therefore, when these position-specific frequencies are mapped back onto any amino acid sequence of the same length, their sum can serve as a measure of the enrichment of selected amino acids.
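The scoring idea in the Insights paragraph can be made concrete with a short sketch, under stated assumptions: a frequency is taken as the fraction of selected peptides carrying a given residue at a given position, and a window's score is the plain sum of those frequencies, as the paragraph describes. The normalised variant (SFIN) is not defined here, so it is not implemented, and the function names are hypothetical.

```python
from collections import Counter


def position_frequencies(peptides: list[str]) -> list[dict[str, float]]:
    """Fraction of the selected peptides that carry each amino acid at each position."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all selected peptides must have the same length")
    n = len(peptides)
    return [
        {aa: count / n for aa, count in Counter(p[pos] for p in peptides).items()}
        for pos in range(length)
    ]


def sfi_score(window: str, freqs: list[dict[str, float]]) -> float:
    """Sum of the position-specific frequencies of the residues in `window`.

    Residues never observed at a position contribute 0.
    """
    if len(window) != len(freqs):
        raise ValueError("window length must equal the selected-peptide length")
    return sum(freqs[pos].get(aa, 0.0) for pos, aa in enumerate(window))


# Usage with made-up tripeptides (not data from the study):
freqs = position_frequencies(["ACD", "ACE", "GCD"])
print(sfi_score("ACD", freqs))  # 2/3 + 1 + 2/3
```

With heptapeptides, `sfi_score` would be evaluated on every overlapping heptamer of RV-PLA2 or PIP, giving one score per window that can then be compared against structural contact data.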
- Research Article
2
- 10.1155/2022/5892627
- Oct 4, 2022
- Disease Markers
Prediction of protein-protein interaction (PPI) sites is one of the most perplexing problems in drug discovery and computational biology. Although significant progress has been made by combining different machine learning techniques with a variety of distinct characteristics, the problem remains unresolved. In this study, a technique for predicting PPI sites is presented that uses a random forest (RF) algorithm with the minimum redundancy maximal relevance (mRMR) approach, followed by incremental feature selection (IFS). Physicochemical properties of proteins and features of residual disorder, sequence conservation, secondary structure, and solvent accessibility are incorporated. Five 3D structural characteristics are also used to predict PPI sites. Feature analysis shows that 3D structural features such as relative solvent-accessible surface area (RASA) and surface curvature (SC) help in the prediction of PPI sites. The proposed predictor outperforms several other state-of-the-art predictors, with an average prediction accuracy of 81.44%, sensitivity of 82.17%, and specificity of 80.71%. The proposed predictor is expected to become a helpful tool for finding PPI sites, and the feature analysis presented in this study will give useful insights into protein interaction mechanisms.
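The accuracy, sensitivity and specificity quoted in this abstract (and the Matthews correlation coefficient reported by related predictors) all follow from a confusion matrix of predicted versus true interaction-site residues. A minimal sketch using the standard definitions; the function name and the counts in the usage line are illustrative, not taken from the study.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics for interaction-site (positive) vs. non-site residues."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of true interaction sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Usage with illustrative counts (not taken from either study):
print(binary_metrics(tp=80, fp=20, tn=80, fn=20))
```

Unlike accuracy, MCC stays informative when non-interface residues vastly outnumber interface residues, which is why PPI-site papers often report both.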
- Research Article
107
- 10.1371/journal.pone.0043927
- Aug 28, 2012
- PLoS ONE
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
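The mRMR-then-IFS pipeline described here can be sketched as below, assuming the features have already been discretised so that mutual information can be estimated by simple counting. `mrmr_order` uses the difference form of mRMR (relevance to the labels minus mean redundancy with the features already selected); `incremental_feature_selection` scores growing prefixes of that ranking with a caller-supplied `evaluate` function, which is where a random-forest cross-validation score would go. All names are hypothetical, and the authors' exact mRMR variant and discretisation are not stated in the abstract.

```python
from collections import Counter
from math import log
from typing import Callable, Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) in nats, estimated from paired observations of two discrete variables."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_order(features: dict[str, list], labels: list, k: int) -> list[str]:
    """Greedy mRMR (difference form): relevance to labels minus mean redundancy."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected: list[str] = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    def score(f: str) -> float:
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(
        order: list[str], evaluate: Callable[[list[str]], float]) -> tuple[list[str], float]:
    """Evaluate growing prefixes of the mRMR order and keep the best-scoring prefix."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(order) + 1):
        s = evaluate(order[:k])
        if s > best_score:
            best_k, best_score = k, s
    return order[:best_k], best_score


# Usage: "b" duplicates "a", so mRMR ranks it below the independent feature "c".
feats = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
print(mrmr_order(feats, [0, 1, 2, 3], k=3))  # ['a', 'c', 'b']
```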
- Conference Article
- 10.1109/smc53654.2022.9945307
- Oct 9, 2022
Protein-protein interaction (PPI) site prediction enables deep exploration of the mechanisms of life activity, but relying solely on experimental methods to identify PPI sites is hugely costly. Among the computational methods developed so far, those that use structural information are advantageous. The relative solvent accessibility (RSA) used as structural information has mainly been obtained by taking the absolute solvent accessibility computed by the program DSSP (Kabsch and Sander, 1983) and normalizing it by the previously determined maximum exposed area of each amino acid type. When protein structure information cannot be obtained by homologous transfer, it is difficult to obtain a suitable RSA, which limits its use. We used the latest deep learning prediction tools to mine potentially valuable information from long-range interactions within protein sequences and used it to predict protein RSA. In a deep graph convolutional neural network, we incorporate the predicted relative solvent accessibility (PRSA) into the original structural information and combine it with sequence and evolutionary information to form the graph node features. Our proposed method significantly improves AUPRC and MCC, by more than 9.5% and 21% respectively, compared with other sequence-based and structure-based methods. Furthermore, analysis of the method demonstrated that PRSA plays a crucial role in PPI site prediction.
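The RSA normalisation described above divides a residue's absolute accessible surface area by the maximum for its residue type. A minimal sketch; the maximum-ASA values below are, to our understanding, the theoretical values of Tien et al. (2013) for four residue types only, other scales exist, and the table and function name are illustrative rather than part of the cited method.

```python
# Maximum accessible surface areas in Å^2 for a few residue types. These are, to our
# understanding, the theoretical values of Tien et al. (2013); other scales exist, so
# replace this partial table with whichever one your pipeline uses.
MAX_ASA = {"A": 129.0, "G": 104.0, "L": 201.0, "W": 285.0}


def relative_solvent_accessibility(residue: str, asa: float,
                                   max_asa: dict[str, float] = MAX_ASA) -> float:
    """RSA = absolute ASA (e.g. from DSSP) divided by the residue type's maximum ASA."""
    return asa / max_asa[residue]


# Usage: a glycine with 52 Å^2 of accessible surface is half exposed.
print(relative_solvent_accessibility("G", 52.0))  # 0.5
```

Because the denominator depends only on residue type, the same function applies unchanged to predicted ASA values when no structure is available.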
- Research Article
19
- 10.1007/s12038-015-9564-y
- Sep 28, 2015
- Journal of Biosciences
Protein-protein interaction (PPI) site prediction helps to identify the interface residues that participate in interaction processes. A fuzzy support vector machine (F-SVM) is proposed as an effective method for this problem, and we show that the performance of the classical SVM can be enhanced with an interaction-affinity-based fuzzy membership function. We evaluated the performance of both SVM and F-SVM on the PPI databases of Homo sapiens and E. coli and estimated the statistical significance of the developed method relative to the classical SVM and to other fuzzy-membership-based SVM methods available in the literature. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in 10-fold cross-validation were 79.94% and 80.48% for Homo sapiens and E. coli, respectively. On the independent test datasets, the AUC scores were 76.59% and 80.17%, respectively, for the two organisms. In almost all cases, the developed F-SVM method improves on the performance of the corresponding classical SVM and of the other classifiers available in the literature.
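A fuzzy SVM weights each training sample's slack penalty by a membership value s_i in (0, 1], so that less trustworthy samples pull less on the decision boundary. The sketch below is a linear F-SVM trained by stochastic subgradient descent on the membership-weighted hinge loss. The interaction-affinity membership used in this study is not specified in the abstract, so a simple distance-to-class-centre membership stands in for it; that function, the learning-rate and epoch settings, and all names are assumptions for illustration.

```python
import random
from math import dist


def center_distance_membership(X, y, delta=1e-6):
    """Hypothetical membership: samples far from their class mean get smaller weights."""
    centers, radii = {}, {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centers[label] = tuple(sum(col) / len(members) for col in zip(*members))
        radii[label] = max(dist(x, centers[label]) for x in members)
    return [1.0 - dist(x, centers[t]) / (radii[t] + delta) for x, t in zip(X, y)]


def train_fuzzy_linear_svm(X, y, s, C=1.0, epochs=200, lr=0.01, seed=0):
    """SGD on 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w.x_i + b)), y_i in {-1, +1}."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Regulariser share for one sample: gradient of 0.5*||w||^2 / n.
            w = [wj - lr * wj / n for wj in w]
            if margin < 1:  # hinge active: step scaled by the sample's membership s_i
                w = [wj + lr * C * s[i] * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * C * s[i] * y[i]
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Usage on a toy, linearly separable set (not PPI data):
X = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
s = center_distance_membership(X, y)
w, b = train_fuzzy_linear_svm(X, y, s)
print([predict(w, b, x) for x in X])
```

Setting every s_i to 1 recovers an ordinary soft-margin linear SVM, which makes the effect of the membership function easy to compare.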
- Research Article
61
- 10.3390/ijms21072274
- Mar 25, 2020
- International Journal of Molecular Sciences
The study of protein-protein interactions is of great biological significance, and the prediction of protein-protein interaction sites can promote understanding of cellular biological activity and aid drug development. However, the distribution of interaction and non-interaction sites is commonly uneven because only a small number of protein interactions have been confirmed by experimental techniques, which greatly affects the predictive capability of computational methods. In this work, two strategies for processing imbalanced data, based on the XGBoost algorithm, were proposed to rebalance the original dataset by exploiting the inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. A feature extraction method based on the evolutionary conservation of proteins was applied to represent the protein interaction sites, and the influence on prediction performance of regions where positive and negative samples overlap was taken into account. Our method showed good prediction performance, such as a prediction accuracy of 0.807 and an MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.
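The abstract does not describe its two XGBoost-based rebalancing strategies in enough detail to reproduce them, so the sketch below shows only the simplest generic alternative, random undersampling of the majority class, as a point of comparison. The function name and seed handling are hypothetical.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples of the larger class(es) until every class is as
    frequent as the rarest one. A generic baseline, not the study's strategies."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(group) for group in by_class.values())
    balanced = [(sample, label)
                for label, group in by_class.items()
                for sample in rng.sample(group, n_min)]
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [t for _, t in balanced]


# Usage: 3 positives and 9 negatives become 3 of each.
xs, ys = random_undersample(list(range(12)), [1] * 3 + [0] * 9)
print(sorted(zip(xs, ys)))
```

With the dataset described above, if the 2297 interface residues are counted among the 10,455 surface residues, the negative class has 8158 members and undersampling would keep 2297 of each class; discarding that many negatives is the usual cost of this approach.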
- Research Article
116
- 10.1016/j.neucom.2016.02.022
- Feb 22, 2016
- Neurocomputing
Protein–protein interaction sites prediction by ensembling SVM and sample-weighted random forests
- Dissertation
- 10.31274/etd-180810-1200
- Oct 31, 2012
Protein-protein interactions play a central role in the formation of protein complexes and the biological pathways that orchestrate virtually all cellular processes. Reliable identification of the specific amino acid residues that form the interface of a protein with one or more other proteins is critical to understanding the structural and physico-chemical basis of protein interactions and their role in key cellular processes, predicting protein complexes, validating protein interactions predicted by high throughput methods, and identifying and prioritizing drug targets in computational drug design. Because of the difficulty and the high cost of experimental characterization of interface residues, there is an urgent need for computational methods for reliably predicting protein-protein interface residues from the sequence, and when available, the structure of a query protein, and when known, its putative interacting partner. Against this background, this thesis develops improved methods for predicting protein-protein interface residues and protein-protein interfaces from the three-dimensional structure of an unbound query protein without considering information of its binding protein partner. Towards this end, we develop (i) ProtInDb (http://protindb.cs.iastate.edu), a database of protein-protein interface residues to facilitate (a) the generation of datasets of protein-protein interface residues that can be used to perform analysis of interaction sites and to train and evaluate predictors of interface residues, and (b) the visualization of interaction sites between proteins in both the amino acid sequences and the 3D protein structures, among other applications; (ii) PoInterS (http://pointers.cs.iastate.edu/), a method for predicting protein-protein interaction sites formed by spatially contiguous clusters of interface residues based on the predictions generated by a protein interface residue predictor.
PoInterS divides a protein surface into a series of patches composed of several surface residues, and uses the outputs of the interface residue predictors to rank and select a small set of patches that are the most likely to constitute the interaction sites; and (iii) PrISE (http://prise.cs.iastate.edu/), a method for predicting protein-protein interface residues based on the similarity of the structural element formed by the query residue and its neighboring residues and the structural elements extracted from the interface and non-interface regions of proteins that are members of experimentally determined protein complexes. A structural element captures the atomic composition and solvent accessibility of a central residue and its closest neighbors in the protein structure. PrISE decomposes a query protein into a set of structural elements and searches for similar elements in a large set of proteins that belong to one or more experimentally determined complexes. The structural elements that are most similar to each structural element extracted from the query protein are then used to infer whether its central residue is or is not an interface residue. The results of our experiments using a variety of benchmark datasets show that PoInterS and PrISE generally outperform the state-of-the-art structure-based methods for predicting interaction patches and interface residues, respectively.
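The patch-ranking step of PoInterS is described only at a high level here. The sketch below illustrates the general idea under assumptions of our own: every surface residue seeds a patch containing all surface residues within a fixed radius, and patches are ranked by the mean per-residue interface score produced by some residue-level predictor. The radius, the use of one coordinate per residue, and the function name are hypothetical, and PoInterS itself may define or filter patches differently.

```python
from math import dist


def rank_patches(coords, scores, radius=10.0, top=3):
    """Rank surface patches by the mean predicted interface score of their residues.

    coords: residue id -> representative (x, y, z) coordinate (e.g. C-alpha), in Å
    scores: residue id -> per-residue interface score from a residue-level predictor
    Each patch is centred on one surface residue and contains every surface residue
    within `radius` of it (a hypothetical patch definition).
    """
    patches = []
    for centre, xyz in coords.items():
        members = sorted(r for r, other in coords.items() if dist(xyz, other) <= radius)
        mean_score = sum(scores[r] for r in members) / len(members)
        patches.append((mean_score, centre, members))
    patches.sort(key=lambda p: (-p[0], p[1]))  # highest mean first, ties by centre id
    return patches[:top]


# Usage on four toy residues forming two well-separated pairs:
coords = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (20, 0, 0), "D": (21, 0, 0)}
scores = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2}
print(rank_patches(coords, scores, radius=5.0, top=2))
```

Neighbouring centres produce heavily overlapping patches, so a fuller version would likely merge or suppress overlaps before reporting a small set of candidate sites.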
- Research Article
- 10.1021/acsomega.5c06314
- Nov 8, 2025
- ACS Omega
Identification of protein–protein interaction (PPI) sites is crucial for understanding molecular recognition. Experimental identification of PPI is expensive, time-consuming, and laborious. A large number of computational methods have addressed this problem. However, no computations have specifically addressed PPI site prediction for the frequently mutating influenza A virus (IAV) genome that invades human hosts. For the first time, we report the prediction of PPI sites on the IAV genome (protein sequences). The method was benchmarked across various machine-learning models, optimizing class imbalance and unlabeled data types. The best-performing models were (i) the gradient boosting model, augmented with minority class oversampling and positive unlabeled (PU) learning, and (ii) the protein-specific bidirectional encoder representations from transformers (Prot-BERT) combined with an artificial neural network (ANN) (termed the Prot-BERT-ANN model), adjusted with class weight correction and threshold tuning. The models were trained on two types of interaction site data sets: one obtained from diverse protein families (Train-1; 17995 amino acid sites) with known interaction sites from protein structures, and the other from the IAV consensus protein sequences (3322 amino acid sites) with experimentally annotated PPI sites on the conserved regions of the proteins (Train-2). External validation was performed on two test data sets: (i) six IAV proteins, M1, NS1, NEP, NP, PB1, and PB2, reported to interact with host factors, with experimentally annotated PPI sites on the nonconserved region of the proteins (Test-1), and (ii) the SARS-CoV-2 spike protein sequence (195 amino acid sites) (Test-2). Blind prediction was performed on three IAV protein sequences (NA, HA, and M2) curated from the Human Viral Interaction Database (HVIDB). The prediction aimed to decipher the effect of amino acid substitutions on the protein–protein interaction sites of the viral genome.
The gradient boosting method with oversampling and PU learning, trained on the Train-2 data set, consistently performed better on both external validation data sets. The recall values obtained from the predictions on the Test-1 data set were compared with the published results of D-SCRIPT (a neural language-based model). The gradient boosting model showed a higher average recall (0.53 ± 0.04) for the six IAV proteins than D-SCRIPT (0.18 ± 0.19). The gradient boosting prediction for the experimentally reported PPI sites on the SARS-CoV-2 spike protein (Test-2 data set) was 55% accurate, despite Test-2 being independent of Train-2. The results indicated the generalizability and interpretability of the gradient boosting model for IAV PPI site predictions. The effects of amino acid substitutions on PPI sites were demonstrated on five Matrix 1 (M1) protein sequences. This approach could be used to identify the PPI sites on newly emerging viral strains (e.g., influenza virus, SARS-CoV-2, etc.) with potential applications for drug design, improvement of drug binding, or drug repurposing, subject to further validation.
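Class-weight correction and threshold tuning, used here for the Prot-BERT-ANN model, are generic techniques for imbalanced site labels. The sketch below shows one common form of each, not the authors' settings: inverse-frequency ("balanced") class weights, the same n_samples / (n_classes * class_count) heuristic that scikit-learn uses for `class_weight="balanced"`, and choosing the decision threshold that maximises F1 on held-out scores. Function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count), so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def tune_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Choose the score cut-off (among observed scores) that maximises F1 on validation data."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Usage with toy validation scores (not from the study):
print(balanced_class_weights([0, 0, 0, 1]))
print(tune_threshold([0.1, 0.2, 0.35, 0.4, 0.8], [0, 0, 1, 1, 1]))
```

The threshold should be tuned on a validation split, not on the external test sets, so that reported recall remains an honest estimate.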
- Research Article
7
- 10.1055/s-0043-1769625
- Apr 1, 2023
- TH Open
Envenomings by Russell's viper (Daboia russelii), a species of high medical importance in India and other Asian countries, commonly result in hemorrhage, coagulopathies, necrosis, and acute kidney injury. Although bleeding complications are frequently reported following viper envenomings, thrombotic events occur rarely (reported only in coronary and carotid arteries) and have serious consequences. For the first time, we report three serious cases of peripheral arterial thrombosis following Russell's viper bites, together with insights into their diagnosis, clinical management, and mechanisms. These patients developed occlusive thrombi in their peripheral arteries, with associated symptoms, despite antivenom treatment. In addition to clinical features, computed tomography angiography was used to diagnose arterial thrombosis and ascertain its precise locations. The patients were treated by thrombectomy, or by amputation in one case that presented with gangrenous digits. Mechanistic investigations revealed the procoagulant actions of Russell's viper venom in standard clotting tests as well as in rotational thromboelastometry analysis. Notably, Russell's viper venom inhibited agonist-induced platelet activation. The procoagulant effects of Russell's viper venom were inhibited by the matrix metalloprotease inhibitor marimastat, whereas the phospholipase A2 inhibitor varespladib did not show any inhibitory effects. Russell's viper venom induced pulmonary thrombosis when injected intravenously in mice, and thrombi in the microvasculature and affected skeletal muscle when administered locally. These data emphasize the significance of peripheral arterial thrombosis in snakebite victims and provide awareness, mechanisms, and robust strategies for clinicians to tackle this issue in patients.
- Research Article
105
- 10.1016/j.neucom.2019.05.013
- May 10, 2019
- Neurocomputing
Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network
- Research Article
173
- 10.1093/protein/gzh020
- Jan 20, 2004
- Protein Engineering Design and Selection
The identification of protein-protein interaction sites is essential for mutant design and for the prediction of protein-protein networks. The interaction sites of residue units were predicted using support vector machines (SVM) and the profiles of sequentially/spatially neighboring residues, plus additional information. When only sequence information was used, prediction performance was highest using as feature vectors the sequentially neighboring profiles and the predicted interaction site ratios, which were calculated by SVM regression using amino acid compositions. When structural information was also used, prediction performance was highest using as feature vectors the spatially neighboring residue profiles, accessible surface areas, and the with/without protein interaction site ratios predicted by SVM regression and amino acid compositions. In the latter case, the precision at recall = 50% was 54-56% for a homo-hetero mixed test set and >20% higher than for random prediction. Approximately 30% of the residues wrongly predicted as interaction sites were the closest sequential or spatial neighbors of interaction-site residues. The predicted residues covered 86-87% of the actual interfaces (96-97% of interfaces with over 20 residues). This prediction performance appeared to be slightly higher than that of a previously reported study. Comparing the prediction accuracy for each molecule, it seems easier to predict interaction sites for stable complexes.
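The "profiles of sequentially neighboring residues" used as SVM input can be built by concatenating per-residue vectors (for example, rows of a sequence profile) over a sliding window centred on the target residue. A minimal sketch, assuming a window of ±3 residues and zero-padding at the chain termini; neither choice is stated in the abstract, and the function name is hypothetical.

```python
def window_features(profile: list[list[float]], half_window: int = 3) -> list[list[float]]:
    """Concatenate the per-residue vectors at positions i-half_window..i+half_window.

    profile: one feature vector per residue (e.g. a 20-column sequence profile row).
    Positions outside the chain are filled with zeros.
    """
    dim = len(profile[0])
    pad = [0.0] * dim
    feats = []
    for i in range(len(profile)):
        vec: list[float] = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(profile[j] if 0 <= j < len(profile) else pad)
        feats.append(vec)
    return feats


# Usage: a one-column toy profile with a window of ±1.
print(window_features([[1.0], [2.0], [3.0]], half_window=1))
```

With a 20-column profile and ±3 residues, each residue is described by a 140-dimensional vector; spatial neighbours could be handled the same way by replacing the sequence window with the k residues closest in space.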
- Research Article
405
- 10.1002/prot.21248
- Dec 6, 2006
- Proteins: Structure, Function, and Bioinformatics
The recognition of protein interaction sites is an important intermediate step toward identification of functionally relevant residues and understanding protein function, facilitating experimental efforts in that regard. Toward that goal, the authors propose a novel representation for the recognition of protein-protein interaction sites that integrates enhanced relative solvent accessibility (RSA) predictions with high resolution structural data. An observation that RSA predictions are biased toward the level of surface exposure consistent with protein complexes led the authors to investigate the difference between the predicted and actual (i.e., observed in an unbound structure) RSA of an amino acid residue as a fingerprint of interaction sites. The authors demonstrate that RSA prediction-based fingerprints of protein interactions significantly improve the discrimination between interacting and noninteracting sites, compared with evolutionary conservation, physicochemical characteristics, structure-derived and other features considered before. On the basis of these observations, the authors developed a new method for the prediction of protein-protein interaction sites, using machine learning approaches to combine the most informative features into the final predictor. For training and validation, the authors used several large sets of protein complexes and derived from them nonredundant representative chains, with interaction sites mapped from multiple complexes. Alternative machine learning techniques are used, including Support Vector Machines and Neural Networks, so as to evaluate the relative effects of the choice of a representation and a specific learning algorithm. The effects of induced fit and uncertainty of the negative (noninteracting) class assignment are also evaluated. Several representative methods from the literature are reimplemented to enable direct comparison of the results. 
Using rigorous validation protocols, the authors estimated that the new method yields the overall classification accuracy of about 74% and Matthews correlation coefficients of 0.42, as opposed to up to 70% classification accuracy and up to 0.3 Matthews correlation coefficient for methods that do not utilize RSA prediction-based fingerprints. The new method is available at http://sppider.cchmc.org.
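The RSA-based fingerprint described here is the per-residue difference between RSA predicted from sequence and RSA observed in the unbound structure. A minimal sketch, assuming both profiles are already on the same 0 to 1 scale and aligned residue by residue; how the new method combines this difference with other features is not reproduced, and the function name is hypothetical.

```python
def rsa_fingerprint(predicted_rsa: list[float], observed_rsa: list[float]) -> list[float]:
    """Per-residue difference: RSA predicted from sequence minus RSA observed in the unbound structure."""
    if len(predicted_rsa) != len(observed_rsa):
        raise ValueError("both RSA profiles must cover the same residues")
    return [p - o for p, o in zip(predicted_rsa, observed_rsa)]


# Usage: the difference is one feature per residue, to be combined with others
# (conservation, physicochemical properties, ...) before classification.
print(rsa_fingerprint([0.5, 0.25], [0.75, 0.25]))  # [-0.25, 0.0]
```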
- Book Chapter
1
- 10.1007/978-3-642-45062-4_94
- Jan 1, 2013
Predicting residues that participate in protein–protein interactions (PPI) helps to identify the amino acids located at the interface. In this work, experimentally verified 3-D structures of protein complexes are used to build the training model and subsequently to predict protein interactions from sequence information. Fuzzy SVM (F-SVM), which is built on top of the classical SVM, is an effective method for this problem, and we demonstrate that the performance of the SVM can be further improved with a custom-designed fuzzy membership function. We evaluate the performance of both SVM and F-SVM on the PPI database of the Homo sapiens organism and evaluate the statistical significance of F-SVM over the classical SVM. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used. The F-SVM-based residue prediction method exploits the membership function for each pair of sequence fragments, and in all cases F-SVM improves on the performance of the corresponding SVM classifiers. The F-SVM performance on the test samples, measured by the area under the ROC curve (AUC), is 80.16%, around 1.55% higher than that of the classical SVM classifier.
Keywords: Protein-protein interaction; Support vector machine; Fuzzy SVM
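The "local composition of amino acids" feature can be illustrated as the fraction of each of the 20 standard residues in a window around the target residue. A minimal sketch; the window half-width of 4, the clipping at the termini and the function name are assumptions, and the physico-chemical descriptors the chapter also uses are omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def local_composition(sequence: str, centre: int, half_window: int = 4) -> list[float]:
    """Fraction of each standard amino acid in the window around `centre`.

    The window is clipped at the sequence termini; non-standard letters are not
    counted, so the fractions then sum to less than 1.
    """
    lo = max(0, centre - half_window)
    hi = min(len(sequence), centre + half_window + 1)
    window = sequence[lo:hi]
    return [window.count(aa) / len(window) for aa in AMINO_ACIDS]


# Usage: the residue at index 1 of "AAGG" with a ±1 window sees "AAG".
comp = local_composition("AAGG", 1, half_window=1)
print(dict(zip(AMINO_ACIDS, comp))["A"])
```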
- Research Article
36
- 10.1155/2015/978193
- Jan 1, 2015
- Biochemistry Research International
Proteins function through interactions with other proteins and biomolecules, and these interactions occur at the so-called interface residues of the protein sequences. Identifying interface residues helps us better understand the biological mechanism of protein interaction. Information about interface residues also contributes to the understanding of metabolic and signal transduction networks and indicates directions for drug design. In recent years, researchers have focused on developing new computational methods for predicting protein interface residues. Here we used a 181-dimensional protein sequence feature vector as input to a method based on the Naive Bayes Classifier (NBC) to predict interaction sites in protein-protein complexes. The prediction of interaction sites is treated as a binary classification problem over amino acid residues, solved by applying NBC to protein sequence features. Independent test results suggested that the Naive Bayes Classifier-based method with protein sequence features as input vectors performed well.
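The abstract does not say which likelihood model its Naive Bayes Classifier uses for the sequence features. The sketch below assumes Gaussian class-conditional densities, one per feature, which is a common choice for real-valued inputs; the class name, the variance-smoothing constant and the toy data are illustrative.

```python
from math import log, pi


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature); features assumed independent."""

    def __init__(self, var_smoothing: float = 1e-9):
        self.var_smoothing = var_smoothing  # keeps variances strictly positive

    def fit(self, X: list[list[float]], y: list[int]) -> "GaussianNaiveBayes":
        self.classes = sorted(set(y))
        self.priors, self.means, self.variances = {}, {}, {}
        for c in self.classes:
            rows = [x for x, t in zip(X, y) if t == c]
            cols = list(zip(*rows))
            self.priors[c] = len(rows) / len(X)
            self.means[c] = [sum(col) / len(col) for col in cols]
            self.variances[c] = [
                sum((v - m) ** 2 for v in col) / len(col) + self.var_smoothing
                for col, m in zip(cols, self.means[c])
            ]
        return self

    def log_posterior(self, x: list[float], c: int) -> float:
        """Unnormalised log P(c | x) = log P(c) + sum_j log N(x_j; mean_cj, var_cj)."""
        lp = log(self.priors[c])
        for v, m, var in zip(x, self.means[c], self.variances[c]):
            lp += -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
        return lp

    def predict(self, x: list[float]) -> int:
        return max(self.classes, key=lambda c: self.log_posterior(x, c))


# Usage on 2-feature toy data (label 1 = interface, 0 = non-interface):
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
nb = GaussianNaiveBayes().fit(X, y)
print(nb.predict([0.5, 0.5]), nb.predict([5.5, 5.5]))
```

In the setting described above, each row of X would be one residue's 181-dimensional feature vector and y its interface or non-interface label.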
- Research Article
3
- 10.1016/0041-0101(89)90028-7
- Jan 1, 1989
- Toxicon
Acute effect of Russell's viper (Vipera russelli siamensis) venom on renal hemodynamics and autoregulation of blood flow in dogs
- Research Article
- 10.1093/intbio/zyaf020
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf002
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf021
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf012
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf001
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf014
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf005
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf019
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf010
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro
- Research Article
- 10.1093/intbio/zyaf007
- Jan 8, 2025
- Integrative biology : quantitative biosciences from nano to macro