Abstract

Motivation: S-adenosyl-L-methionine (SAM) is an essential cofactor in biological systems and plays a key role in many diseases. A method for predicting SAM binding sites in a protein is needed for designing drugs against SAM-associated diseases. To the best of our knowledge, no existing method can predict the SAM binding sites in a given protein sequence.

Results: This manuscript describes SAMbinder, a method developed for predicting SAM interacting residues in a protein from its primary sequence. All models were trained, tested, and evaluated on 145 SAM binding protein chains, no two of which share more than 40% sequence similarity. First, models were developed using different machine learning techniques on a balanced data set containing 2,188 SAM interacting and an equal number of non-interacting residues. Our random forest based model developed using the binary profile feature achieved a maximum Matthews Correlation Coefficient (MCC) of 0.42 with an area under receiver operating characteristics (AUROC) of 0.79 on the validation data set. The performance of our models improved significantly, from MCC 0.42 to 0.61, when evolutionary information in the form of the position-specific scoring matrix (PSSM) profile was used as a feature. We also developed models on a realistic data set containing 2,188 SAM interacting and 40,029 non-interacting residues and obtained a maximum MCC of 0.61 with an AUROC of 0.89. To evaluate the performance of our models, we used internal as well as external cross-validation techniques.

Availability and Implementation: https://webs.iiitd.edu.in/raghava/sambinder/.
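The binary profile feature mentioned above is commonly built by sliding a fixed-length window along the sequence and one-hot encoding every residue in it. The following is only a minimal sketch of that general idea, not the SAMbinder implementation: the function name `binary_profile`, the window length of 17, and the use of a padding symbol `X` as a 21st letter are all assumptions made for illustration.

```python
# Illustrative sketch only; window size and padding scheme are assumptions,
# not details taken from the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
PAD = "X"                              # hypothetical dummy residue for the termini
ALPHABET = AMINO_ACIDS + PAD           # 21 symbols per window position


def binary_profile(sequence, window=17):
    """Return one one-hot feature vector per residue of `sequence`.

    Each vector has window * 21 entries and describes the residue at the
    centre of the window together with its sequence neighbours.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    # Pad both ends so that terminal residues also get a full-length window.
    padded = PAD * half + sequence.upper() + PAD * half
    features = []
    for i in range(len(sequence)):
        pattern = padded[i:i + window]
        vector = []
        for residue in pattern:
            # Non-standard letters are treated like the padding symbol.
            symbol = residue if residue in AMINO_ACIDS else PAD
            vector.extend(1 if symbol == aa else 0 for aa in ALPHABET)
        features.append(vector)
    return features


if __name__ == "__main__":
    vectors = binary_profile("MKVLAAGIT", window=5)
    print(len(vectors), "residues,", len(vectors[0]), "features each")
```

Each vector could then be paired with a label (SAM interacting or non-interacting) for the central residue and passed to a classifier such as a random forest.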

Highlights

  • Structural and functional annotation of a protein is one of the major challenges in the era of genomics

  • It has been shown that the performance of in silico methods for protein annotation depends on the quality of the protein structures used for their development (Chauhan et al, 2010; Patiyal et al, 2019)

  • The model achieved an accuracy of 70.79%, 0.42 Matthews Correlation Coefficient (MCC), and 0.78 area under receiver operating characteristics (AUROC) on the training data set and accuracy of 70.85%, 0.42 MCC, and 0.79 AUROC on the validation data set
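For readers less familiar with the reported metrics, accuracy and MCC can both be computed from the four confusion matrix counts. The sketch below uses hypothetical counts chosen only for illustration; they are not taken from the paper. It also shows why, on a balanced data set with similar sensitivity and specificity, an accuracy near 70% goes together with an MCC near 0.4, in line with the figures quoted above.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical balanced example: 100 interacting and 100 non-interacting
    # residues, with 70 of each class predicted correctly.
    tp, tn, fp, fn = 70, 70, 30, 30
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

In this symmetric, balanced case MCC equals 2 × accuracy − 1, which is why the two numbers move together; on an imbalanced data set that relationship no longer holds, and MCC is generally the more informative of the two.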

Summary

Introduction

Structural and functional annotation of a protein is one of the major challenges in the era of genomics. With the rapid advancement in sequencing technologies and concerted genome projects, there is an increasing gap between sequenced proteins and functionally annotated proteins (Casari et al, 1995; Yu et al, 2014; Agrawal et al, 2019d). Automated computational methods are required that can identify the residues playing an essential role in protein function. Generalized methods have been developed that predict the binding sites or pockets in proteins regardless of their ligand (Levitt and Banaszak, 1992; Laskowski, 1995; Hendlich et al, 1997; Dundas et al, 2006; Le Guilloux et al, 2009). Comprehensive information on the software developed for protein–small molecule interaction is reviewed by Agrawal et al (2018).
