Abstract

The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for predicting protein–ligand binding sites are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which combines the synthetic minority over-sampling technique (SMOTE) with Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. SMOTE was used to generate a new balanced dataset to improve classifier performance, and a prediction model was constructed using XGBoost. XGBoost's parallel computing and regularization techniques enabled fast, high-quality predictions and mitigated the overfitting caused by SMOTE. An evaluation on independent test sets covering 12 different types of ligand binding sites showed that SXGBsite performs comparably to existing methods on eight of the test sets while requiring less computation time. SXGBsite may serve as a complement to biological experiments.
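The abstract names SMOTE as the step that balances the training data before XGBoost is fitted. As a minimal, from-scratch sketch (not the SXGBsite authors' code), SMOTE creates each synthetic minority sample by picking a minority example, choosing one of its k nearest minority-class neighbours, and interpolating a random fraction of the way between the two. The helper names `smote` and `balance`, the defaults `k=5` and `seed=0`, and the toy two-dimensional vectors are illustrative assumptions; real inputs would be per-residue PSSM-DCT and PSA feature vectors.

```python
import math
import random

# Hypothetical helpers for illustration; not the SXGBsite implementation.


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): how far to move towards the neighbour
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Oversample the minority class (label 1) until it matches the majority (label 0)."""
    extra = smote(minority, max(0, len(majority) - len(minority)), k=k, seed=seed)
    X = majority + minority + extra
    y = [0] * len(majority) + [1] * (len(minority) + len(extra))
    return X, y


# Toy example: 3 "binding" (minority) vs 6 "non-binding" (majority) residues.
binding = [[0.9, 0.8], [1.0, 1.0], [0.8, 1.1]]
non_binding = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.0], [0.2, 0.2]]
X, y = balance(non_binding, binding, k=2)
print(len(X), y.count(1))  # → 12 6
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training set only, so that the independent test sets contain no synthetic residues. In practice, the `imbalanced-learn` package provides a `SMOTE` class with a `fit_resample` method, and the resampled data could then be passed to `xgboost.XGBClassifier`; the exact settings SXGBsite used are not given here.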

Highlights

  • Accurate prediction of protein–ligand binding sites is important for understanding protein function and drug design [1,2,3,4]

  • The results of TargetS, SVMPred, NsitePred, and EC-random under-sampling (EC-RUS) are reported at the threshold that maximizes the Matthews correlation coefficient (MCC)

  • On the ATP, ADP, AMP, GDP, and guanosine triphosphate (GTP) independent test sets, the best prediction quality is achieved by TargetS in terms of AUC and by EC-RUS in terms of MCC
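The highlights report competing methods at the threshold that maximizes MCC and compare them on AUC and MCC. One way to compute both metrics from per-residue scores is sketched below: MCC is evaluated with every distinct score used as a cut-off and the best value is kept, and AUC is the fraction of (binding, non-binding) residue pairs in which the binding residue scores higher, with ties counted as one half. The function names, the `score >= threshold` convention, and the toy labels and scores are assumptions for illustration, not the paper's evaluation code.

```python
import math

# Hypothetical evaluation helpers; labels are 1 = binding residue, 0 = non-binding.


def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN when residues with score >= threshold are called binding."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_mcc_threshold(labels, scores):
    """Try every distinct score as the cut-off and keep the one with the highest MCC."""
    best_t, best_m = None, -1.0
    for t in sorted(set(scores)):
        m = mcc(*confusion(labels, scores, t))
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m


def auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
t, m = best_mcc_threshold(labels, scores)
print(t, round(m, 3), round(auc(labels, scores), 3))  # → 0.6 0.775 0.933
```

Scanning only the distinct scores is enough, because the confusion counts, and therefore MCC, change only when the cut-off crosses a score. Note that a threshold chosen on the same data it is evaluated on yields an optimistic MCC, whereas AUC does not depend on any threshold. The pairwise AUC loop is quadratic in the number of residues, which is fine for a sketch but slow for large test sets.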

Summary

Introduction

Accurate prediction of protein–ligand binding sites is important for understanding protein function and drug design [1,2,3,4]. Experimental determination of the three-dimensional (3D) structures of protein–ligand complexes and their binding sites is relatively expensive and time-consuming [5,6].
