Abstract

Identifying the drug-binding residues on the surface of proteins is a vital step in drug discovery and is important for understanding protein function. Most previous studies rely on the structural information of proteins, but structures are not available for most proteins. In this article, a sequence-based method is therefore proposed that combines support vector machine (SVM)-based ensemble learning with an improved position-specific scoring matrix (PSSM). To take the local environment of a drug-binding site into account, an improved PSSM profile scaled by a sliding window and a smoothing window was used to improve the prediction result. In addition, a new SVM-based ensemble learning method was developed to deal with the imbalanced data classification problem that commonly arises in binding-site prediction. On a dataset of 985 drug-binding residues, the method achieved an area under the curve (AUC) of 0.9264. Furthermore, on an independent dataset of 349 drug-binding residues used to evaluate the prediction model, the prediction accuracy was 84.68%. These results suggest that our method is effective for predicting drug-binding sites in proteins. The code and all datasets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Ensem_DBS.zip.
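To make the two windows concrete, the following is a minimal sketch of one common way to build window-based PSSM features. It is not taken from the article's code. It assumes that the smoothing window sums the PSSM rows around each residue and that the sliding window concatenates the smoothed rows of neighbouring residues, with zero padding at the chain termini. The function names, window sizes, and the toy two-column profile (a real PSSM has 20 columns) are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def smooth_pssm(pssm: Matrix, smooth_win: int) -> Matrix:
    """Replace each row by the sum of the rows in a centered window.

    Rows that would fall outside the sequence are treated as zeros
    (assumed padding scheme).
    """
    if smooth_win % 2 == 0:
        raise ValueError("smooth_win must be odd")
    half = smooth_win // 2
    n, width = len(pssm), len(pssm[0])
    smoothed = []
    for i in range(n):
        row = [0.0] * width
        for j in range(max(0, i - half), min(n, i + half + 1)):
            for k in range(width):
                row[k] += pssm[j][k]
        smoothed.append(row)
    return smoothed


def window_features(profile: Matrix, slide_win: int) -> Matrix:
    """Build one feature vector per residue by concatenating the rows
    inside a centered sliding window (zero rows beyond the termini)."""
    if slide_win % 2 == 0:
        raise ValueError("slide_win must be odd")
    half = slide_win // 2
    n, width = len(profile), len(profile[0])
    zero_row = [0.0] * width
    features = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            vec.extend(profile[j] if 0 <= j < n else zero_row)
        features.append(vec)
    return features


# Toy profile: 4 residues, 2 columns (hypothetical values).
toy = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
smoothed = smooth_pssm(toy, smooth_win=3)
features = window_features(smoothed, slide_win=3)
```

Each residue ends up with a vector of length `slide_win` times the number of PSSM columns, which is the kind of fixed-length input an SVM requires.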

Highlights

  • The function of a protein is determined to a great extent by the binding sites on its surface that interact with other molecules

  • A new support vector machine (SVM)-based ensemble learning method was developed to deal with the imbalanced data classification problem

  • The 77 protein chains were extracted from the structures of drug-protein complexes in the Protein Data Bank (PDB) [13] that were determined by X-ray crystallography with a resolution better than 2.5 Å
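The ensemble idea in the second highlight can be illustrated with a generic undersampling scheme. This is a sketch under stated assumptions, not the article's algorithm, whose details are not given here. It assumes the majority (non-binding) class is split into disjoint chunks, each chunk is paired with all minority (binding) samples to train one base learner, and predictions are combined by majority vote. `CentroidClassifier` is a hypothetical stand-in that keeps the example dependency-free; it is not an SVM.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]


class CentroidClassifier:
    """Stand-in base learner with fit/predict (NOT an SVM)."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X: List[Vector]) -> List[int]:
        def nearest(x: Vector) -> int:
            return min(
                self.centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
            )
        return [nearest(x) for x in X]


def train_balanced_ensemble(X: List[Vector], y: List[int],
                            make_model: Callable[[], object], seed: int = 0) -> list:
    """Train one model per balanced subset: all positives plus one
    disjoint chunk of the shuffled negatives (assumed scheme)."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    random.Random(seed).shuffle(neg)
    n_models = max(1, len(neg) // len(pos))
    models = []
    for m in range(n_models):
        idx = pos + neg[m::n_models]  # every n_models-th negative
        models.append(make_model().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_vote(models: list, X: List[Vector]) -> List[int]:
    """Majority vote over the base learners; ties go to the label seen first."""
    votes = [m.predict(X) for m in models]
    return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


# Toy imbalanced data: 2 binding (1) vs 6 non-binding (0) residues.
X = [[5, 5], [6, 5], [0, 0], [1, 0], [0, 1], [1, 1], [-1, 0], [0, -1]]
y = [1, 1, 0, 0, 0, 0, 0, 0]
models = train_balanced_ensemble(X, y, CentroidClassifier)
predictions = predict_vote(models, [[5.5, 5.0], [0.2, 0.1]])
```

Any object with scikit-learn-style `fit` and `predict` methods can be passed as `make_model`, so an SVM base learner such as `sklearn.svm.SVC` could be substituted for the stand-in if scikit-learn is installed.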


Summary

Introduction

The function of a protein is determined to a great extent by the binding sites on its surface that interact with other molecules. Identifying these binding sites is crucial for elucidating protein function and for assisting drug design. The SCREEN method [8] computed 408 physicochemical, structural and geometric features for identifying drug-binding cavities, with a coverage of 88.9%. Such structure-based approaches can find the concave regions where ligands bind and achieve high prediction accuracy. It is necessary to develop effective and reliable computational methods to predict drug-binding sites.

