SmoPSI: Analysis and Prediction of Small Molecule Binding Sites Based on Protein Sequence Information.

Wei Wang, Hehe Lv, Junwei Huang, Hongjun Zhang, Keliang Li, Shixun Wang

doi:10.1155/2019/1926156

Open Access

Abstract

The analysis and prediction of small molecule binding sites is very important for drug discovery and drug design. Traditional experimental methods for detecting small molecule binding sites are usually expensive and time-consuming, and tools designed for research on a single species of small molecule are similarly inefficient. In recent years, several algorithms for predicting protein-small molecule binding sites have been developed based on the geometric and sequence characteristics of proteins. In this paper, we propose SmoPSI, a classification model based on the XGBoost algorithm that predicts small molecule binding sites from protein sequence information. The model achieved an AUC of 0.918 and an accuracy (ACC) of 0.913. The experimental results demonstrate that our method performs well and outperforms many existing predictors. In addition, we analyzed binding and nonbinding residues and found that PSSM, hydrophilicity, hydrophobicity, charge, and hydrogen bonding have clearly different effects on binding-site prediction.
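The abstract reports AUC and ACC without defining them. As a hedged illustration (not the authors' evaluation code), the sketch below computes both for per-residue predictions. It assumes labels are 1 for binding and 0 for nonbinding residues, that scores are predicted binding probabilities, and that ACC uses a hypothetical 0.5 threshold. AUC is computed as the probability that a randomly chosen binding residue receives a higher score than a randomly chosen nonbinding residue, with ties counted as one half. The function names `roc_auc` and `accuracy` are illustrative only.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose thresholded score matches the label (1 = binding)."""
    # The 0.5 threshold is an assumption; the source does not state one.
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a binding residue > score of a nonbinding residue), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binding and one nonbinding residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.3, 0.6, 0.7, 0.1, 0.4]
    print(roc_auc(labels, scores))   # 7/9, about 0.778
    print(accuracy(labels, scores))  # 4/6, about 0.667
```

The pairwise loop is quadratic in the number of residues; for large datasets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` gives the same value faster.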

Highlights

Proteins perform their biological functions through interactions with other molecules.
Many computational methods have been proposed for predicting the binding sites between drug molecules and proteins. These methods are fast and inexpensive compared to traditional biochemical experiments. Methods for identifying binding sites fall mainly into the following categories. The purely geometry-based approach assumes that protein-small molecule binding sites are usually located in gaps on the protein surface or in pores of the protein.
We evaluated our method on 14 protein-small molecule datasets. The 14 small molecules are ACO, ADP, ANP, ATP, COA, FAD, FMN, GDP, GNP, NAD, NAP, NDP, SAH, and SAM.

Summary

Introduction

Proteins perform their biological functions through interactions with other molecules. In most cellular processes, proteins interact with small molecules to carry out these functions. Therefore, predicting protein-small molecule binding sites is of great significance for understanding and exploring protein function [1,2,3,4,5,6]. Many computational methods have been proposed for predicting the binding sites between drug molecules and proteins. The purely geometry-based approach assumes that protein-small molecule binding sites are usually located in gaps on the protein surface or in pores of the protein. The SITEHOUND algorithm identifies potential ligand-binding sites as regions characterized by favorable nonbonded interactions with a chemical probe [10]. ATPsite combines secondary structure, solvent accessibility, and dihedral angles with evolutionary information to build an SVM classifier that predicts ATP-binding residues [11]. TargetS extracts three types of features: evolutionary information, secondary structure, and ligand-specific binding propensity. Based on these features, it uses an AdaBoost classifier scheme [13]. Dai proposed a solution that uses sequence-based PSSM features and combines them with methods based on geometric cavity recognition.
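Sequence-based predictors like those above typically describe each residue by the features of its neighbors in a sliding window. The source does not give SmoPSI's window size or exact feature set, so the following is only a minimal sketch under stated assumptions: a precomputed PSSM with one row per residue (for example, from PSI-BLAST), a hypothetical window of 7 residues, and the Kyte-Doolittle hydropathy scale standing in for the hydrophobicity and hydrophilicity features the abstract mentions. The function `window_features` is hypothetical. Vectors like these could then be passed to a classifier such as XGBoost.

```python
from typing import List, Sequence

# Kyte-Doolittle hydropathy scale (positive = hydrophobic, negative = hydrophilic).
# Using this particular scale is an assumption; the source does not name one.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence: str, pssm: Sequence[Sequence[float]], window: int = 7) -> List[List[float]]:
    """Hypothetical per-residue encoding: PSSM row + hydropathy for each window position.

    Positions that fall off either end of the chain are zero-padded, and an extra
    indicator is set to 1.0 so a classifier can tell padding from genuine zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a residue")
    if len(pssm) != len(sequence):
        raise ValueError("need exactly one PSSM row per residue")
    n_cols = len(pssm[0]) if pssm else 20
    half = window // 2
    features: List[List[float]] = []
    for i in range(len(sequence)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(sequence):
                vec.extend(float(v) for v in pssm[j])
                # Unknown residue letters (e.g. 'X') get a neutral hydropathy of 0.0.
                vec.append(KYTE_DOOLITTLE.get(sequence[j], 0.0))
                vec.append(0.0)  # real residue, not padding
            else:
                vec.extend([0.0] * n_cols)
                vec.append(0.0)
                vec.append(1.0)  # padding indicator
        features.append(vec)
    return features


if __name__ == "__main__":
    seq = "ACDKL"
    toy_pssm = [[0.0] * 20 for _ in seq]  # placeholder PSSM, not real data
    feats = window_features(seq, toy_pssm, window=3)
    print(len(feats), len(feats[0]))  # 5 residues, 3 * 22 = 66 values each
```

With 20 PSSM columns and a 7-residue window, each residue gets 7 × 22 = 154 values. The explicit padding flag is a design choice so that positions beyond the chain ends are distinguishable from genuine zero scores.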

Methods

Results

Conclusion