Abstract

Accurate identification of ligand-binding pockets in a protein is important for structure-based drug design. In recent years, several deep learning models have been developed that learn physicochemical and spatial information to predict ligand-binding pockets in a protein. However, ranking the native ligand-binding pockets within a pool of predicted pockets remains a hard task for computational molecular biologists using a single web-based tool. We therefore hypothesize that training on a dataset closer to real applications, and providing ligand information, can yield a model that identifies pockets more accurately. In this article, we propose a new deep learning method called DeepBindPoc for identifying and ranking ligand-binding pockets in proteins. The model is built using information about the binding pocket and its associated ligand. We use the mol2vec tool to represent both the given ligand and the pocket as vectors, which serve as input to a densely (fully) connected neural network. During training, important features for pocket-ligand binding are extracted automatically and high-level information is preserved. DeepBindPoc demonstrated a strong complementary advantage for the detection of native-like pockets when combined with popular traditional methods, such as fpocket and P2Rank. The proposed method is extensively tested and validated with standard procedures on multiple datasets, including a dataset of G-protein-coupled receptors. The systematic testing and validation of our method suggest that DeepBindPoc is a valuable tool for ranking near-native pockets in theoretically modeled proteins whose active site has not been determined experimentally but whose ligand is known. The DeepBindPoc model described in this article is available on GitHub (https://github.com/haiping1010/DeepBindPoc) and the webserver is available at http://cbblab.siat.ac.cn/DeepBindPoc/index.php.

Highlights

  • A protein can interact with binding partners such as small molecules, nucleic acids or other proteins in the cell to perform its biological functions

  • The basic idea of mol2vec is to treat a SMILES string as a molecular sentence composed of words; as in the natural language processing method word2vec, an unsupervised machine learning method learns a vector for each word from a large dataset of available chemical compounds (Krallinger et al, 2015)

  • DeepBindPoc performance on the training, validation and testing datasets: to determine the number of training epochs (a hyperparameter), we check convergence by monitoring how the accuracy and loss values change in both training and validation as the epoch number increases
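
The mol2vec idea in the highlights can be illustrated with a minimal sketch. It is not the DeepBindPoc or mol2vec implementation, and every name in it (`smiles_to_words`, `word_vector`, `sentence_vector`, `pair_features`) is hypothetical. Two simplifications are assumed: the "words" are plain SMILES tokens, whereas the real mol2vec tool derives its words from molecular substructures, and each word vector is a seeded random placeholder rather than an embedding learned by a word2vec-style model. Passing the pocket as a SMILES-like string is also an assumption made only for illustration.

```python
import random
import re

# Toy tokenizer: bracket atoms and two-letter halogens first, then single characters.
# Assumption: real mol2vec words are substructure identifiers, not SMILES tokens.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def smiles_to_words(smiles):
    """Split a SMILES string into a 'sentence' of token 'words'."""
    return TOKEN_RE.findall(smiles)


def word_vector(word, dim=8):
    """Placeholder embedding: a fixed pseudo-random vector per word.

    A trained model would supply learned vectors here instead.
    """
    rng = random.Random(word)  # seeding with the word makes the vector reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def sentence_vector(words, dim=8):
    """Represent a whole molecule as the sum of its word vectors."""
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(word_vector(word, dim)):
            total[i] += value
    return total


def pair_features(ligand_smiles, pocket_smiles, dim=8):
    """Concatenate ligand and pocket vectors into one input for a dense model."""
    ligand = sentence_vector(smiles_to_words(ligand_smiles), dim)
    pocket = sentence_vector(smiles_to_words(pocket_smiles), dim)
    return ligand + pocket


if __name__ == "__main__":
    print(smiles_to_words("CC(=O)Oc1ccccc1C(=O)O"))
    print(len(pair_features("CC(=O)Oc1ccccc1C(=O)O", "NCC(=O)O")))
```

Summing word vectors gives a fixed-length representation regardless of molecule size, which is what allows a ligand vector and a pocket vector to be concatenated and passed to a fully connected network.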


Summary

Introduction

A protein can interact with binding partners such as small molecules, nucleic acids or other proteins in the cell to perform its biological functions. Understanding how and where these molecules bind to their protein targets provides valuable information for therapeutic design, because it is essential for mimicking or enhancing a function in the cell (Lionta et al, 2014). Predicting ligand-binding pockets in proteins is one of the key issues in the early stages of structure-based drug discovery and remains an unresolved problem in computer-aided drug design (Liang, Edelsbrunner & Woodward, 1998; Miller & Dill, 2008). Concavity, and CASTp are hybrid methods that use similarity searches against existing databases together with other geometric indices to identify pockets (Capra et al, 2009; Le Guilloux, Schmidtke & Tuffery, 2009; Tian et al, 2018).

