Predicting protein-ligand binding residues with deep convolutional neural networks

Yifeng Cui, Xikun Wang, Daocheng Hong, Qiwen Dong

doi:10.1186/s12859-019-2672-1

Open Access

https://doi.org/10.1186/s12859-019-2672-1

Abstract

Background: Ligand-binding proteins play key roles in many biological processes. Identification of protein-ligand binding residues is important in understanding the biological functions of proteins. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based methods. All these methods are based on traditional machine learning. In a series of binding residue prediction tasks, 3D-structure-based methods are widely superior to sequence-based methods. However, due to the great number of proteins with known amino acid sequences, sequence-based methods have considerable room for improvement with the development of deep learning. Therefore, prediction of protein-ligand binding residues with deep learning requires study.

Results: In this study, we propose a new sequence-based approach called DeepCSeqSite for ab initio protein-ligand binding residue prediction. DeepCSeqSite includes a standard edition and an enhanced edition. The classifier of DeepCSeqSite is based on a deep convolutional neural network. Several convolutional layers are stacked on top of each other to extract hierarchical features. The size of the effective context scope is expanded as the number of convolutional layers increases. The long-distance dependencies between residues can be captured by the large effective context scope, and stacking several layers enables the maximum length of dependencies to be precisely controlled. The extracted features are ultimately combined through one-by-one convolution kernels and softmax to predict whether the residues are binding residues. The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments show that the improvement of the Matthews correlation coefficient (MCC) is no less than 0.05. In addition, a training data augmentation method that slightly improves the performance is discussed in this study.

Conclusions: Without using any templates that include 3D-structure data, DeepCSeqSite significantly outperforms existing sequence-based and 3D-structure-based methods, including COACH. Augmentation of the training sets slightly improves the performance. The model, code and datasets are available at https://github.com/yfCuiFaith/DeepCSeqSite.
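
Two quantities mentioned in the abstract can be made concrete with a short sketch. The first function shows how the effective context scope grows when stride-1 one-dimensional convolutions are stacked without dilation; the second computes the Matthews correlation coefficient from confusion-matrix counts. The function names, the kernel width of 5 and the layer counts are illustrative assumptions, not settings taken from DeepCSeqSite.

```python
import math


def effective_context_scope(kernel_sizes):
    """Number of residues seen by one output position after stacking
    stride-1, undilated 1D convolutions with the given kernel sizes.
    (Hypothetical helper; the paper's actual layer settings are not
    stated in the abstract.)"""
    scope = 1
    for k in kernel_sizes:
        scope += k - 1  # each layer widens the window by k - 1 residues
    return scope


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Assumed kernel width of 5: the scope grows linearly with depth.
    for depth in (1, 3, 10):
        print(depth, effective_context_scope([5] * depth))
    # Toy counts for a binding / non-binding residue classifier.
    print(mcc(tp=40, fp=10, tn=930, fn=20))
```

Because each added layer widens the window by a fixed amount, the depth of the stack sets an upper bound on the distance between residues that can influence one prediction, which is the sense in which the maximum dependency length can be controlled.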

Highlights

Ligand-binding proteins play key roles in many biological processes
These properties of proteins ensure the feasibility of predicting binding residues from amino acid sequences or 3D structures
The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines

Summary

Introduction

Ligand-binding proteins play key roles in many biological processes. Identification of protein-ligand binding residues is important in understanding the biological functions of proteins. However, owing to the technical difficulties and high cost of experimental determination, the structural details of protein-ligand interaction are known for only a small fraction of proteins. Both biological and therapeutic studies therefore require accurate computational methods for predicting protein-ligand binding residues [1]. The primary structure of a protein directly determines its tertiary structure, and the binding residues of a protein are closely tied to that tertiary structure. These properties ensure the feasibility of predicting binding residues from amino acid sequences (primary structures) or 3D structures. Because the mapping from structure to binding residues is complex and unknown, machine learning is a natural choice for binding residue prediction.

Methods

Results

Conclusion

Journal: BMC Bioinformatics
Publication Date: Feb 26, 2019
Citations: 61
License type: open-access


Similar Papers

Structure-based prediction of protein-peptide binding regions using Random Forest.
Ghazaleh Taherzadeh ... Yuedong Yang
Bioinformatics | VOL. 34 | 26 Sep 2017

Comprehensive Study for Breast Cancer Using Deep Learning and Traditional Machine Learning
ZANCO JOURNAL OF PURE AND APPLIED SCIENCES | VOL. 34 | 12 Apr 2022

Accurate prediction of protein-ATP binding residues using position-specific frequency matrix
Jun Hu ... Gui-Jun Zhang
Analytical Biochemistry | VOL. 626 | 07 May 2021

A Gas Classification Algorithm of Electronic Noses Based on Convolutional Spiking Neural Network
Yizhou Xiong ... Yingying Xue
Electrochemical Society Meeting Abstracts | VOL. MA2021-01 | 30 May 2021
